Steven Mosher was looking through the code for TempLS v1.4 when he saw something puzzling and asked me about it. And sure enough, it was an error. One of those things a fresh pair of eyes can notice.
It was in the section that takes the station list and assigns each station a cell number, depending on which 5x5 degree lat/lon block it falls into. It doesn't matter what the number is, as long as the stations in a block share the same number and stations in other blocks have different numbers. The stations with each number are counted, and the count is used in the least squares weighting.
Here are the relevant lines of code:
d = floor(tv[,5:6]/5); # station lat and long;
cellnums = unlist(d[,1]*36+d[,2]+1333); # 5 deg x 5 deg cells numbered 1 to 72*36=2592;
The first line divides the lat/lon by 5 and rounds to an integer. So each station has an integer pair identifying its cell.
They range from (-18,-36) to (17,35) - there are 2592 cells.
The next line tries to number these from 1 to 2592. Numbering a rectangular array is normally done either by row or by column. The 1333 is an offset to bring them to a positive range. I'm numbering in rows, so 36 should be the number of columns. But alas, there are 36 rows and 72 columns. I made the mistake because lat/lon, on the map, is actually in (y,x) order rather than the conventional (x,y).
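A quick sketch in R makes the collision concrete. It enumerates every cell index pair, applies the row-wise numbering with 36 as the row length, and compares it with one possible correction that uses 72. The names `cells`, `buggy` and `fixed` are made up for this illustration, and the corrected formula is an assumption, not necessarily what goes into Ver 1.41.

```r
# All 5x5 degree cell index pairs: lat index -18..17, lon index -36..35
cells <- expand.grid(lat = -18:17, lon = -36:35)

# Row-wise numbering with 36 as the row length (the mistaken version)
buggy <- cells$lat * 36 + cells$lon + 1333

# One possible fix (an assumption): shift both indices to start at 0
# and use the 72 longitude columns as the row length
fixed <- (cells$lat + 18) * 72 + (cells$lon + 36) + 1

nrow(cells)             # 2592 cells in total
length(unique(buggy))   # 1332 distinct numbers, so most are shared
max(table(buggy))       # 2 cells at most per number
length(unique(fixed))   # 2592, one number per cell
range(fixed)            # 1 2592
```

Any multiplier of at least 72 would also give unique numbers; 72 with shifted indices simply lands on the intended 1 to 2592 range.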
I'll post the revised code at Ver 1.41. Ver 2.0 should be out soon.
What is the effect?
Basically, two cells get assigned to every number. It turns out that they are opposite in longitude. This affects the weighting, which is meant to correct for station concentration. When an area is well covered by stations, the least squares process downweights each individual station, so the area does not get undue attention. But by combining with a cell on the other side of the world, this downweight could be out by a factor of two.
I should hastily say that it doesn't mean that temperatures from the other side of the world are modifying the readings. They don't - they only modify the weight given to the readings.
Fortunately, it won't matter for any regional studies. The reason is that there are then no stations included from the other side of the world. The weights are unaffected.
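The pairing and its effect on weights can be sketched as follows. The example cell, the station counts, and the assumption that each station's weight is simply the inverse of the count for its cell number are all illustrative; the actual TempLS weighting may differ in detail.

```r
# Same index grid and mistaken numbering as described above
cells <- expand.grid(lat = -18:17, lon = -36:35)
num <- cells$lat * 36 + cells$lon + 1333

# Pick an arbitrary cell: lat index 8, lon index -20 (40-45N, 100-95W)
target <- 8 * 36 + (-20) + 1333
shared <- cells[num == target, ]
shared                   # the two cells that receive this number
diff(shared$lon) * 5     # longitude separation in degrees

# Hypothetical station counts (made-up numbers)
n_here <- 30   # stations in the chosen cell
n_far  <- 30   # stations in the cell sharing its number

w_intended <- 1 / n_here            # assumed inverse-count weight
w_actual   <- 1 / (n_here + n_far)  # weight when the counts are pooled
w_intended / w_actual               # factor by which the weight is out

# Regional study: no stations on the other side of the world
w_regional <- 1 / (n_here + 0)
w_regional == w_intended
```

With equal counts the weight is out by the full factor of two; if the far cell is sparse or empty, the distortion shrinks toward nothing, which is why regional runs are unaffected.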
A digression on weighting
Another way of seeing the use of weighting by inverse density is that it does what you would do if you were trying to use the stations in a numerical integral expression. As such, it has a fault when grid cells can be empty. The natural thing to do with an empty cell is just to leave it out, as GISS does, and, I suspect, the other indices do too. So did I. But leaving it out, when compiling a global average, is effectively the same as including it with a value artificially assigned to it, which is equal to the global average. And when you look at spatial patterns of anomalies etc, it's clear this often isn't optimal. You might have regions like the Arctic where the cells that do have stations have high (or low) values, but many are missing. Implicitly assigning these to have average value biases the result down (or up).
GIStemp gets criticised for extrapolating over wide regions in the Arctic. But it's the best thing to do (in a bad situation). Leaving the regions out seems more conservative, but it isn't.
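The equivalence between omitting empty cells and filling them with the global average is easy to check numerically. This is a minimal sketch with made-up anomalies and area weights; the variable names are hypothetical.

```r
# Made-up cell anomalies; NA marks cells with no stations
anom <- c(0.2, 0.5, NA, 1.4, NA, 0.3)
# Made-up area weights for the same cells
area <- c(1.0, 0.9, 0.5, 0.4, 0.4, 1.0)

has_data <- !is.na(anom)

# Average that simply leaves the empty cells out
avg_omit <- weighted.mean(anom[has_data], area[has_data])

# Fill the empty cells with that average, then average over all cells
filled <- ifelse(has_data, anom, avg_omit)
avg_fill <- weighted.mean(filled, area)

all.equal(avg_omit, avg_fill)  # TRUE: the two averages coincide
```

If the empty cells sit in a region whose neighbours run warm, as in the Arctic case, filling them with the global mean rather than a local value is what pulls the average down.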
Anyway I've been looking at ways of using irregular triangular subdivisions instead of a regular grid. The idea is that when stations get sparse, you just make the subdivisions bigger. No part of the Earth lacks a representative cover. As I mentioned, sparseness is a problem - here the triangles get big, so the coverage degrades. But it's better than none.
Back to the error
I recalculated some of the main global plots of recent posts. Here old and new are contrasted. As you'll see, the discrepancies are noticeable but not major. Mainly the error made peaks and valleys more extreme. I don't believe the revised data would show up on Zeke's combined blogger plots. At the bottom of each plot the modified trend is shown. The changes are small.
There are sometimes some biggish changes in the late 1930s. I think this reflects the fact that these years were hot in ConUS, which was then overweighted in the older version.
Global Land and Sea
Global Sea (SST)
61 station subset