Recent independent codes, including TempLS and notably Muir Russell, have verified not only that the indices are calculated properly, but also that the various red herrings about adjustments, airports, etc. do not create a noticeable bias. However, they are all based on GHCN and a few ocean datasets (mainly HADSST2), so an alternative dataset would be welcome.
Ron Broberg went in search of a set based on SYNOP reports, which include a lot more stations (as recommended by Gavin). He found the GSOD data at NCDC, and made the huge effort needed to convert the data, scattered over thousands of files, into GHCN format. I downloaded his files from here.
This format is ideal for TempLS, and the expanded number of land stations is useful. So I looked to see how well the new dataset matches GHCN-based results. In future posts I'll look at regional results. There are, for example, many more Arctic stations than in GHCN, and the station count has not dropped greatly in recent years. On the other hand, there's very little data before 1940, and it's fairly gappy up to the mid-1970s.
The dataset is generally looser than GHCN. For me the main practical difficulty was that some data records had no corresponding entry in the inventory file, so I upgraded the preprocessor to handle this. I'll release it soon as V 1.5.
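To illustrate the inventory problem, here is a minimal Python sketch. It is not the TempLS preprocessor itself. It assumes station IDs have already been parsed from both files; the function name `split_by_inventory`, the record layout and the sample IDs are all made up for illustration, and setting the unmatched records aside is just one possible policy.

```python
def split_by_inventory(records, inventory_ids):
    """Separate data records into those with and without an inventory entry.

    records: iterable of (station_id, year, monthly_values) tuples
             (an assumed layout, not the actual file format).
    inventory_ids: collection of station IDs found in the inventory file.
    """
    known = set(inventory_ids)  # set lookup keeps the check cheap
    matched, orphans = [], []
    for rec in records:
        if rec[0] in known:
            matched.append(rec)
        else:
            orphans.append(rec)
    return matched, orphans


# Hypothetical IDs and values, for illustration only.
inventory = ["01001099999", "01008099999"]
data = [
    ("01001099999", 1980, [-5.1] * 12),
    ("99999999999", 1980, [10.0] * 12),  # no inventory entry
]
matched, orphans = split_by_inventory(data, inventory)
print(len(matched), "matched;", len(orphans), "without inventory entry")
```

Keeping the orphans in a separate list, rather than silently discarding them, makes it easy to report how much data is affected.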
Here is a plot of the GSOD results compared with the indices and some independent calculations. The figure is adapted from Zeke. After that, I'll show a side-by-side presentation of TempLS calculations for GSOD and GHCN data.
Comparison of GSOD TempLS results with indices and some blogger efforts. Anomalies are relative to a 1971-2000 base period. The GSOD result is in thick black; the indices are also drawn with thicker lines.
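Putting every curve on the 1971-2000 base just means subtracting each series' own mean over those years. A minimal Python sketch of that step follows; the function name `rebase` and the toy numbers are made up, it is not TempLS code, and it ignores details such as incomplete years.

```python
def rebase(series, start=1971, end=2000):
    """Shift a {year: value} series so its mean over start..end is zero.

    Missing values are represented as None and are left as None.
    """
    base = [v for y, v in series.items()
            if start <= y <= end and v is not None]
    if not base:
        raise ValueError("no data in the base period")
    offset = sum(base) / len(base)
    return {y: (None if v is None else v - offset)
            for y, v in series.items()}


# Made-up numbers, for illustration only.
toy = {1970: 0.0, 1971: 1.0, 2000: 3.0, 2001: 4.0}
print(rebase(toy))
```

Only the two toy values inside 1971-2000 contribute to the offset, so series with different native base periods end up directly comparable on one plot.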