Monday, July 12, 2010

Using TempLS on an alternative land temperature data set - GSOD.

A few years ago there was a fuss about the unavailability of the main codes for preparing temperature indices. They were said to perform illegitimate data manipulations to exaggerate temperature trends. Then GISS released the code and data for Gistemp. There was various nit-picking, but the folks demanding the code did not, at that stage, seem able to get it working. Suspicions remained, and started to focus on the GHCN dataset, which was the major resource for all the indices. The claims were that thermometers were being eaten, that there was a southward march, etc.

Recent independent codes, including TempLS, and notably the Muir Russell panel's own calculation, have verified that not only are the indices calculated properly, but the various red herrings about adjustments, airports, etc. do not create a noticeable bias. However, they are all based on GHCN and a few ocean data sets (mainly HadSST2), so an alternative dataset would be welcome.

Ron Broberg went in search of a set based on SYNOP reports, which include a lot more stations (as recommended by Gavin). He found the GSOD data at NCDC, and made the huge effort needed to get the data, scattered over thousands of files, into a GHCN format. I downloaded his files from here.

This format is ideal for TempLS, and the expanded number of land stations is useful. So I looked to see how well the new dataset matches GHCN-based results. In future posts I'll look at regional results. There are, for example, many more Arctic stations than in GHCN, and there has been no great reduction in station numbers in recent years. On the other hand, there's very little data pre-1940, and it's fairly gappy up to the mid-1970s.

The data set is generally looser than GHCN, and for me the main practical difficulty was that some station data had no corresponding entries in the inventory file. So I upgraded the preprocessor to handle this. I'll release this soon as V 1.5.
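To make the inventory mismatch concrete, here is a minimal Python sketch of the kind of check involved. It is not the TempLS preprocessor (which is not shown in this post); the function name `reconcile` is hypothetical, and it assumes a GHCN v2 style layout in which the first 11 characters of each data line and each inventory line hold the station ID.

```python
def reconcile(data_lines, inv_lines, id_width=11):
    """Split data lines by whether their station ID appears in the inventory.

    Assumption: the station ID occupies the first `id_width` characters of
    both the data lines and the inventory lines (GHCN v2 style layout).
    Returns (kept data lines, IDs with data but no inventory entry,
    IDs with an inventory entry but no data).
    """
    inv_ids = {line[:id_width] for line in inv_lines if line.strip()}

    kept = []        # data lines whose station is in the inventory
    orphans = set()  # station IDs that have data but no inventory entry
    for line in data_lines:
        if not line.strip():
            continue  # skip blank lines
        sid = line[:id_width]
        if sid in inv_ids:
            kept.append(line)
        else:
            orphans.add(sid)

    data_ids = {line[:id_width] for line in kept}
    unused_inv = inv_ids - data_ids  # inventory entries with no data at all
    return kept, sorted(orphans), sorted(unused_inv)
```

Dropping the orphaned data is only one option; another would be to add placeholder inventory lines for those stations so that their data can still be used.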

Here is a plot of the GSOD results compared with the indices, and some independent calculations. The figure is adapted from Zeke. Then I'll show a side-by-side presentation of TempLS calculations for GSOD and GHCN data.

Anomalies based on 1971-2000. Comparison of GSOD TempLS results with indices and some blogger efforts. The GSOD result is in thick black, indices are also the thicker lines.
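For readers wanting to put their own series on the same footing, "based on 1971-2000" means each series is shifted so that its mean over those years is zero. A minimal sketch follows, assuming an annual series held as a dict of year to value; the function name `rebase` is hypothetical, and this is only the re-basing step, not the TempLS least-squares calculation.

```python
def rebase(series, start=1971, end=2000):
    """Shift a {year: value} series so its mean over start..end is zero.

    Missing years may be absent or set to None; they are ignored when
    computing the base-period mean and stay None in the output.
    """
    base = [v for y, v in series.items()
            if start <= y <= end and v is not None]
    if not base:
        raise ValueError("no data in the base period")
    offset = sum(base) / len(base)
    return {y: (None if v is None else v - offset)
            for y, v in series.items()}
```

A series with little data inside the base period gets a poorly determined offset, which matters for GSOD given the gaps before the mid-1970s.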


  1. Roy Spencer's ISH project is also relevant, in the same way. Roughly speaking, the ISH is complementary to the GSOD, but shows 4 daily measurements, instead of a daily mean. I think.

    What do you mean by 'looser'?

    Speaking of thankless labor, I suppose at some point, somebody could build up the inv file for the extra stations. The satellite index shouldn't be too hard; the rest could maybe be left as dummy values.

    At that point, I think the bloggers would have made something that the professionals would actually want to use.

  2. Oh, and I absolutely love that the Russell panel went and did their own calculation. Unexpected, but awesome. You don't need to copy somebody else's code to do something for yourself.

  3. Good! Good! And I'm happy to take feedback. Let me know about processing issues. I'll take another look at the inv/mean station matching. I thought I had caught all of those running GISTEMP. Sorry about that.

    Comment: You should use the same scale on your side-by-side charts

  4. Ron,
    Thanks for the suggestion - I've rescaled the side by side plots.
    CE, by looser I meant mainly that in GHCN every inventory line has data, and every data record has an inventory entry. That's fixable. But GSOD also has, for example, the missing year 1972 etc., which meant I could compare only back to the mid-1970s. And I expect that there is less QC on GSOD (but I may be wrong).

  5. I think less QC on GSOD is a safe assumption.

    I don't think GSOD is globally useful before 1973, based on the maps Ron made. But maybe some specific regions.