Tuesday, September 20, 2016

CRUTEM (HADCRUT) versions are documented and accessible

I have encountered ongoing complaints at WUWT about HADCRUT 4 updates. The latest is in a thread here, but the theme goes back to an earlier post here. The complaints typically say that new versions always raise current anomalies, and suggest that the changes are poorly documented. In fact, the changes are extensively noted; see directory here.

In the earlier thread Tim Osborn commented here, saying that the version differences were mainly due to changes (mostly additions) to station data, and he listed the particular additions to HADCRUT 4.3. He also later made the important point that there is a good reason why the trends rise with new data. HADCRUT is a land/ocean set, but the empty cells are mainly on land, and the new data allow some of them to be filled. HADCRUT is an average by grid (area-weighted), in which cells without data are simply not included. That has the effect of assigning to them the global average, which is dominated by sea temp. If new stations give previously empty cells genuine land values, that will increase the trend, because land is warming more rapidly. HADCRUT had artificially low trends because of this missing value policy, which was remedied by Cowtan and Way (2013) - discussed here in a series of posts, with links here.
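To make the missing-cell arithmetic concrete, here is a minimal sketch in Python. It is not HADCRUT's code: the five-cell "grid", the anomaly values, and the function name masked_area_mean are all made up for illustration, and it ignores the detail that HADCRUT averages each hemisphere separately. It just shows that leaving empty cells out of an area-weighted mean gives the same answer as filling them with that mean, and that filling them with warmer land values instead raises the result.

```python
import math

def masked_area_mean(values, lats):
    """Area-weighted mean over cells that have data (None marks an empty cell).

    Weights are cos(latitude), a common approximation to grid-cell area.
    """
    num = 0.0
    den = 0.0
    for v, lat in zip(values, lats):
        if v is None:
            continue  # empty cell: simply not included
        w = math.cos(math.radians(lat))
        num += w * v
        den += w
    return num / den

# Toy one-dimensional "grid": cell-centre latitudes and made-up anomalies (deg C).
lats = [-60.0, -30.0, 0.0, 30.0, 60.0]
anoms = [0.2, 0.3, None, 0.4, None]

# 1. Average with the empty cells omitted.
mean_omit = masked_area_mean(anoms, lats)

# 2. Fill the empty cells with that average and average again: same answer.
filled = [mean_omit if v is None else v for v in anoms]
mean_filled = masked_area_mean(filled, lats)

# 3. Suppose new stations show the empty cells are land at a warmer anomaly
#    (0.9 is an invented value): the average goes up.
with_land = [0.9 if v is None else v for v in anoms]
mean_land = masked_area_mean(with_land, lats)

print(mean_omit, mean_filled, mean_land)
```

The first two numbers printed agree to rounding, which is the sense in which omitted cells are implicitly assigned the average of the rest. The third is larger, which is the version-to-version effect of adding stations in warmer, previously empty cells.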

But another feature of HADCRUT transparency is insufficiently appreciated. For version 4, at least, they give a complete listing of station data for each version, with each station file documented. Here is a typical version file; it is for 4.4, but just change that URL to 4.2 or whatever you want. Each links to a zip file of the station data for that version (except for Poland). The zip file has a URL like http://www.metoffice.gov.uk/hadobs/crutem4/data/previous_versions/ and I'm spelling it out because if you click on it, it will immediately download about 18 MB. But again, you can edit for other versions.

I couldn't find inventory files, though, except for 4.5. But it's easy enough to make them from the file headers. So I've done that, and placed the zipfile here (612 KB). It has a csv file for each of 4.2-4.5, and the columns have three-letter abbreviations meaning:
  • num - a unique HAD station number
  • nam - name of station
  • cou - country name
  • lat - latitude in deg
  • lon - longitude in deg
  • alt - altitude in m
  • sta - start year of data
  • end - end year
  • sou - source id code
UEA has an explanations file here, which is the best source I have found for the source id codes. Unfortunately it dates from 2012, and a new code has been added for pretty much each new dataset that has come in since. I'd be glad to hear of something more recent. It isn't really a problem, because the later numbers are in order of addition, so they are easy to work out. Note that the files are in station number order, but countries are not necessarily consecutive blocks.
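For anyone who wants to see what changed between versions, the inventories can be compared by station number. The following is only a sketch: it assumes each csv has a header row with the column names listed above, and the two sample inventories are invented rows, not real stations. The function names load_inventory and added_stations are my own; with the real files you would pass open file handles rather than the in-memory strings used here.

```python
import csv
import io
from collections import Counter

def load_inventory(fileobj):
    """Read an inventory csv into a dict keyed by station number (column 'num')."""
    return {row["num"]: row for row in csv.DictReader(fileobj)}

def added_stations(old, new):
    """Stations present in the newer inventory but not in the older one."""
    return [new[num] for num in new if num not in old]

# Invented sample data standing in for two versions' inventory files.
OLD_CSV = """num,nam,cou,lat,lon,alt,sta,end,sou
000001,EXAMPLE A,NOWHERE,10.0,20.0,5,1950,2010,10
000002,EXAMPLE B,NOWHERE,12.0,22.0,50,1900,2012,10
"""
NEW_CSV = """num,nam,cou,lat,lon,alt,sta,end,sou
000001,EXAMPLE A,NOWHERE,10.0,20.0,5,1950,2014,10
000002,EXAMPLE B,NOWHERE,12.0,22.0,50,1900,2014,10
000003,EXAMPLE C,ELSEWHERE,60.0,100.0,200,1960,2014,99
"""

old_inv = load_inventory(io.StringIO(OLD_CSV))
new_inv = load_inventory(io.StringIO(NEW_CSV))

added = added_stations(old_inv, new_inv)
for row in added:
    print(row["num"], row["nam"], row["cou"], row["sou"])

# Tally the additions by source id code, to see which datasets they came from.
by_source = Counter(row["sou"] for row in added)
print(by_source)
```

Keying on num rather than on name or position matters because, as noted, the files are in number order and countries are not contiguous. The tally by sou is a quick way to spot the newer source codes that the 2012 explanations file does not cover.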

So I thought I would just post this information, so that people who really want to know what HADCRUT is up to can look it up. I may in future produce a Google map.


  1. "HADCRUT is an average by grid (area-weighted), in which cells without data are simply not included. That has the effect of assigning to them the global average, which is dominated by sea temp."

    I'm embarrassed to say that despite reading through some of the relevant publications and documentation I didn't realize that was the case. So 1/3 of Africa and 1/5 of South America are simply omitted... that is something I'll keep in mind in future.

    1. Actually, there is a wrinkle that I should have mentioned; they average by hemispheres, so in effect the hemisphere average is used there. But yes, and the Arctic too. That is basically what C&W were doing - replacing the imputed NH average of those cells by something more locally appropriate.

  2. Nick, your explanation of the changes in HadCRUT4 makes good sense to me. However, I still have trouble with their "95% confidence limits", which average at +/- 0.15C for January through July of 2016. Based on my extensive experience in working with surface air temperature measurements and data for over 40 years now, my subjective feel for the uncertainty is more like at least about +/- 0.3C and maybe as much as 0.5C and increasing the farther back you go in the estimates (probably more like +/- 0.5C to 1.0C for the 1850-1900 period where they show an average of +/- 0.3C).

    1. Bryan,
      Hadcrut has been committed to the ensemble approach used also in forecasting, and I expect that is where the confidence comes from. As with any approach, the answer depends on what you vary going into the ensemble (or equivalent). When GISS or NOAA quote (larger) uncertainty, my understanding is that the largest component is station spread - i.e. how much different would it be if we measured in different places. I don't know if HADCRUT includes this. But they have written quite a lot about it - I should look it up.

  3. Bryan, when you write "...my subjective feel for the uncertainty is more like at least about +/- 0.3C and maybe as much as 0.5C..." do you believe they left out important uncertainties or disagree with the values they place on individual uncertainties?

    To accept uncertainties of +/- 0.5 C we'd pretty much have to ignore all of the phenological evidence we've accrued. Either that or make the argument that most earth systems are much more sensitive to small changes than currently accepted. For biological systems that might make some sense, but for lake ice, glaciers, sea-level and other non-biological systems one has a hard time escaping the physics. One can ignore thermometer or temperature readings and simply look at phenological data and arrive at the same conclusion.