Tuesday, August 2, 2011

Global surface temperature coverage.


There have been articles in the blogosphere in recent years with titles like "the dying of thermometers", showing a plot something like this of the variation in GHCN station numbers over the years.

Raw station numbers aren't a good guide. GHCN was originally a grant-funded archiving exercise. It did not focus on even coverage; it largely archived what was available. The ongoing version of GHCN (since the mid-1990s) relies on the submission of CLIMAT forms by countries, and this seems to be rationalised by area.

I was looking into the statistics of the cell weighting schemes I described in the previous post, and realised that these provided a good quantification of coverage over the years. In cell weighting I divide the surface into cells and calculate the number of stations reporting in each cell in each month. Empty cells represent coverage failure, so I've catalogued their occurrence. They are therefore a measure of coverage over time. Below the jump, I'll show how the global coverage of CRUTEM3, GHCN and HADSST2 has varied over the years. It has been generally improving.

My original aim was to compare the lat/lon cell scheme with the equal area scheme (see previous post). I did a run from 1900 to 2010, computing for each month the sum of weights as a fraction of what it would have been with every cell reporting. Put another way, this is the area of reporting cells as a fraction of the total. As a reminder of the two schemes and the empty-cell pattern, here they are. The stations shown are those that reported in April 2009; the yellow cells had no reporting stations.
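For readers who want to experiment, here is a minimal Python sketch of that coverage fraction for the lat/lon scheme. It is not the code used for the plots in this post. It assumes each cell's weight is its area on the sphere (proportional to the difference of the sines of its bounding latitudes), and the function name `latlon_coverage` and the `(lat, lon)` input format are hypothetical choices made for the illustration.

```python
import math

def latlon_coverage(stations, cell_deg=5.0):
    """Area of occupied lat/lon cells as a fraction of the whole sphere.

    stations: iterable of (lat, lon) pairs in degrees for the stations
    that reported in one month (hypothetical input format).
    """
    n_lat = int(round(180 / cell_deg))
    n_lon = int(round(360 / cell_deg))

    # Collect the set of cells that contain at least one reporting station.
    occupied = set()
    for lat, lon in stations:
        i = min(int((lat + 90.0) // cell_deg), n_lat - 1)
        j = min(int(((lon + 180.0) % 360.0) // cell_deg), n_lon - 1)
        occupied.add((i, j))

    def band_area(i):
        # Relative area of one cell in latitude band i (assumed weight).
        south = math.radians(-90.0 + i * cell_deg)
        north = math.radians(-90.0 + (i + 1) * cell_deg)
        return math.sin(north) - math.sin(south)

    total = sum(band_area(i) * n_lon for i in range(n_lat))
    covered = sum(band_area(i) for i, _ in occupied)
    return covered / total
```

Calling this once per month with that month's reporting stations gives one point of the time series. Several stations in the same cell count only once, which is why the measure tracks coverage and not station numbers.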

So for the CRUTEM3 analysis I plotted the time sequence of the fractional area occupied by the blue cells:

The graph shows both the monthly pattern and a 12-month running average. There is a lot of seasonality in the readings.
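A 12-month running average of a monthly series can be sketched as below. This is an illustration only: whether the plotted average is trailing or centred is not stated here, so the trailing window is an assumption, and `running_mean` is a hypothetical helper name.

```python
def running_mean(values, window=12):
    """Trailing running mean of a monthly series.

    Returns a list the same length as `values`; entries are None until a
    full window of months is available (an assumed convention).
    """
    out = []
    total = 0.0
    for k, v in enumerate(values):
        total += v
        if k >= window:
            # Drop the month that has just left the window.
            total -= values[k - window]
        out.append(total / window if k >= window - 1 else None)
    return out
```

Because the window spans exactly one year, a regular annual cycle averages out, which is what lets the smoothed curve show the longer-term changes in coverage underneath the seasonality.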

Commenting first on the weighting methods, yes, as expected the equal area method, tagged as 0, gives better coverage. Of course, this figure has to be balanced against resolution - either method would give perfect cover with large enough cells. But in this case, both have the same 5x5° cells at the equator, which is small enough to expect that variations within the area would not be large.
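To illustrate why an equal area grid tends to report better coverage, here is a rough Python sketch. It is not the scheme from the previous post. It assumes one simple way to build such a grid: keep 5° latitude bands, and in each band use a number of longitude cells proportional to the cosine of the band's mid-latitude, so that cells near the poles are wider in longitude and keep roughly the same area as the 5x5° cells at the equator. The names `equal_area_cells` and `equal_area_coverage` are hypothetical.

```python
import math

def equal_area_cells(cell_deg=5.0):
    """List of (south, north, n_lon) for each latitude band (assumed layout)."""
    n_lat = int(round(180 / cell_deg))
    n_equator = int(round(360 / cell_deg))
    bands = []
    for i in range(n_lat):
        south = -90.0 + i * cell_deg
        north = south + cell_deg
        mid = math.radians((south + north) / 2.0)
        # Fewer, wider cells towards the poles; at least one per band.
        n_lon = max(1, round(n_equator * math.cos(mid)))
        bands.append((south, north, n_lon))
    return bands

def equal_area_coverage(stations, cell_deg=5.0):
    """Area of occupied cells as a fraction of the sphere, for (lat, lon) stations."""
    bands = equal_area_cells(cell_deg)
    occupied = set()
    for lat, lon in stations:
        i = min(int((lat + 90.0) // cell_deg), len(bands) - 1)
        n_lon = bands[i][2]
        width = 360.0 / n_lon
        j = min(int(((lon + 180.0) % 360.0) // width), n_lon - 1)
        occupied.add((i, j))

    def cell_area(i):
        south, north, n_lon = bands[i]
        band = math.sin(math.radians(north)) - math.sin(math.radians(south))
        return band / n_lon

    total = sum(cell_area(i) * bands[i][2] for i in range(len(bands)))
    covered = sum(cell_area(i) for i, _ in occupied)
    return covered / total
```

In a plain lat/lon grid a lone high-latitude station fills only a narrow sliver of area, and its empty neighbours count against coverage. Under this layout the same station accounts for a cell about as large as one at the equator, at the cost of coarser longitude resolution near the poles.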

On total coverage, the effects of the two world wars on SST measurement are very evident. Otherwise, while coverage might be said to have peaked about 1980, it has been fairly even since. The big post-1988 decline in station numbers is associated with a small reduction in cover. You can see that expanded in this graph:

Looking more into the seasonality, here is an expanded plot from a period when it was very pronounced. It peaks around Dec-Jan. That may be due to better Southern Ocean SST coverage - a very large area.

CRUTEM3-GHCNV3 comparison

There isn't much to say here about total cover - GHCN is very slightly greater. I'll show scheme 2 - the other is the same. In the legend, Giss means GHCNV3.

The actual pattern of empty cells may be of interest - here is a comparison of GHCN and CRUTEM3 for weighting style 0, for April 2009, as in the previous post. Remember both are using HADSST2, so don't expect any difference at sea:

GHCN Stations April 2009 | CRUTEM3 Stations April 2009


  1. Eli would suspect that the WWI and WWII sst measurements are still keeping company with the Ark of the Covenant in some government warehouse. There was a lot of sea traffic.

  2. Very useful post. I just encountered the station dropout argument for the first time, and made good use of it.

    Kevin C

  3. more data--
    7676 monthly stations from canada..
    won't change the answer, but my machine is now processing all the monthly data from Environment Canada. Turns out some researchers would like to have it. anyway


    WRT to SST in the SH:

    if you look at ICOADS data (raw) in the SH ( an animation) you can see that the coverage has seasonality in it. beats very nicely. I have that work around here somewhere. I think I showed it to peter webster.. so I must have it somewhere.. ( mac is pooping out )

  4. Pretty cool work Nick.

  5. Thanks, Jeff. R is fun to work with.

  6. Hi Nick,

    I had not been aware of the improved coverage. Thanks for the nicely documented presentation. As is often the case when data is presented clearly, you get ideas for follow-on questions. In my case, I am curious whether you are in a position to do statistics on your data sets to assess "box quality". For example, you could do exactly what you did before, but calculate coverage of boxes that have at least ten stations in them. Obviously, in those plots, the effect of the reduced number of stations will be somewhat more pronounced. My ten stations minimum is not based on any deep knowledge. The amount of variance between the stations in a typical block should govern.

  7. Will,
    Yes, that's fairly easy to do. I think 10 is a bit high; in fact if spread evenly the current average would probably be less than two. GHCN has spread them fairly evenly, but also thinly.

    I'll do a plot that colors the cells according to number.

    Thanks for the kind words.

  8. Will,
    I should follow up this comment by saying that it is really only meaningful over land. SST is already an aggregate of a number of readings (of different kinds) in each cell.