Saturday, January 23, 2010

GHCN Station selection.

WUWT has featured Marc Sheppard's American Thinker article, which gives a prominent place to E.M.Smith's claims about how GHCN has been selectively dropping cool stations to boost apparent global warming. I was going to blog extensively on this, but I see that Real Climate and, more completely, Zeke Hausfather have it well covered. However, maybe there's a bit to add to the WUWT discussion.

First, some elementary things which people like Smith and Sheppard really should check before going into this stuff. All climate trend work is done in terms of temperature anomalies. For each station a base average for each month is established over some base period (1961-1990 for GHCN). For each month, the anomaly is the difference between the average for the month and the base average. So the base average takes out the effect of location, and the anomaly just reports whether, for that station, the temp was higher or lower than "usual". This is vital in establishing a regional or global average - otherwise you really do have to worry about whether you have a fair distribution of hot and cold places. That is hard, and it's why you'll see few references to a global temp in deg C.
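As a minimal sketch of the anomaly calculation just described (this is not GHCN's own code; the function name and the record layout are assumptions made for illustration), one station's monthly anomalies could be computed like this:

```python
from collections import defaultdict

def monthly_anomalies(records, base_start=1961, base_end=1990):
    """Anomalies for ONE station.

    records: iterable of (year, month, mean_temp_c) tuples (hypothetical layout).
    Returns {(year, month): anomaly_c}, where each anomaly is the monthly mean
    minus that calendar month's average over the base period.
    """
    records = list(records)  # we need two passes over the data
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, temp in records:
        if base_start <= year <= base_end:
            sums[month] += temp
            counts[month] += 1
    # Base average per calendar month; months with no base data are skipped.
    base = {m: sums[m] / counts[m] for m in counts}
    return {(y, m): t - base[m] for y, m, t in records if m in base}

# Made-up January readings for a single station.
january = [(1961, 1, 10.0), (1990, 1, 12.0), (2009, 1, 12.5)]
anoms = monthly_anomalies(january)
print(anoms[(2009, 1)])  # 1.5: January 2009 was 1.5 C above the base average of 11.0
```

The station's absolute level (here about 11 C) drops out entirely; only the departure from its own "usual" survives.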

So selectively dropping cool stations would not, of itself, make the global anomaly bigger or smaller.
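A toy illustration of that point, using three made-up stations that all warm by the same 0.5 C (the station names and numbers are invented, and the uniform warming is an idealisation):

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical stations: (base-period mean in C, current reading in C).
stations = {
    "warm_lowland": (25.0, 25.5),
    "mid_plateau": (10.0, 10.5),
    "cool_mountain": (0.0, 0.5),
}

def summarize(names):
    """Return (mean absolute temp, mean anomaly) over the named stations."""
    absolute = mean(stations[n][1] for n in names)
    anomaly = mean(stations[n][1] - stations[n][0] for n in names)
    return absolute, anomaly

all_abs, all_anom = summarize(stations)
kept_abs, kept_anom = summarize(["warm_lowland", "mid_plateau"])  # cool station dropped

print(all_abs, kept_abs)    # the absolute average jumps when the cool station goes
print(all_anom, kept_anom)  # the anomaly average does not move
```

Dropping the mountain station raises the average temperature in deg C by several degrees, but the average anomaly stays at 0.5 in both cases. With real data the anomalies differ somewhat between stations, so the cancellation would not be exact.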

The other elementary step is to enquire why the stations were discontinued. The NOAA explains their station selection thinking here. Zeke expands in his post. The bottom line is that GHCN was a historic database. It could include many more stations because it had years to process the data. But once you start trying to maintain such a database in real time, it's much harder. Data doesn't come preprocessed and checked down the wire from much of the world. A lot comes in print, and has to be digitised. You need to rather carefully work out just how much you need. That's why NOAA did not try to keep up many of the stations that contributed to the historic network.

Anomalies and Zeke's analysis.

The basic study of the independence of anomalies from local topography etc was Hansen and Lebedeff (1987). It showed that over a period of time, anomalies correlated well over distances up to 1200 km and more, without taking account of long term average temp. That means that you can indeed select representative sites without trying to balance this factor.

I'm still not convinced by E.M.Smith's claim that discontinued stations are cooler. But suppose they are. Does it make a difference? Zeke says no. Here's his plot, showing that if you take stations currently reporting, and compare with stations that have a long record in the GHCN database but do not currently report, then there is no real warming or cooling effect.

And to show that the high latitudes and high altitudes still get coverage, here's his distribution of reporting sites:


In the course of the WUWT thread, I did check one Sheppard/Smith claim in detail. They scoffed at the representation of Bolivia, showing this map:

and saying:

“There’s a wonderful baseline for Bolivia — a very high mountainous country — right up until 1990 when the data ends. And if you look on the [GISS] November 2009 anomaly map, you’ll see a very red rosy hot Bolivia [boxed in blue]. But how do you get a hot Bolivia when you haven’t measured the temperature for 20 years?”

Well, we see a big red patch around Bolivia. It seems uniformly pretty warm, and if that applies to Andean topography generally, then suggesting that Bolivia was warmer than usual does not seem unreasonable. So I looked around for stations near Sucre, Bolivia, and found:
Anomaly...Dist...Alt....Station
1.63C.....346km..3410m..La Quiaca
2.66C.....595km...950m..Jujuy Aero

A good range of altitudes there. The Dist is distance from Sucre. Arica is by the sea (desert); others are inland. The anomalies show some variation, but they are all pretty warm, as the red patch suggests. No reason to expect Bolivia to be different.
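To make the "no reason to expect Bolivia to be different" argument concrete, here is a rough sketch of a distance-weighted estimate for Sucre. It uses only the two stations listed above, and the weighting (falling linearly from 1 at the target to 0 at 1200 km) is an assumption for this example, borrowed from the 1200 km correlation scale mentioned earlier. It is not presented as the exact GISS procedure.

```python
def distance_weighted_anomaly(neighbours, radius_km=1200.0):
    """Estimate an anomaly at a target point from nearby stations.

    neighbours: list of (anomaly_c, distance_km) pairs.
    Each station's weight falls linearly from 1 at 0 km to 0 at radius_km
    (an assumed taper for this sketch). Returns None if no station is in range.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for anomaly, dist in neighbours:
        w = max(0.0, 1.0 - dist / radius_km)
        weighted_sum += w * anomaly
        weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total

# (anomaly in C, distance from Sucre in km) for La Quiaca and Jujuy Aero.
listed = [(1.63, 346.0), (2.66, 595.0)]
estimate = distance_weighted_anomaly(listed)
print(round(estimate, 2))
```

With just these two stations the estimate comes out at about 2.06 C, between the two listed anomalies and weighted toward the nearer station.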


  1. First glad to see you are blogging and keep up the good work!

    If you check the data, you will find there is an effect from latitude. Stations at higher latitudes (e.g., closer to the North Pole) have larger temperature trends than those on the equator. Also, land has a greater trend than ocean.

    Shifting the distribution of stations and doing a straight sum without gridding it is guaranteed to have a measurable effect. If you grid it and properly interpolate for missing stations, shouldn't see much of a difference.

    I worry a bit about the treatment of extreme topography (mountains). I deal with that in my own work (we need to infer the temperature and wind profiles above mountain ranges, for example).

    I'm guessing in the end that the percent of the Earth's surface covered by extreme topography is very small, and the weighting from the total sum is correspondingly tiny. Hence the treatment for extreme topography could be way off, with the trend not being significantly affected. It's not much of a test in other words and until climate models get enough fidelity to be able to accurately model climate on that fine of a resolution, it hardly matters.

    Another thing to look at would be comparing satellite measurements to surface instrumentation reconstructions. For "problem areas" like Bolivia how do these compare?

    Finally, a suggestion: I'd show the temperature difference between the two curves as well as superimposing the two curves on the same graph. The residual plot is much better at pointing out differences between the two curves; superimposing them tends to blot that out.

  2. Thanks for your welcome to the world of blogging. I'm not aiming to be a new Lucia, but it is good to have somewhere to be a bit more expansive and put lengthier propositions that hopefully can be argued in more detail (with pictures).

    I agree about the extreme topography. One thing I noticed about the current GHCN set is that it tends to avoid mountain-top readings which some weather services (like our BoM) are fond of. As you say, such sites don't represent much, so while you might hope their anomalies track reasonably well, there's no point in risking it.

    I agree too about the drift in trend. It's rather a second order effect, though, locally. As far as I could tell from E.M.Smith's blog, any overall shift in distribution is fairly small. If you were calculating a global or zonal temp in C, that would be important, but for anomalies it should be minor.

    Ironically, shifting stations away from the poles should then have a cooling trend.

    I think satellites aren't much use for extreme topography. If you look at this RSS plot, it blanks out around Bolivia and Tibet. The resolution is a problem too.

    Thanks for the suggestion about the two curves. It's Zeke's plot though, and I don't have the data. I think the point of it is to show that there's no obvious systematic difference (especially in trend) between the "continued" and "discontinued" sets, at least for long records.

    The numbers of stations concerned vary over the years, with the discontinued fading to zero about 1995, so it probably shouldn't be over-analysed.

  3. Yep, the discontinuous line gets kinda wonky near the end, since the number of stations available (e.g. has data for 1995 but was "dropped" prior to my 2000 cutoff) decreases pretty dramatically. You can see that here:

    To do a proper spatial analysis, you could divide the world up into some fairly broad grid cells, assign each station to a grid cell based on its lat/lon, and make a graph comparing anomalies in all grid cells that include at least one discontinuous and continuous station for each year. Unfortunately doing it reasonably quickly is a tad beyond my programming ability.

  4. This is good work. I am going to download the GHCN data myself so I can scrutinize some of the claims being made against the surface records. The fact that the claimers seemed to ignore the GHCN documentation clearly mentioning why station dropout occurred doesn't speak well of the claims at all.

  5. You know what? E. M. Smith is questioning the interpolation process. There are a couple ways of checking that directly. One is to use climate model output, but some people don't like models so we'll leave that aside.

    The other possibility is to go back into the 1980s, interpolate the temperature anomaly at some point in Bolivia using the non-Bolivian stations listed above, and then compare it to the actual data from Bolivia.
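The grid-cell comparison suggested in comment 3 can be sketched fairly briefly. This is only an outline under stated assumptions: the tuple layout, the 10-degree cell size, and the sample coordinates are all invented for illustration, and cells are not area-weighted.

```python
import math
from collections import defaultdict

def grid_cell(lat, lon, size_deg=10.0):
    """Index of the lat/lon box that contains the station."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def gridded_difference(stations, size_deg=10.0):
    """Compare continuing and discontinued stations for ONE year.

    stations: iterable of (lat, lon, is_continuing, anomaly_c) tuples.
    Only cells holding at least one station of each kind are used.
    Returns the mean over those cells of
    (continuing mean anomaly - discontinued mean anomaly), or None.
    """
    cells = defaultdict(lambda: {True: [], False: []})
    for lat, lon, continuing, anomaly in stations:
        cells[grid_cell(lat, lon, size_deg)][bool(continuing)].append(anomaly)
    diffs = []
    for groups in cells.values():
        if groups[True] and groups[False]:
            cont = sum(groups[True]) / len(groups[True])
            disc = sum(groups[False]) / len(groups[False])
            diffs.append(cont - disc)
    return sum(diffs) / len(diffs) if diffs else None

# Made-up stations: two cells contain both kinds, one has a continuing station only.
sample = [
    (-19.0, -65.3, False, 0.4),
    (-16.5, -68.2, True, 0.6),
    (51.5, -0.1, True, 1.0),
    (52.0, -4.0, False, 0.8),
    (10.0, 10.0, True, 2.0),   # no discontinued partner, so this cell is ignored
]
result = gridded_difference(sample)
print(result)
```

Running this once per year and plotting the result against time would give the residual curve asked for in comment 1. A fuller version would weight each cell by its area (roughly the cosine of its latitude).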