Thursday, January 2, 2014

Just 60 global stations - area weighting shown with GL graphics.

Twice before I've written about using just 60 met stations to estimate the Earth's global temperature anomaly. The first time I simply chose 61 stations according to "quality" criteria (rural, 90-year records). The resulting average was quite close to HadCRUT land only and the GISS met stations index. This was somewhat surprising, because the stations were in effect not area-weighted, but sin(latitude) weighted.

Then I wrote about trying to get proper area weighting using triangular meshing. In doing that, I used sequential meshes to try to get a more even distribution. That post came just before my recent interest in better graphics, and I could not then present the mesh results properly. I can now. The temperature analysis there still stands. It is not developed further in this post, but I hope to do that soon.

It is worth reviewing the motivation. The first post was a response to a challenge by Eric Santer, who thought that 60 stations should be enough for a good global average. I thought so too, but there is more to it. I'm in fact doing a version of the GISS Met stations only index, which is different from land-only in that the stations are weighted by global surface area, so ocean areas are represented by land stations. Compared with land/ocean, which uses SST, this has the plus that it measures the same thing everywhere (air temp 1.5 m above surface), and the met stations have a rich and explicit history. The minus is that the data for ocean areas is far sparser than SST.

When a large area (oceans) is sparsely covered, there is little benefit to the average in having part of the land area densely covered. It makes sense to select a subset of stations, trading density for quality. If the subset can be reduced to 60, say, then the individual stations can be more closely scrutinised. "Quality" here is a combination of long records and apparent freedom from biases like UHI (rural stations).

We want stations of quality that are also fairly evenly distributed - i.e. have approximately equal area weighting. This post does that by compromising initially on quality to get a larger subset, and then remeshing in several steps, each time eliminating the stations of least weight. This is shown by a display of the area attributed to each station at each stage of reduction. The display is in WebGL, which means you can rotate the Earth track-ball style. It also shows a histogram of weights, and lets you click to see the station names.

The plot

Initially, 622 stations are shown. These are GHCN stations classified rural, with at least 50 years of record continuing to at least 2010. Then at each of six stages, the third of the stations with the least area weight is removed, and the remainder remeshed. The radio buttons show the result at each stage.

The area assigned to each station is formed by connecting the circumcentres of each of the mesh triangles. Each area is the set of points where that station is the nearest node. See below for some details here. The plot shows the area and the station with a purple triangle. You can click near this to show the station name top left. Bottom right is a histogram of the area weights.
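The idea of "area nearest to each station", and one reduction stage, can be sketched without any meshing. This is not the mesh calculation used for the plots: it is a Monte Carlo stand-in that scatters random points over the sphere and counts how many fall nearest to each station. The function names, the sample count and the rounding of "a third" are all assumptions made for illustration.

```python
import math
import random


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def random_unit_vector(rng):
    """Draw a point uniformly on the sphere (uniform z, uniform longitude)."""
    z = rng.uniform(-1.0, 1.0)
    lon = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(lon), r * math.sin(lon), z)


def nearest_station_weights(stations, n_samples=20000, seed=0):
    """Estimate the fraction of the sphere that is nearest to each station.

    stations: dict mapping name -> (lat, lon) in degrees (hypothetical input).
    Returns a dict mapping name -> estimated area fraction (sums to 1).
    """
    rng = random.Random(seed)
    names = list(stations)
    vecs = [to_xyz(*stations[name]) for name in names]
    counts = [0] * len(names)
    for _ in range(n_samples):
        p = random_unit_vector(rng)
        # The largest dot product marks the smallest great-circle distance.
        best = max(range(len(vecs)),
                   key=lambda i: sum(a * b for a, b in zip(p, vecs[i])))
        counts[best] += 1
    return {name: c / n_samples for name, c in zip(names, counts)}


def drop_lowest_third(stations, **kwargs):
    """One reduction stage: reweight, then drop the third with least area.

    Dropping len // 3 stations is an assumed rounding rule.
    """
    weights = nearest_station_weights(stations, **kwargs)
    ranked = sorted(stations, key=lambda name: weights[name], reverse=True)
    keep = ranked[: len(ranked) - len(ranked) // 3]
    return {name: stations[name] for name in keep}
```

Calling drop_lowest_third repeatedly, each time on the result of the previous call, mimics the staged reduction, since the weights are recomputed from the surviving stations at every stage. In pure Python this is slow for several hundred stations, so it is only meant to show the logic.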

Update - there is a glitch. There are two plots below, and only one can be active (for trackball etc) at a time. That is determined by which was last selected (by radio button).

[Interactive WebGL plot. Quality specs: at least 50 years of record, some data since 2009. Radio buttons select the number of stations.]


The circumcentres are not exactly the plane centres, because they are projected onto the sphere. The lines are made of two sections, with the midpoint also projected onto the sphere. This makes them a little more like great circles.
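A minimal sketch of the two projections, assuming the stations are stored as unit vectors. When all three vertices lie on the unit sphere they are equidistant from the Earth's centre, so the plane circumcentre lies on the normal to the triangle through the centre, and its projection onto the sphere is just the unit normal. The function names are mine, and this is not necessarily how the plotting code does it.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def normalize(u):
    """Scale a vector to unit length, i.e. project it onto the unit sphere."""
    length = math.sqrt(dot(u, u))
    return tuple(x / length for x in u)


def projected_circumcentre(a, b, c):
    """Circumcentre of triangle abc, projected onto the unit sphere.

    Assumes a, b, c are unit vectors and that the triangle's plane
    does not pass through the centre of the sphere.
    """
    edge1 = tuple(y - x for x, y in zip(a, b))
    edge2 = tuple(y - x for x, y in zip(a, c))
    n = normalize(cross(edge1, edge2))
    # Pick the normal that points to the triangle's side of the sphere.
    if dot(n, a) < 0:
        n = tuple(-x for x in n)
    return n


def projected_midpoint(p, q):
    """Midpoint of the chord pq, projected onto the unit sphere."""
    return normalize(tuple((x + y) / 2 for x, y in zip(p, q)))
```

Drawing each boundary line as two straight sections, from p to projected_midpoint(p, q) and on to q, keeps the drawn line closer to the great-circle arc than a single chord would.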

In fact the weight is not quite area. It is, for each triangle, the volume of the tetrahedron formed by the triangle and the earth centre. This is one third of the triangle area times the height, and the height is very close to the earth radius, so the weight is very nearly proportional to area.
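The relation between the tetrahedron volume and the triangle area can be checked numerically. This is a sketch on a unit sphere (earth radius taken as 1), with the volume computed from the scalar triple product. The function names are assumptions, and how triangle weights are shared among stations is not covered here.

```python
import math


def sub(u, v):
    """Difference of two 3-vectors."""
    return tuple(x - y for x, y in zip(u, v))


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def tetra_volume(a, b, c):
    """Volume of the tetrahedron with vertices a, b, c and the origin."""
    return abs(dot(a, cross(b, c))) / 6.0


def triangle_area(a, b, c):
    """Area of the plane triangle abc."""
    n = cross(sub(b, a), sub(c, a))
    return 0.5 * math.sqrt(dot(n, n))


def plane_height(a, b, c):
    """Distance from the origin to the plane through a, b, c."""
    n = cross(sub(b, a), sub(c, a))
    return abs(dot(n, a)) / math.sqrt(dot(n, n))
```

For any triangle, tetra_volume(a, b, c) equals triangle_area(a, b, c) * plane_height(a, b, c) / 3. For a small triangle with vertices on the unit sphere the height is close to 1, so the volume is close to one third of the area, and the constant factor drops out when the weights are normalised.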

You might expect the areas under the histograms to be much the same, while they seem to increase with subdivision. The reason is that the y-axis varies, but I wanted to keep the same plot size and the same x-axis.

Clicking to show station names only works for the plot where you last clicked a radio button.


A merit of this approach is that a small change can generate a substantially different mesh. So one can test how much the average depends on the specific mesh. That's the next project.

In the plot below, I specify a higher quality spec (70 years of data). That means fewer stages of subdivision, so the penalty is less even weighting. The initial disparity is such that the subdivision only affects part of the Earth, as you can see.

[Interactive WebGL plot. Quality specs: at least 70 years of record, some data since 2009. Radio buttons select the number of stations.]


  1. Oale commented on an earlier post, saying among other helpful things that the last plot had been covering the comments section. Thanks, fixed.

  2. Moving the discussion here: the Berkeley Earth results for the two Hawaii stations I talked about in the last post
    would still indicate that the Lihue station has a better early record than the Hilo (General Lyman Airport) station. There have possibly been more inaccurate thermometer readings, given the larger number of rejected observations, but that number is still quite small, and the station location hasn't moved much in the early record.

  3. Too bad the Lihue station location has the wrong coordinates, if we are to believe that Bing Maps and Berkeley Earth are accurate. So the Hawaii station to include would likely be Hilo; I guess in the tropics the heating from houses doesn't have much effect.

    1. It's surprising that the criteria as stated do not include a Hawaii station. At this stage I've mainly been concerned with merging quality and spatial criteria, without thinking too much about what the quality criteria should be.

      Probably the requirement of data after 2009 is too severe, though gently relaxing that would not add Hawaii stations. I think the rural designation has to be maintained.

      You've made a valuable point though. Partly the idea of reducing to a small number of stations is that they can then be carefully scrutinised for quality. But as you say, there are also critical stations that have been omitted, but would give better spatial coverage if included. It is not clear how to look at these systematically.

    2. "But as you say, there are also critical stations that have been omitted, but would give better spatial coverage if included. It is not clear how to look at these systematically."

      Calculate for each station a nearest_station value. Then it should be possible to include those that are very isolated. You can then vary this by including the xx most isolated stations to see if the concern actually has an effect.

    3. Kevin,
      The present system gives the area of the surrounding region, which is a measure of isolation. The problem really is that the selection is refined on quality first, and then spatially. It would be better to do both simultaneously, being more tolerant of quality for isolated places.