Wednesday, April 5, 2017

Global 60 Stations and coverage uncertainty

In the early days of this blog, I took up a challenge of the time and calculated a global average temperature using just 60 land stations. The stations could be selected for long records, rural location, etc. It has been a post that people frequently come back to. I am a little embarrassed now, because I used the plain grid version of the TempLS of the day, so it really didn't do area weighting properly at all. Still, it gave a pretty good result.

As technology and TempLS advanced, I next tried using a triangular mesh with proper Voronoi cells (I wouldn't bother now). I couldn't display it very well, but the results were arguably better.

Then, about 3 years ago, I was finally able to display the results with WebGL. That was mainly a graphic post. Now I'd like to show some more WebGL graphics, but I think the more interesting part may be tracking the coverage uncertainty, which of course grows as stations are removed. I have described here and earlier some ways of estimating coverage uncertainty, different from the usual ways involving reanalysis. This is another way, which I think is quite informative.

I start with a standard meshed result for a particular month (Jan 2014), which had 4758 nodes, about half of them SST. I get the area weights as used in TempLS mesh, which assigns weight to each node according to the area of the triangles it is part of. Then I start culling, removing the lowest weights first. My culling aims to remove 10% of nodes with each step, getting down to 60 nodes after about 40 steps. But I introduce a random element: I set a weight cut at about the lowest 12.5%, and then select 4/5 of those at random for removal (4/5 of 12.5% is 10%). After culling, I re-mesh, so the weights of many nodes change. The rather small randomness in node selection has a big effect on randomising the mesh process.
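A minimal sketch of one weighting-and-culling step follows. It assumes nodes are unit vectors and the mesh is given as index triples (for points on a sphere, the convex hull gives such a triangulation). Splitting each spherical triangle's area equally in thirds among its vertices is an assumption about how the weights are shared, and the function names are hypothetical, not TempLS code.

```python
import math
import random

def spherical_triangle_area(a, b, c):
    """Area of the spherical triangle with unit-vector vertices a, b, c,
    via the Van Oosterom-Strackee solid-angle formula:
    tan(E/2) = |a.(b x c)| / (1 + a.b + b.c + c.a)."""
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    triple = abs(dot(a, cross(b, c)))
    denom = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2.0 * math.atan2(triple, denom)

def node_weights(points, triangles):
    """Assumed weighting: each node gets one third of the area
    of every triangle it belongs to."""
    weights = [0.0] * len(points)
    for i, j, k in triangles:
        area = spherical_triangle_area(points[i], points[j], points[k])
        for n in (i, j, k):
            weights[n] += area / 3.0
    return weights

def cull_step(weights, rng, cut_frac=0.125, remove_frac=0.8):
    """Take the lowest-weight ~12.5% of nodes as candidates, remove a random
    4/5 of them (about 10% of all nodes), and return surviving indices."""
    n = len(weights)
    m = max(1, round(cut_frac * n))
    candidates = sorted(range(n), key=lambda idx: weights[idx])[:m]
    removed = set(rng.sample(candidates, max(1, round(remove_frac * m))))
    return [idx for idx in range(n) if idx not in removed]
```

As a check, the octahedron's six vertices and eight faces give each node a weight of 2π/3, summing to the sphere's 4π. After each `cull_step` the survivors would be re-meshed before weighting again.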

And so I proceed, calculating the new average temperature at each step from the existing anomalies. I don't re-fit the temperatures; this is just an integration of an existing field. I do this 100 times, so I can get an idea of the variability of the average as culling proceeds.
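The repeat-and-integrate loop might be sketched like this. `weigh` and `cull` are hypothetical callables standing in for re-meshing (returning node weights) and a single culling step; the anomalies are carried along unchanged and only re-integrated at each step.

```python
import random

def area_weighted_mean(anomalies, weights):
    """Integrate a fixed anomaly field with node weights: sum(w*T) / sum(w)."""
    return sum(w * t for w, t in zip(weights, anomalies)) / sum(weights)

def culling_sequence(points, anomalies, weigh, cull, n_final, rng):
    """Global mean after each re-mesh as nodes are culled down to n_final.

    weigh(points) -> node weights (re-mesh); cull(weights, rng) -> indices
    to keep. Anomalies are never re-fitted, only re-integrated.
    cull must remove at least one node per call, or this loops forever.
    """
    means = []
    while True:
        weights = weigh(points)
        means.append(area_weighted_mean(anomalies, weights))
        if len(points) <= n_final:
            return means
        keep = cull(weights, rng)
        points = [points[i] for i in keep]
        anomalies = [anomalies[i] for i in keep]

def ensemble(points, anomalies, weigh, cull, n_final, runs=100, seed=0):
    """Repeat the randomised culling `runs` times with one seeded generator."""
    rng = random.Random(seed)
    return [culling_sequence(points, anomalies, weigh, cull, n_final, rng)
            for _ in range(runs)]
```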

Then, as a variant, I select for culling with a combination of area and a penalty for SST. The idea is to gradually remove all ocean values, and end up with just 60 land stations to represent the Earth.
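The exact combination of area and SST penalty isn't given here. One simple form, assumed purely for illustration, ranks ocean nodes as if their weight were multiplied by a factor below 1, so they fall into the cull band first. The `penalty` value and the function name are hypothetical.

```python
import random

def cull_with_sst_penalty(weights, is_ocean, rng, penalty=0.25,
                          cut_frac=0.125, remove_frac=0.8):
    """Area culling with ocean nodes ranked at penalty * weight (penalty < 1),
    so SST nodes are removed preferentially. Returns surviving indices."""
    score = [w * (penalty if ocean else 1.0) for w, ocean in zip(weights, is_ocean)]
    n = len(score)
    m = max(1, round(cut_frac * n))
    candidates = sorted(range(n), key=lambda i: score[i])[:m]
    removed = set(rng.sample(candidates, max(1, round(remove_frac * m))))
    return [i for i in range(n) if i not in removed]
```

With `penalty=1.0` this reduces to plain area culling; a smaller value clears ocean sooner.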

The objective here is just to see how coverage uncertainty affects one integration. That uncertainty is the uncertainty of what would happen if you had chosen stations in different places. With the original set, we can't test this directly, but the random culling has that effect.

In future, I expect to make a much more general criterion, which would include length of record, rural status, etc. That would generate a global time series to compare. But as I said, for now I just want to see the effect on one integration.

So here is a plot of 100 sequences of the global average as I cull to 60 stations in 40 steps of constant ratio.
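Removing a constant fraction per step makes the node counts a geometric sequence. This sketch (a hypothetical helper) computes the ratio needed to go from 4758 to 60 nodes in exactly 40 steps; it comes out at about 0.896, i.e. roughly 10% removed per step. The runs are randomised, so actual counts at a given step will differ somewhat from these targets.

```python
def geometric_schedule(n_start, n_end, steps):
    """Target node counts when every step keeps the same fraction r of nodes."""
    r = (n_end / n_start) ** (1.0 / steps)
    counts = [round(n_start * r ** k) for k in range(steps + 1)]
    return r, counts

r, counts = geometric_schedule(4758, 60, 40)
print(round(r, 3), counts[:5], counts[-3:])
```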

An important thing is that the mean anomaly seems to remain fairly level at about 0.9°C. Culling obviously increases spread, but doesn't seem to bias the result. And it doesn't increase spread that much. It's obviously large at 60 stations, but at about 535 stations it is not so large. It is of course very small near 4000, but that isn't meaningful, because at that stage the randomness has limited effect. Anyway, next I quantify the mean and spread:

And again the mean is steady, even down to 60 stations. There are no fancy statistics here; sigma is just the standard deviation of the 100 runs at each step. It is less than 0.1°C after 30 steps, which is about 139 stations, and even the 2-sigma range is less than 0.1°C at 535 stations.
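The per-step statistics are simple enough to sketch directly. Here `runs` is assumed to be a list of equal-length sequences (one global mean per culling step), such as 100 culling runs; `statistics.stdev` gives the sample (n-1) standard deviation, and `statistics.pstdev` would give the population version.

```python
import statistics

def step_stats(runs):
    """Mean and sample standard deviation across runs at each culling step."""
    return [(statistics.mean(v), statistics.stdev(v)) for v in zip(*runs)]

# Hypothetical example: three runs, two steps each.
runs = [[0.90, 0.85], [0.91, 0.97], [0.89, 0.88]]
for step, (mu, sigma) in enumerate(step_stats(runs)):
    print(step, round(mu, 3), round(sigma, 3), "2-sigma:", round(2 * sigma, 3))
```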

So how bad is that? The thing is, individual months are noisy anyway. With a bit of smoothing, this uncertainty will have quite a small effect on the error of a time series of, say, annual averages. We'll see when I do that.

Now for the results of discriminating against ocean locations, which actually eliminates most of them by about halfway, and all of them well before the 60 station limit. Now of course the mean does change with the culling:

One might expect it to approach an asymptote. I think what actually happens is that SST fades fast, and in the late stages island stations get higher weighting, which may explain the late decline. The spread is now generally greater, because the coverage for the same number of stations is a little worse, and possibly because land stations are more variable anyway. And here is the graph of mean and quantiles:
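Quantile bands across runs can be computed per step in the same way. Which quantiles the graph uses isn't stated; the 5th and 95th percentiles below are an assumption, and `step_quantiles` is a hypothetical helper built on the standard library's `statistics.quantiles` (default "exclusive" method).

```python
import statistics

def step_quantiles(runs, lo_pct=5, hi_pct=95):
    """Lower and upper percentiles across runs at each culling step."""
    out = []
    for values in zip(*runs):
        cuts = statistics.quantiles(values, n=100)  # 99 percentile cut points
        out.append((cuts[lo_pct - 1], cuts[hi_pct - 1]))
    return out
```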

Finally, here is the WebGL plot of the progress of a single culling run of each kind. The radio buttons first give a range of snapshots of uniform culling, then of culling favoring land. The details of this app are now on the Moyhu page here. Remember, you can click on any of the plots to show the name and anomaly at the nearest node (NA means ocean). You can also use the checkboxes to toggle any of the objects on or off.

Appendix - some sphere fitting technicalities

I'll describe an interesting technical issue in producing these plots. When you get down to 60 nodes, the triangles that connect them are not small relative to the radius, and the result looks non-spherical around the edges. I sought to improve this by sub-meshing the triangles (and edge lines), interpolating onto the new nodes, and projecting them onto the sphere surface.
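A sketch of the sub-meshing step for one triangle, assuming a uniform barycentric split into n×n sub-triangles with linear interpolation of node values, followed by radial projection onto the unit sphere. The function names are hypothetical, and the WebGL app may subdivide differently.

```python
import math

def normalize(p):
    """Project a point radially onto the unit sphere."""
    r = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    return (p[0] / r, p[1] / r, p[2] / r)

def submesh_triangle(a, b, c, va, vb, vc, n):
    """Split triangle abc into n*n sub-triangles, interpolate the vertex
    values linearly, and push the new nodes out onto the unit sphere."""
    index = {}
    points, values = [], []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            # barycentric weights (i, j, k) / n for vertices (a, b, c)
            p = tuple((i * a[d] + j * b[d] + k * c[d]) / n for d in range(3))
            index[(i, j)] = len(points)
            points.append(normalize(p))
            values.append((i * va + j * vb + k * vc) / n)
    triangles = []
    for i in range(n):
        for j in range(n - i):
            # "upward" sub-triangle
            triangles.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            if i + j < n - 1:
                # "downward" sub-triangle filling the gap beside it
                triangles.append((index[(i + 1, j)], index[(i + 1, j + 1)],
                                  index[(i, j + 1)]))
    return points, values, triangles
```

This yields (n+1)(n+2)/2 nodes and n² triangles. If two neighbours use different n, their projected nodes along the shared edge won't coincide.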

But there was a surprise. It sort of worked, but there were gaps; it looked as if the surface had been carved up, mainly toward the periphery. The reason is that adjacent triangles are sub-meshed in different ways, so the new nodes along shared edges often don't match up. The deviations aren't large, but they look odd if you can see through.

The solution was not to join the triangles to each other, but to their common join, which is now well below the spherical surface. WebGL will happily shade all that in, and the result is fairly continuous, so there are no more gaps. But it does involve adding an extra fringe of triangles to each subdivision.
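One reading of the fringe, assumed here for illustration: each sub-meshed triangle drops a strip of triangles from its projected (curved) edge down to the straight chord it shares with its neighbour. Both neighbours then end on the same chord, however each subdivides its edge, so no gap shows. The function name and vertex layout are hypothetical.

```python
import math

def normalize(p):
    """Project a point radially onto the unit sphere."""
    r = math.sqrt(sum(x * x for x in p))
    return tuple(x / r for x in p)

def edge_fringe(a, b, n):
    """Fringe joining the projected edge of one sub-meshed triangle down to
    the straight chord a-b shared with its neighbour.

    Returns (vertices, triangles): vertices 0..n lie on the sphere,
    vertices n+1..2n+1 lie on the chord, below the surface.
    """
    chord = [tuple(a[d] + (b[d] - a[d]) * t / n for d in range(3))
             for t in range(n + 1)]
    surface = [normalize(p) for p in chord]
    verts = surface + chord
    tris = []
    for t in range(n):
        s0, s1 = t, t + 1                  # consecutive surface nodes
        c0, c1 = n + 1 + t, n + 2 + t      # matching chord nodes
        # quad (s0, s1, c1, c0) split into two triangles; at the two ends a
        # surface node coincides with its chord node, giving a zero-area sliver
        tris.append((s0, s1, c0))
        tris.append((s1, c1, c0))
    return verts, tris
```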

All this is done with the setting U.sphere=2. If you want to download the user file for this app, it is here.


  1. Could you create your own "Ts" series?

    1. cce,
      Yes. The last few in the penalise SST series are effectively Ts. I would just need to penalise more heavily.

      The next in this series will be one of selecting a station set for the long term, based not just on weight but on record length, with a preference for rural stations. With Ts, one might as well cull a lot of land stations, since ocean coverage is so stretched. Hopefully some time this month.

    2. Thanks. I realize that this isn't the point of your post, but is there a technical reason you can't/shouldn't do a run with land at full force and no SST?

    3. cce,
      It sort of is the point of the post, and as I say, the final 60-station set represents the limiting implementation. It's all land. More land stations could be included, but it wouldn't do that much to help coverage error.

      The technical problem is that vast areas of sea have to be represented by land stations, and the question is, which stations? It can make a big difference, because a few stations get very large weights. The purpose of the gradual culling is to spread that weighting as evenly as possible.

    4. Nick, there is much less noise in maritime met stations, compared to continental stations, probably due to the dampening effect of oceans.
      If one could do a perfect ocean SAT-index, based on coastal and island met stations only, models predict that ocean SAT would be very similar to global blended SST/SAT temperatures.
      Compare dotted yellow and blue graphs in the following chart:

      Note that the blended GISTEMP here is not LOTI, but a version with SST temps (-1.8°C) at sea ice. LOTI would be a bit higher, approaching 1.00°C at the latest data point.

    5. Hi Olof, I agree that oceans probably have a dampening effect. I suspect you didn't mean humidity.