Monday, April 8, 2019


New methods of integration in TempLS V4 for global temperature.


Background

TempLS is a program that takes the extensive data of surface temperature measurements and derives a global average of temperature anomaly for each month over time. It also produces maps of temperature anomaly distribution. The basic operation that enables this is spatial integration. As with so much in science and life, for Earth's temperature we rely on samples - it's all we have. To get the whole Earth picture, it is necessary to interpolate between samples. Integration is the process for doing that, and then adding up all the results. The average is the integral divided by the area.

The worst way of getting an average is just to add all the station results and divide by the number of stations. It's bad because the stations are unevenly distributed, so the result mostly reflects the regions where stations are dense. This generally means the USA. Some kind of area weighting is needed so that large areas with sparse readings are properly represented. Early versions of TempLS used the common method of gridding based on latitude/longitude. The general approach to spatial integration is to form a function which can be integrated, and which conforms in some way to the data. In gridding, that function is constant within each cell, and equal to the average of the cell data. But there is the problem of cells with no data...
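To make the contrast concrete, here is a minimal Python sketch of the two averages. It is an illustration only, not TempLS code: the function names, the 5 degree cell size and the (lat, lon, anomaly) tuple layout are all assumptions of mine. Each occupied cell is weighted by its area, which on a sphere is proportional to the difference of the sines of its bounding latitudes.

```python
import math
from collections import defaultdict

def simple_mean(stations):
    """Unweighted mean of station anomalies; dense regions dominate."""
    return sum(t for _, _, t in stations) / len(stations)

def grid_mean(stations, cell_deg=5.0):
    """Area-weighted mean over the lat/lon cells that contain data.

    stations: list of (lat_deg, lon_deg, anomaly) tuples (assumed layout).
    Cells with no data are simply left out of the sum.
    """
    n_lat = int(180.0 / cell_deg)
    n_lon = int(360.0 / cell_deg)
    cells = defaultdict(list)
    for lat, lon, t in stations:
        i = min(int((lat + 90.0) // cell_deg), n_lat - 1)
        j = min(int((lon % 360.0) // cell_deg), n_lon - 1)
        cells[(i, j)].append(t)
    num = den = 0.0
    for (i, _), vals in cells.items():
        lat_s = math.radians(-90.0 + i * cell_deg)
        lat_n = math.radians(-90.0 + (i + 1) * cell_deg)
        area = math.sin(lat_n) - math.sin(lat_s)  # proportional to cell area
        num += area * sum(vals) / len(vals)       # cell value = mean of its data
        den += area
    return num / den
```

With nine stations crowded into one cell and a single station in another cell at the same latitude, simple_mean is dominated by the crowded cell, while grid_mean gives the two cells equal say. Leaving out empty cells is exactly the weakness noted above.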

Since V2, 2011 at least, TempLS has used an unstructured mesh as its favored procedure. It is basically finite element integration. The mesh is the convex hull of the measurement points in space, and the area weight of each node is just the area of the triangles contacting it. For over seven years now I have reported average temperature based on the mesh method (preferred) and on the grid method, the latter for compatibility with HadCRUT and NOAA.
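As a rough illustration of the area weighting, the Python sketch below assumes the triangulation has already been found (by whatever convex hull routine you prefer) and is supplied as index triples into a list of 3D points. The names are hypothetical. Giving each node one third of each adjoining triangle is my choice of normalisation; it scales every weight equally, so it does not change the average.

```python
import math

def triangle_area(a, b, c):
    """Area of the flat triangle with 3D vertices a, b, c."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def node_weights(points, triangles):
    """Weight of each node: one third of the area of every triangle touching it."""
    weights = [0.0] * len(points)
    for i, j, k in triangles:
        share = triangle_area(points[i], points[j], points[k]) / 3.0
        for n in (i, j, k):
            weights[n] += share
    return weights

def mesh_average(values, weights):
    """Area-weighted average of the node values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy mesh: an octahedron inscribed in the unit sphere (6 nodes, 8 triangles).
OCTA_POINTS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
OCTA_TRIANGLES = [(x, y, z) for x in (0, 1) for y in (2, 3) for z in (4, 5)]
```

On the octahedron every node touches four identical triangles, so all six weights come out equal. With real stations the triangles vary greatly in size, and isolated stations get correspondingly large weights.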

Early in the life of V3, some new methods were added, discussed here. The problem of cells with missing data can be solved in various ways - I used a somewhat ad-hoc but effective method I called diffusion. It works best with grids that are better than lat/lon. I also used a method based on spherical harmonics, with least squares fitting. As described here, I now think this should be seen as an enhancement which can be applied to any method. It is spectacularly effective with the otherwise poor method of simple averaging; with better methods like mesh or diffusion, there is much less room to improve.

So why look for new methods?

We don't have a quantitative test for how good a method is when applied to a temperature field. The best confirmation is that the methods are relatively stable as parameters (eg grid size) are varied, and that they agree with each other. We have two fairly independent methods, or three if you count SH enhancement. They do agree well, but it would be good to have another as well.

V4 changes.

V4 does introduce a new method, which I will describe. But first some more mundane changes:
  • Grid - V4 no longer uses lat/lon grids, but rather grids based on platonic solids. Currently most used is the cubed sphere, with ambitions to use hexagons. All these grids work very well.
  • Spherical Harmonics - is no longer a separate method, but an enhancement available for any method. It's good to have, but adds computer time, and since it doesn't much enhance the better methods, it can be better to use them directly.
  • I have upgraded the diffusion method so that it now solves a diffusion equation (partial differential) for the regions without cell data. The process is very simple - Southwell relaxation, from the pen and paper era, when computer was a job title. You iterate replacing unknown values by an average of neighbors.
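The relaxation idea in the last item can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the TempLS code: it works on a flat rectangular grid rather than a cubed sphere, marks unknown cells with None, and sweeps over all unknowns in a fixed order instead of picking the largest residual first.

```python
def relax_fill(grid, sweeps=500, tol=1e-10):
    """Fill None cells by repeatedly replacing them with the mean of their neighbours.

    grid: list of rows; known cells hold numbers, unknown cells hold None.
    Known cells are never changed. Returns a new, fully populated grid.
    """
    rows, cols = len(grid), len(grid[0])
    unknown = [(r, c) for r in range(rows) for c in range(cols)
               if grid[r][c] is None]
    known = [v for row in grid for v in row if v is not None]
    start = sum(known) / len(known)  # initial guess for every unknown cell
    field = [[start if v is None else v for v in row] for row in grid]
    for _ in range(sweeps):
        change = 0.0
        for r, c in unknown:
            nbrs = [field[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            new = sum(nbrs) / len(nbrs)
            change = max(change, abs(new - field[r][c]))
            field[r][c] = new  # updated in place, so later cells see it
        if change < tol:       # stop once a full sweep changes nothing much
            break
    return field
```

For a single row with known end values, such as [[0.0, None, None, None, 4.0]], the unknowns settle onto the straight line between the ends, which is the steady solution of the diffusion equation in one dimension.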

The LOESS method

The new method uses local regression - the basis of LOESS smoothing. Other descriptive words might be meshless methods and radial basis functions. The idea is that instead of integrating the irregular pattern of stations, you find a set of regularly spaced points that can be integrated. In fact, using an icosahedron, you can find points so evenly spaced that the integral is just a simple average. To estimate the temperatures at these points, weighted regression is applied to a set of nearby measurements. The regression is weighted by closeness; I use an exponential decay based on Hansen's 1200 km for loss of correlation. But I also restrict to the 20 closest points, usually well within that limit.
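Here is a minimal Python sketch of the constant (weighted mean) form of that estimate. Several details are assumptions for illustration only: the weight is taken as exp(-d / 1200 km) with d the great-circle distance, the stations are (lat, lon, anomaly) tuples, and the evenly spaced nodes are supplied by the caller rather than built from an icosahedron.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def local_weighted_mean(node, stations, scale_km=1200.0, k=20):
    """Estimate the anomaly at node = (lat, lon) from the k nearest stations.

    Weights decay as exp(-distance / scale_km); this decay form is an assumption.
    """
    by_distance = sorted(
        (great_circle_km(node[0], node[1], lat, lon), t)
        for lat, lon, t in stations)
    nearest = by_distance[:k]
    weights = [math.exp(-d / scale_km) for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)

def global_average(nodes, stations, scale_km=1200.0, k=20):
    """Plain mean of the node estimates, valid when the nodes are evenly spaced."""
    estimates = [local_weighted_mean(n, stations, scale_km, k) for n in nodes]
    return sum(estimates) / len(estimates)
```

Where stations are dense, the 20 nearest all lie close by and the estimate is very local. Where they are sparse, the same call simply reaches further out, with the distant stations carrying little weight.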

The regression can be relative to a constant (weighted mean) or linear. The downside of constant is that there may be a trend, and the sample points might cluster on one side of the trend, giving a biased result. Linear fitting counters that bias.
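The bias is easy to see in one dimension. The Python sketch below is a toy illustration with made-up numbers, not temperature data: the target point sits at x = 0, the true field is the pure trend y = 2x, and all three samples lie on the positive side. The function names are hypothetical.

```python
import math

def weighted_constant_fit(ys, ws):
    """Weighted mean of the samples: the constant fit."""
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)

def weighted_linear_fit_at_zero(xs, ys, ws):
    """Value at x = 0 of the weighted least-squares line y = a + b*x."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det  # intercept a from the normal equations

# Samples all on one side of the target point x = 0, true field y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]
ws = [math.exp(-x) for x in xs]   # closer samples weigh more

constant_estimate = weighted_constant_fit(ys, ws)
linear_estimate = weighted_linear_fit_at_zero(xs, ys, ws)
```

The true value at x = 0 is zero. The constant fit has to land somewhere between the sample values, all of which are at least 2, while the linear fit follows the trend back to the target point and recovers zero to rounding error.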

I'll show test results in the next post. I think the LOESS method is at least as accurate as the mesh method, which is to say, very accurate indeed. And of course, it agrees well. It is flexible, in that where data is sparse, it just accepts data from further afield, which is the best that can be done. You could think of a grid method as similarly estimating the central values, which can then be integrated. The grid method, though, artificially cuts off the data that it will accept at the cell boundary.

The LOESS method also gives a good alternative method of visualisation. My preferred WebGL requires triangles with values supplied at the corners, and GL then shades the interior accordingly. I have used that with the convex hull mesh (eg here), but when triangles get large, it produces some artifacts. The underlying icosahedral mesh of LOESS has uniformly sized triangles. Of course, this is in a way smoothing over the problem of sparse data. But at least it does it in the best possible way.

Here is a WebGL plot of June 2019 (changed later) temperature anomaly, done the LOESS way. As usual, there are checkboxes you can use to hide the mesh overlay, or the colors, or even the map. More on the facility and its use here.



You can contrast the effect of the LOESS smoothing with the unstructured mesh representation here. Both present unadjusted GHCN V4, which clearly has a lot of noise, especially in the USA, where quantity seems to degrade quality. None of this detracts from global integration, which smooths far more than even LOESS. I think that while it is occasionally of interest to see the detail with the mesh, the LOESS plot is more informative. The detail of the mesh had been useful in GHCN V3 for spotting irregularities, but in the US at least, they are now so common that the utility fades. In much of the rest of the world, even Canada, coherence is much better.








