Tuesday, September 22, 2015

Better gridding for global temperature

Computing global temperature is an exercise in spatial integration with scattered data. I have written a lot about it previously, eg here or earlier here. A spatial integral is a weighted average, so it comes down to calculating the weights. With TempLS, I first used a grid method, as is traditional. Then, to overcome the problem of empty cells, I used an irregular triangular mesh, as in finite element integration. I have also developed, and will soon describe, a method using spherical harmonics. I think the later methods are better. But grids also have some advantages, and I have long wanted to get a rational infilling basis.

Numerical integrtaion

Integration is usually defined as a limiting process, whereby the region is subdivided into finer and finer regions, which can then each be evaluated with some local estimate of the integrand. There is theory about whether that converges. With a finite amount of numerical data, you can't go to a limit. But the same idea applies. You can subdivide until you get a result that seems to depend little on changing the subdivision. Sometimes that won't happen before you run out of data to meaningfully estimate the many subdivisions. That's one reason why temperature anomalies are important. With absolute temperature, you would have to divide very finely indeed to be independent of topographic variation, and there just aren't enpigh reading locations to do that. But anomalies take out a lot of that variation, making practical convergence possible.

You might ask - why bother with different methods, rather than finding just one good one? The answer is with this idea of reaching an invariant zone. If we can find an integral estimate that agrees over several different methods, that will give the greatest confidence in the result.

Grid considerations

With gridding, you can choose a coarse grid so that every cell has data. But then the data may not be a good estimate of the whole grid area. You lose resolution. A finer grid will start to have cells with no data. Traditionally, these are just omitted, meaning in effect that they are assumed to have the global average value. This was improved by Cowtan and Way 2013, using kriging. I proposed a simpler approach using latitude band averaging, which gave some of the same benefit. In this post I'll look at upgrading the infill process, using numnerics similar to solving the diffusion equation. It tries to find a local average to use for each missing cell.

Improving on lat/lon grids

To do this, I need a better grid system than lat/lon. That creates a big problem at the poles, where cells become very small and skewed. The essential requirement of a grid is that you can quickly allocate a set of scattered data to the right cells, and you know the area of the cells. There are many other ways of doing this. Lat/lon is based on gridding the sphere as if it were a flat surface, which it very much isn't. You can do much better using a somewhat similar 3D object. Regular platonic polyhedra are attractive, and an icosahedron would be best of these. But a cube is more familiar, and good enough. That is what I'll use here. The cube is gridded in the normal way, with a square grid on each face. The sphere surface is radially projected onto the cube.

I'll give details, with the infill process, and tests of the improvement of the results, using spherical harmonics, below the fold. And of course there will be the usual WebGL active picture. It will show the cube grid projected on the sphere, and infill for a typical month, with lines to show the infill dependency.

Cubic grid

Think of a Rubik style cube, but with finer gridding. I number each face by rows, with the numbering accumulating from face to face. I use a unit sphere and unit cube (each edge from -1 to 1). So a point (x,y,z) on the sphere projects to (x,y,z)/Max(|x|,|y|,|z|) on the cube. And to map back to the sphere, just divide by the magnitude.

The cell area on the sphere, which is what is needed for area weighting, is simple. The area on the cube face cell centered at r is reduced by the cos of a view angle, which is 1/|r|, and by the inverse square of the projection, which is |r|^-2, being a total factor of |r|^-3. And all those face cells have equal area on the cube.

Diffusion

After allocating the data points to the cells by projection, I then have populated and unpopulated cells. Populated I estimate by the simple average of data in the cell. Then I check all the unpopulated that are next to one or more pop. They get the average of the adjacent cell values (cells that share an edge), and are marked populated. Then I repeat, until all cells are "populated".

If I was actually doing the integration, I would then multiply those cell average values by the cell areas and add them up. That would end up being a weighted sum of the data. For TempLS, I just want those weights. That is just accountancy. Every time one of the original cell values is used in another cell, I increase its weight.

As a matter of practicality, I limit this, so cells are deemed to inherit values from the four closest actual readings. "Closest" is determined by the stage at which they were associated in the diffusion process. For the cells that were not reached until the third or later stages, there are some arbitrary cutoffs, but the cells chosen should still be close.

In the plot below, for the TempLS data (GHCN+ERSST) for April 2015, you see first the populated cells with a colored checkerboard, different colors for each cube face. Note how the cells do not differ hugely in size, but shrink near the 8 cube corners. Stations reporting in April are marked with black dots. Then the infilled cells are marked in drab colors - grey for the first level, then brown, bluish etc. I'm using here a 24x24 grid on each face. That is finer in parts than the usual 5° lat/lon grid, and since the SST is on a 4° lat/lon in TempLS, there are gaps in sea temp as well. I've used this finer mesh to show multiple layers in the diffusion.

In the unpop (drab) cells, you see little white lines from the centres. These show the connections to the actual pop cells from which they are averaged.

Test of integration

I've tested using spherical harmonics of order up to L=5 (36 functions). Products of these should integrate to produce an identity matrix (orthonormal). So I integrate, first using the grid as if all were present. Then I integrate using only the "populated" cells for that month of April. Then I integrate a third time infilling for the unpopulated cells. For those integrations I use the weighted formulation.

I tabulate the results below, as a sum of squares of differences between the matrix of results and the unit matrix. I have broken into columns according to the L value. The number there is actually an average for products in that section, so it is in fact a power spectrum (more on this in future posts). You can think of L as rising (spatial) frequency. In any case, the test of good integration is that the numbers should be small. I have left out L=0, because that is zero by normalization.

12345
Full grid00001e-06
Infilled grid8.8e-050.000290.0010450.0020150.003635
No infill0.0076320.0273350.0493270.0644930.075291


The all grid integration is accurate to six figures. That reflects that the mesh is fine enough (24x24) to resolve these functions well. For the rest, infilling makes a dramatic improvement, by nearly two orders of magnitude, though arguably the square root should be the criterion. Both deteriorate at higher frequency, infilling at a faster rate.

Conclusion

Grid integration with diffusion infilling looks good, and may be competitive with triangle mesh. If so, this is welcome, because it manages things like land/sea separation more easily. I'll post more soon on comparisons of use in TempLS.




14 comments:

  1. A icosahedron grid would be elegant for this approach. Precomputed grids can be found here.

    P.S. Promised table is missing. Typo: "thoise cell average values".

    ReplyDelete
    Replies
    1. Victor,
      Apologies about the table. I just forgot to include it. It's there now.

      Yes, I'd like to use an icosahedron. I can easily grid the faces and locate arbitrary points in that grid. The fiddly part was deciding which of the 20 faces a point was projected on to.

      There is also an issue of identifying neighbor cells on face edges (needed for the diffusion approach). That is not so easy even on a cube.

      Delete
    2. Have any tessellated grids been developed for the ellipsoid, or is that considered too difficult a problem and/or a second-order effect that can be neglected?

      Delete
    3. Well, an ellipsoid is just a stretched sphere, so you can stretch a spherical tesselation too. Or if the semiaxes are very different, the method here of projecting from a cube could use a box shape with a non-square grid.

      I'm about to post about a more general method which allows division into arbitrary polygons (though symmetry i still a sanity aid), and a more general approach to tesselating the subsets.

      Delete
  2. Victor - the image appears on FireFox. In my experience, Safari often fails to display Nick's graphics.

    OT: wood for trees is down for the 2nd day. Hopefully not for good.

    ReplyDelete
    Replies
    1. I did see the image, but not the table (windows10). Now on another computer (Windows7), I do not see the image (just frame), and still no table anywhere.

      Delete
    2. Not working for me with Safari or FireFox. The homepage opens, but any attempt to click to any other page is met with error 404, page not found.

      Delete
    3. Maybe a firewall problem? It's working smoothly for me, including on Safari.

      Delete
    4. Works today. Probably AT&T problem. The cable TV has been acting up as well.

      Nick - NCEP appears to not be updating.

      Delete
    5. JCH,
      NCEP isn't very regular. You can check the 2015 file date here. Generally, if the log file at the bottom of the page shows no update, it means there is no new version of the file.

      Delete
    6. NCEP missed another regular update time.

      Delete
    7. You have it at .379C. That's probably the peak for the month. I'm not very good at winning the lotto, but I suspect .300C plus for September is pretty much a lock.

      Delete
  3. Should have read more carefully. When I zoomed out, an "orient" button appeared on the upper right, but can't find a table either.

    ReplyDelete