Monday, October 13, 2014

A catch-up on TempLS

I've been writing a lot about TempLS (my global temperature index) recently, and have realized that I don't have a single reference that explains exactly what it is and what has recently been happening to it.

TempLS dates back to a period in early 2010 when there was a flurry of amateur efforts to replicate the monthly global surface temperature indices from the major producers (which some thought suspect). This post by Zeke (with links to earlier) gives an overview. Jeff Id and Romanm started it with a reconstruction that used a least squares method for aggregating a single cell, yielding offsets rather than requiring a fixed anomaly period. I thought that could be applied to the whole recon.

So I developed TempLS, which was basically a big OLS regression, based on GHCN unadjusted station monthly averages. It was quick to run, and I incorporated choice mechanisms which made it easy to calculate regional or special (eg rural, airport) averages. A rather complete summary of this stage of development is here. An important feature was the incorporation of SST data. This comes gridded, often 5°x5°, and so I simply entered these as stations.
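To make the regression idea concrete, here is a minimal sketch, not the actual TempLS code. It assumes the simplest model consistent with the description above: each reading is a station offset plus a global value, x[s][t] ≈ L[s] + G[t], fitted by weighted least squares with one weight per station. The function name, the use of `None` for missing months, and the alternating solution method are all illustrative choices.

```python
def fit_offsets(x, w, n_iter=200):
    """Fit x[s][t] ~ L[s] + G[t] by weighted least squares.

    x: one row per station, one column per time step; None marks a gap.
    w: one non-negative weight per station (assumed constant in time here).
    Every station and every time step needs at least one reading.
    Returns (L, G), normalised so that G[0] == 0.
    """
    n_s, n_t = len(x), len(x[0])
    L = [0.0] * n_s
    G = [0.0] * n_t
    for _ in range(n_iter):
        # Station offsets: mean residual over the times each station reports.
        for s in range(n_s):
            res = [x[s][t] - G[t] for t in range(n_t) if x[s][t] is not None]
            L[s] = sum(res) / len(res)
        # Global values: weighted mean residual over stations reporting at t.
        for t in range(n_t):
            rows = [s for s in range(n_s) if x[s][t] is not None]
            G[t] = sum(w[s] * (x[s][t] - L[s]) for s in rows) / sum(w[s] for s in rows)
    # L and G are only determined up to a shared constant; pin G[0] to zero.
    shift = G[0]
    return [l + shift for l in L], [g - shift for g in G]


# Usage: three stations with very different climatologies and some gaps.
x = [
    [10.0, None, 10.3, 10.2],
    [15.0, 15.1, 15.3, 15.2],
    [2.0, 2.1, 2.3, None],
]
L, G = fit_offsets(x, w=[1.0, 2.0, 0.5])
```

Because the offsets are fitted along with the global values, no common base period is needed; the gaps in the usage example do not disturb the recovered G.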

I made a point of using unadjusted GHCN, because there were many claims that warming was an artefact of adjustment. I have myself no objections to adjustment, though I did show that it makes relatively little difference to the index.

TempLS combines weighted regression with spatial integration, much as BEST did later. It weighted initially by the inverse of grid density, estimated by stations/cell in a 5°x5° grid. I posted at one stage a very simple version for incorporation in Steven Mosher's RGHCNv3. You can regard this weighting as that which a spatial integration formula would provide, with each grid cell estimated by its station average anomaly, or equivalently, each function value (observed average) assigned an area equal to its share of the cell.
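The grid weighting just described can be sketched as follows. This is an illustration under assumptions, not the TempLS code: stations are taken as (lat, lon) pairs in degrees, cells are aligned to the poles and the Greenwich meridian, and areas are measured on a unit sphere. The function name `grid_weights` is made up for the example.

```python
import math
from collections import Counter


def grid_weights(stations, cell_deg=5.0):
    """Give each station an equal share of the area of its lat/lon cell.

    stations: list of (lat, lon) in degrees.
    Returns one weight per station, in unit-sphere area units.
    """
    n_lat = int(180 / cell_deg)
    n_lon = int(360 / cell_deg)

    def cell_of(lat, lon):
        # Clamp so that lat == 90 and rounding at lon == 360 stay in range.
        i = min(int((lat + 90.0) // cell_deg), n_lat - 1)
        j = min(int((lon % 360.0) // cell_deg), n_lon - 1)
        return i, j

    cells = [cell_of(lat, lon) for lat, lon in stations]
    counts = Counter(cells)  # stations per occupied cell

    weights = []
    for i, j in cells:
        lat0 = -90.0 + i * cell_deg
        lat1 = lat0 + cell_deg
        # Area of a lat/lon cell on the unit sphere.
        area = math.radians(cell_deg) * (
            math.sin(math.radians(lat1)) - math.sin(math.radians(lat0))
        )
        weights.append(area / counts[(i, j)])
    return weights


# Usage: two stations share one cell, a third has a cell to itself.
w = grid_weights([(51.0, 0.5), (52.0, 1.0), (-33.0, 151.0)])
```

Cells with no stations contribute nothing, so the weights sum to the area of the occupied cells only, not to the whole sphere.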

Version 2 in August 2010 generalised the idea of regression, so that spatial variation (among others) could be included (and maps produced). The math basis was set out here.

Meanwhile I was experimenting with other kinds of weighting. The problem with cell averages, which I now call grid weighting, is that many cells have no stations. These would be best represented by local data, but the weight that can be assigned is limited to one cell area, and near the poles, where there is a lot of sparseness, the cell areas actually diminish, so such regions are under-represented. In early 2011, I did a series on Antarctica (final and links). This developed various forms of mesh-based weighting. This means that every point is estimated based on local data. I tinkered with Voronoi tessellation, but have now settled on weighting that assigns to each node a weight equal to a third of the area of the triangles (from a convex hull, which provides an irregular triangular mesh) which touch it. This is equivalent to a finite element piecewise linear integration formula.
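To put a number on how much the cell areas diminish toward the poles, here is a small sketch using the standard area formula for a latitude/longitude cell on a sphere. The function name and the choice to express area as a fraction of the whole sphere are just for illustration.

```python
import math


def cell_area_fraction(lat_south, cell_deg=5.0):
    """Fraction of the sphere covered by one cell_deg x cell_deg cell
    whose southern edge is at latitude lat_south (degrees)."""
    lat_north = lat_south + cell_deg
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    # Cell area / sphere area = (dlon / 2 pi) * (sin(lat_north) - sin(lat_south)) / 2
    return (cell_deg / 360.0) * band / 2.0


equatorial = cell_area_fraction(0.0)   # cell spanning 0 to 5 degrees N
polar = cell_area_fraction(85.0)       # cell spanning 85 to 90 degrees N
ratio = polar / equatorial             # well under a twentieth
```

So a lone station in a polar cell can carry less than a twentieth of the weight of a lone station in an equatorial cell, however large the empty region around it.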

An interesting exercise, which I started in early days, is to see if some very small subset (60) of stations can give a reasonable world estimate. This is the latest in that series, with links back.

Version 2.1, which incorporates some revised version, is described here. This is the last formal release with code.

Meanwhile, in July 2011 I started a regular cycle of posting a monthly average estimate, based on GHCN and SST - initially HADSST2, but I soon switched to ERSST. I tried to make sure to post ahead of the majors, so that a prediction record could be established. I then published a comparison with the GISS result for the month. A list of links to those posts is given here; each has a link to its preceding TempLS post. I also included the TempLS results with the other indices on my latest data page, and in the graphical comparisons there.

I posted a review of the comparison of TempLS with other indices in December 2012. It was clearly in the mainstream, closest to NOAA and HADCRUT. More recently, I posted a mini-review noting that the closeness to NOAA had become more pronounced.

I had thought of switching to mesh weighting for the regular monthly posts, but was deterred by the hour+ it takes to do 1200 meshes to cover a century. But you don't need a new mesh each month; for most historic months the population of stations changes very rarely. So with a scheme of stored weights and detecting changes, I can do it, and I think I should. I expect the results to be now closer to GISS. I also expect that the results will compare well with the revision of Cowtan and Way; I'll post on that soon.
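One possible shape for such a stored-weights scheme is sketched here. It is an assumption about how the idea could be coded, not a description of what TempLS does: the weights for a month are keyed by the set of stations reporting in it, and the expensive mesh calculation (the hypothetical `compute_weights` argument) is rerun only when that set has not been seen before.

```python
def make_weight_cache(compute_weights):
    """Wrap an expensive weight calculation so that it reruns only
    when the set of reporting stations changes.

    compute_weights: hypothetical function taking a sorted list of
    station ids and returning their weights (for example a dict).
    """
    cache = {}

    def weights_for(station_ids):
        key = frozenset(station_ids)  # order of the ids does not matter
        if key not in cache:
            cache[key] = compute_weights(sorted(key))
        return cache[key]

    return weights_for


# Usage with a stand-in for the mesh calculation.
calls = []

def fake_mesh_weights(ids):
    calls.append(tuple(ids))
    return {i: 1.0 / len(ids) for i in ids}

weights_for = make_weight_cache(fake_mesh_weights)
jan = weights_for(["A", "B", "C"])
feb = weights_for(["C", "A", "B"])   # same stations: reuses the stored weights
mar = weights_for(["A", "B"])        # a station dropped out: recomputed
```

In practice the store would have to survive between runs (saved to disk, say), and a key based on station ids assumes that station locations do not change.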

Appendix/Update

Carrick asked for more details on the use of meshes in integration. The FEM idea of integration is that you build up an interpolation which is exact at each node, and takes some polynomial form in between. They do it with basis functions, but we don't need to deal with that here. Just imagine that you have a linear interpolation on each triangle. They will match at the edges.

On the integration, first I'll formalize the trapezoid analogy. If you have two points z1,z2, with function values f1,f2, then the area of the trapezoid on the graph for the linear interpolation is (z2-z1)*(f1+f2)/2. Base length times mean height.

The corresponding formula in 3D, for a triangle with node values f1,f2,f3, is volume = base area * (f1+f2+f3)/3. Base area times mean height.

To prove it, let the linear approximation be f = a.z + c (a and z are vectors, . is the scalar product). Then f1 = a.z1 + c, etc.

To save writing determinants I'll invoke what I hope is a familiar proposition. The centroid z0=(z1+z2+z3)/3 is also the center of mass. ie ∫ z dA = z0 A, where A is the area of triangle (z1,z2,z3).

So ∫ f dA = a . ∫ z dA + ∫ c dA = (a . z0 + c)*A
= A * Σ_i ( a . z_i + c)/3 = A * (f1+f2+f3)/3

Now when you collect all the formulae for each triangle (FEM assembly), you get a big sum in which each node value f_i is multiplied by 1/3 of the sum of all the areas of triangles of which it is a node. That is the weighting I use (actually, no need to divide by 3, since only the relative weights matter).
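The assembly step can be sketched in a few lines. This is a planar illustration under assumptions: points are (x, y) pairs and the triangles are given as index triples, whereas the real application is on the sphere with a mesh from a convex hull. Function names are invented for the example.

```python
def triangle_area(p, q, r):
    """Area of a planar triangle from its three (x, y) vertices."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def node_weights(points, triangles):
    """Each node gets a third of the area of every triangle it belongs to."""
    w = [0.0] * len(points)
    for i, j, k in triangles:
        share = triangle_area(points[i], points[j], points[k]) / 3.0
        for n in (i, j, k):
            w[n] += share
    return w


def integrate(points, triangles, f_values):
    """Piecewise linear integration: sum of weight times node value."""
    w = node_weights(points, triangles)
    return sum(wi * fi for wi, fi in zip(w, f_values))


# Usage: unit square split into four triangles around its centre.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
f_values = [2 * x + 3 * y + 1 for x, y in points]
total = integrate(points, triangles, f_values)  # exact integral of 2x+3y+1 is 3.5
```

Here the corners each get weight 1/6 and the centre 1/3, so the weights sum to the area of the square, and a linear function is integrated exactly, as the proof above says it should be.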

3 comments:

  1. Thanks Nick. I find this interesting but don't immediately follow it:

    I tinkered with Voronoi tessellation, but have now settled with weighting that assigns to each node a weight equal to a third of the triangles (from a convex hull, which provides an irregular triangular mesh) which touch it. This is equivalent to a finite element piecewise linear integration formula.

    Can you expand on the mathematics of this?

    Also, any chance you'll end up writing up something for publication on this? What if we helped pay the publication costs via a tip jar?

    1. Carrick,
      I've added an appendix, which I hope helps there.

      Thanks for the thought on publication. I'm not much worried about page charges; if necessary CSIRO would (probably) pay if I go through their system. It's mainly that it has all been a bit bitty. But I think the general proposition of using irregular meshes may well have enough merit to publish. It is more direct than what they currently do, and I think includes the C&W benefits.

      I'm currently going right back through GHCN unadjusted with QC. Then we'll be able to see better what it does.

    2. Nick, I agree it is not only more direct (less ad hoc), but in some sense an optimal method for infilling.

      That seems to be an important contribution.
