Thursday, October 17, 2019

Methods of integrating temperature anomalies on the sphere

A few days ago, I described a new method for integrating temperature anomalies on the globe, which I called FEM/LOESS. I think it may be the best general purpose method so far. In this post I would like to show how well it works within TempLS, with comparisons with other advanced methods. But first I would like to say a bit more about the mathematics of the method. I'll color it blue for those who would like to skip.

Mathematics of FEM/LOESS

It's actually a true hybrid of the finite element method (FEM) and LOESS (locally estimated scatterplot smoothing). In LOESS a model is fitted locally by regression, usually to get a central value. In FEM/LOESS, the following least squares parameter fit is made:

The standard FEM approximate function is f(z)=ΣaᵢBᵢ(z) where z is location on the sphere, B are the basis functions described in the previous post, and aᵢ are the set of coefficients to be found by fitting to a set of observations that I will call y(z). The LS target is

SS=∫(f(z)-y(z))^2 dz

This has to be estimated knowing y at a discrete points (stations). In FEM style, the integral is split into integrals over elements, and then within elements the integrand is estimated as mean (f(zₖ)-y(zₖ))^2 over points k. One might question whether the sum within elements should be weighted, but the idea is that the fitted f() takes out the systematic variation, so the residuals should be homogeneous, and a uniform mean is correct.

So differentiating wrt a to minimise:
∫(f(z)-y(z))Bᵢ(z) dz = 0
Discretising:

Σₘ Eₘ (ΣₖBᵢ(zₖ)Bₙ(zₖ)aₙ)/(Σₖ1) = Σₘ Eₘ (ΣₖBᵢ(zₖ)yₖ)/(Σₖ1)

The first summation is over elements, and E are their areas. In FEM style, the summations that follow are specific to each element, and for each E include just the points zₖ within that element, and so the basis functions in the sum are those that have a node within or on the boundary if that element. The denominators Σₖ1 are just the count of points within each element. It looks complicated but putting it together is just the standard FEM step of assembling a mass matrix.

In symbols,this is the regression equation
H a = B* w B a = B* w y
where B is the matrix of basis functions evaluated at zₖ, B* transpose, diagonal matrix w the weights Σₘ/(Σₖ1) (area/count), y the vector of readings, and a the vector of unknown coefficients.

To integrate a specific y, this has to be solved for a:
a=H-1B* w y
and then ΣaᵢBᵢ(z) integrated on the globe
Int = ΣaᵢIᵢ where Iᵢ is the integral of function Bᵢ,

H is generally positive definite and sparse, so the inversion is done with conjugate gradients, using the diagonal as preconditioner.

For TempLS I need not an integral but weights. So I have to calculate

wt = I H-1B* w

It sounds hard, but it is a well trodden FEM task, and is computationally quite fast. In all this a small multiple of a Laplacian matrix is added to H to ensure corresponding infilling of empty or inadequately constrained cells.

Comparisons

I ran TempLS using the FEM/LOESS weighting with 9 modes, h2p2, h2p3, h2p4,...h43,h4p4. I calculated the RMS of differences between monthly averages, pairwise, for the years 1900-2019, and similar differences between and within the advanced methods MESH, LOESS and INFILL (see here for discussion, and explanation of the h..p.. notation). Here is a table of results, of RMS difference in °C, multiplied by 1000:

MESHLOESSINFILLh2p2h3p2h4p2h5p2h2p3h3p3h4p3h5p3h2p4h3p4h4p4h5p4h2p5h3p5h4p5h5p5
MESH0181926201919211817191817171919191922
LOESS1802524221922202220231922212422232427
INFILL1925027191714211413111711111014111012
h2p22624270242526212424262225252626262728
h3p22022192401716161116171612171816171721
h4p2191917251701415128161213101414161720
h5p2192214261614017111161310101206812
h2p32120212116151701616191017171917192023
h3p3182214241112111601011126101211111316
h4p3172013241681116100111194911111216
h5p3192311261716619111101599106048
h2p41819172216121310121115013121513151620
h3p417221125121310176991308101091014
h4p417211125171010171049128061091014
h5p41924102618141219129101510601210912
h2p5192214261614017111161310101206812
h3p5192311261716619111101599106048
h4p51924102717178201312416101098406
h5p52227122821201223161682014141212860


The best agreement between the older methods is between MESH and LOESS at 18. Agreement between the higher order FEM results is much better. Agreement of FEM with MESH is a little better, with LOESS a little worse. The agreement with INFILL is better again, but needs to be discounted somewhat. The reason is that in earlier years there is a large S Pole region without data. Both INFILL and FEM deal with this with Laplace interpolation, so I think that spuriously enhances the agreement.
As will be seen later, there is a change in the matchings at about 1960, when Antarctic data becomes available. So I did a similar table just for the years since 1960:

MESHLOESSINFILLh2p2h3p2h4p2h5p2h2p3h3p3h4p3h5p3h2p4h3p4h4p4h5p4h2p5h3p5h4p5h5p5
MESH0111123171614181312121512111014121216
LOESS1101019141210141099109891091014
INFILL111002116161317131211151110913111013
h2p22319210202020182020201820202120202022
h3p21714162001515131014151311141615151519
h4p21612162015012121071491091312141620
h5p214101320151201511961110101106813
h2p31814171813121501213161013141615161720
h3p3131013201010111209128691111121317
h4p31291220147913901097489101116
h5p31291120151461612100139996049
h2p415101518139111089130991211131418
h3p412911201110101367990791091115
h4p41181020149101494997061091015
h5p410992116131116118912960119913
h2p514101320151201511961110101106813
h3p51291120151461612100139996049
h4p51210102015168171311414111098407
h5p51614132219201320171691815151313970


Clearly the agreement is much better. The best of the older methods is between LOESS and INFILL at 10. But agreement between high order FEM methods is better. Now it is LOESS that agrees well with FEM - better than the others. Of course in assessing agreement, we don't know which is right. It is possible that FEM is the best, but not sure.

Here are some time series graphs. I'll show a time series graph first, but the solutions are too close to distinguish much.





Difference plots are more informative. These are made by subtracting one of the solutions from the others. I have made plots using each of MESH, LOESS and INFILL as reference. You can click the buttons below the plot to cycle through.



There is a region of notably good agreement between 1960 and 1990. This is artificial, because that is the anomaly base period, so all plots have mean zero there. Still, they are unusually aligned in slope.

Before 1960, LOESS deviates from the FEM curves, MESH less, and INFILL least. The agreement of INFILL probably comes from the common use of Laplace interpolation for the empty Antarctic region. In the post 1990 period, it is LOESS which best tracks with FEM.

However, I should note that no discrepancies exceed about 0.02°C.

Next steps

After further experience, I will probably make FEM/LOESS my frontline method.









4 comments:

  1. Hum, it seems that the discrepancy might be related to some artifact in the 1940's. Maybe you have discovered a test to pinpoint anomalous data here.

    ReplyDelete
  2. The issue with the 1940's is that the corrections are abrupt, drawn along the war-year endpoints when measurement techniques changed dramatically. Differing filtering techniques will either exaggerate or suppress the adjustments made at the boundary years. A high-pass filter will enhance, while a low-pass will suppress. BTW, the strongest multiple El Nino peak also occurred in the early 1940's making it eve more difficult to adjust.

    ReplyDelete
  3. I think I understand your new method . It has got me interested again in the use of icosahedral grids (FEM) for the earth's surface. You seem to be using what I call a level 4 Icosahedral grid with 2562 nodes, then using LOESS to fit the station values to a polynomial for each triangle. The result are nice smooth anomaly distributions. Do you also interpolate into empty triangles?

    I have instead been using a brute force method by stepping up to a level 5 icosahedral grid with 10224 nodes and then averaging all temperatures within the 4 times smaller triangles. Perhaps I should be using R !

    ReplyDelete
    Replies
    1. Clive,
      "You seem to be using what I call a level 4 Icosahedral grid with 2562 nodes"
      You tend to work in terms of iterated bisecting; I do fractional divisions. Each triangle is divided into n² elements. So h2 has 80 elements in total (42 nodes); h3 has 180 (92), h4 has 320 (162). So a lot fewer elements.

      "Do you also interpolate into empty triangles?"
      With higher order polynomials, it's harder, but there is a good solution. You need enough nodes (not just 1) to get a unique fit. The solution is using the Laplace operator. Here is how I do it for the simpler grid infill method. You have a whole lot of grid equations
      NᵢTᵢ=yᵢ
      where N is the number of nodes in a tri i, T is the average anomaly, and y is the sum of anomalies. It's a system of equations but may be singular if some Nᵢ=0.

      Embed it in a matrix system
      AᵢₖTₖ=yᵢ
      (summing over k - summation convention) where A is a diagobal matrix with N's on the diag.

      Let B be a Laplacian matrix. Good enough is for BT=0 to say that each cell value is the average if its neighbors - ie 3 on the diagonal (triangle grid), 0 elsewhere except, for each row, a value of -1 at each neighbor col.

      Then solve (A+εB)T = y
      ε needs to be large enough to make A+εB not too ill-conditioned, but small enough not to do too much smoothing of the known data. There is a lot of room there.

      My new method uses sparse matrix structures and conjugate gradient methods for solving that last equation. With the FEM version, A is like a mass matrix. It's the matrix for the local regression.

      Delete