This post follows last week's on temperature integration methods. There I described a general method of regression fitting of classes of integrable functions, of which the most used to date is spherical harmonics (SH). I noted that the regression involves inverting a matrix HH consisting of all the scalar product integrals of the functions in the class. With perfect integration this matrix would be a unit (identity) matrix, but as the SH functions become more oscillatory, the integration method loses resolution, and the representation degrades with the condition number of HH. The condition number is the ratio of largest eigenvalue to smallest, so what is happening is that some eigenvalues become small, and the matrix is near singular. That means that the corresponding eigenvectors can have large multipliers in the representation.
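To make the role of HH concrete, here is a minimal sketch on a one-dimensional analogue: Fourier modes on a circle stand in for SH on the sphere, and an equal-weight sum over sample points stands in for the integration. The function names (`fourier_basis`, `gram_matrix`, `condition_number`), the 64 sample points and the removed quarter of the circle are all assumptions made for illustration; none of this is TempLS code.

```python
import numpy as np

def fourier_basis(x, K):
    """Columns 1, sqrt(2)cos(kx), sqrt(2)sin(kx) for k=1..K.
    These are orthonormal under the mean over the full circle."""
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols.append(np.sqrt(2) * np.cos(k * x))
        cols.append(np.sqrt(2) * np.sin(k * x))
    return np.column_stack(cols)

def gram_matrix(x, w, K):
    """HH[i, j] = sum_s w[s] f_i(x[s]) f_j(x[s]), weights normalised to sum to 1."""
    B = fourier_basis(x, K)
    w = w / w.sum()
    return B.T @ (w[:, None] * B)

def condition_number(HH):
    ev = np.linalg.eigvalsh(HH)  # eigenvalues in ascending order
    return ev[-1] / ev[0]

K = 4
# Even coverage: 64 equally spaced points on the circle (illustrative choice).
full = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
# Sparse coverage: the same points with the last quarter of the circle removed.
gappy = full[full < 1.5 * np.pi]

cond_full = condition_number(gram_matrix(full, np.ones_like(full), K))
cond_gappy = condition_number(gram_matrix(gappy, np.ones_like(gappy), K))
print(cond_full, cond_gappy)
```

With even coverage the sum reproduces the integrals exactly and HH is the identity. With the gap, the smallest eigenvalue drops, and its eigenvector is a mode concentrated in the unsampled region, which is the 1-D counterpart of the sparse-coverage modes discussed in this post.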
I also use fitted SH for plotting each month's temperature. I described some of the practicalities here (using different functions). Increasing the number of functions improves resolution, but when HH becomes too ill-conditioned, artefacts intrude, which are multiples of these near-null eigenvectors.
In the previous post, I discussed how the condition of HH depends on the scalar product integral. Since the SH are ideally orthogonal, better integration improves HH. I have been taking advantage of that in recent TempLS to increase the order of SH to 16, which implies 289 functions, using mesh integration. That might be overdoing it - I'm checking.
In this post, I will display those troublesome eigen modes. They are of interest because they are associated with regions of sparse coverage, and give a quantification of how much they matter. Another thing quantified is how much the integration method affects the condition number for a given order of SH. I'll develop that further in another post.
I took N=12 (169 functions), and looked at TempLS stations (GHCN+ERSST) which reported in May 2017. The choice of N is a compromise. If N is too low, the condition number is good, and the minimum modes don't show features emphasising sparsity. If N is too high, each region like Antarctica can have several split modes, which confuses the issue.
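The function counts quoted here follow from each SH degree l contributing 2l+1 functions, so all degrees up to N give (N+1)² in total. A quick check (the name `sh_count` is just for illustration):

```python
def sh_count(order):
    """Number of spherical harmonics with degree 0..order:
    each degree l contributes 2*l + 1 functions."""
    return sum(2 * l + 1 for l in range(order + 1))

print(sh_count(12))  # 169
print(sh_count(16))  # 289
```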
The integration methods I chose were mostly described here:
- OLS - just the ordinary scalar product of the values
- grid - integration by summing on a 5x5° latitude/longitude grid. This was the earliest TempLS method, and is used by HADCRUT.
- infill - empty cells are infilled with an average of nearby values. Now the grid is a cubed sphere with 1536 cells
- mesh - my generally preferred method using an irregular triangular grid (convex hull of stations) with linear interpolation.
OLS sounds bad, but works quite well at moderate resolution, and was used in TempLS until very recently.
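The difference between the first two schemes comes down to the weight each station value gets in the scalar product. The sketch below contrasts them under a stated assumption: for grid integration, each occupied 5x5° cell contributes its area, shared equally among the stations in it. That sharing rule and the helper names are assumptions for illustration, not taken from the TempLS code.

```python
import math
from collections import Counter

def cell_index(lat, lon, size=5.0):
    """Row/column of the lat/lon cell containing a station."""
    i = min(int((lat + 90.0) // size), int(180 / size) - 1)
    j = min(int((lon % 360.0) // size), int(360 / size) - 1)
    return i, j

def cell_area(i, size=5.0):
    """Area of a cell in latitude band i, as a fraction of the sphere."""
    lat0 = math.radians(-90.0 + i * size)
    lat1 = math.radians(-90.0 + (i + 1) * size)
    return (math.sin(lat1) - math.sin(lat0)) * math.radians(size) / (4 * math.pi)

def ols_weights(stations):
    """OLS: every station counts equally."""
    return [1.0 / len(stations)] * len(stations)

def grid_weights(stations, size=5.0):
    """Grid (assumed rule): cell area split equally among stations in the cell."""
    cells = [cell_index(lat, lon, size) for lat, lon in stations]
    counts = Counter(cells)
    return [cell_area(c[0], size) / counts[c] for c in cells]

# Hypothetical stations: two sharing a mid-latitude cell, one alone near Antarctica.
stations = [(51.0, 1.0), (52.0, 2.0), (-75.0, 0.0)]
w_ols = ols_weights(stations)
w_grid = grid_weights(stations)
print(w_ols, w_grid)
```

Under this rule a lone station never gets more than the area of its own cell, however empty the surrounding region is, whereas infill and mesh schemes let the weight grow with the empty area around it.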
I'll show the plots of the modes as an active lat/lon plot below, and then the OLS versions in WebGL, which gives a much better idea of the shapes. But first I'll show a table of the tapering eigenvalues, numbering from smallest up. They are scaled so that the maximum is 1, so the reciprocal of the lowest is the condition number.
| | OLS | grid | infilled | mesh |
| --- | --- | --- | --- | --- |
| Eigen1 | 0.0211 | 0.0147 | 0.0695 | 0.135 |
| Eigen2 | 0.0369 | 0.0275 | 0.138 | 0.229 |
| Eigen3 | 0.0423 | 0.0469 | 0.212 | 0.283 |
| Eigen4 | 0.0572 | 0.0499 | 0.244 | 0.329 |
| Eigen5 | 0.084 | 0.089 | 0.248 | 0.461 |
| Eigen6 | 0.104 | 0.107 | 0.373 | 0.535 |
| Eigen7 | 0.108 | 0.146 | 0.406 | 0.571 |
| Eigen8 | 0.124 | 0.164 | 0.429 | 0.619 |
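Since the eigenvalues are scaled to a maximum of 1, the condition number for each method is just the reciprocal of its Eigen1 entry in the table. A small calculation using those tabulated values:

```python
# Smallest scaled eigenvalue (Eigen1) for each integration method, from the table.
smallest = {"OLS": 0.0211, "grid": 0.0147, "infilled": 0.0695, "mesh": 0.135}

# With the largest eigenvalue scaled to 1, condition number = 1 / smallest.
cond = {method: 1.0 / ev for method, ev in smallest.items()}

for method, c in cond.items():
    print(f"{method:9s} {c:5.1f}")
# OLS        47.4
# grid       68.0
# infilled   14.4
# mesh        7.4
```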
And here is a graph of the whole sequence, now largest first:
The hierarchy of condition numbers is interesting. I had expected that it would go in the order of the columns, and so it does until near the end. Then mesh drops below infilled grid, and OLS below grid, for the smallest eigenvalues. I think what determines this is the weighting of the nodes in the sparse areas. For grid, this is not high, because each just gets the area of its cell. For both infilled and mesh, the weight rises with the area, and apparently with infilled, more so.