moyhu: Surface temperature sparsity error modes

Thursday, August 24, 2017

Surface temperature sparsity error modes

This post follows last week's on temperature integration methods. I described a general method of regression fitting of classes of integrable functions, of which the most used to date is spherical harmonics (SH). I noted that the regression involved inverting a matrix HH consisting of all the scalar product integrals of the functions in the class. With perfect integration this matrix would be a unit matrix, but as the SH functions become more oscillatory, the integration method loses resolution, and the representation degrades with the condition number of the matrix HH. The condition number is the ratio of largest eigenvalue to smallest, so what is happening is that some eigenvalues become small, and the matrix is near singular. That means that the corresponding eigenvectors might have a large multiplier in the representation.
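As a rough illustration of how coverage affects HH, here is a minimal sketch in Python with NumPy. It is not the TempLS code. It assumes only the nine real SH up to order 2 (far fewer than used in the post), synthetic random "stations", and equal weights, which corresponds to the OLS style. The function names are made up for the example.

```python
import numpy as np

def real_sh_order2(lat_deg, lon_deg):
    """Nine real spherical harmonics (degree 0 to 2), scaled so that each
    has mean square 1 over the whole sphere."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    s3, s5, s15 = np.sqrt(3.0), np.sqrt(5.0), np.sqrt(15.0)
    return np.column_stack([
        np.ones_like(z),                          # degree 0
        s3 * x, s3 * y, s3 * z,                   # degree 1
        s15 * x * y, s15 * y * z, s15 * x * z,    # degree 2
        0.5 * s15 * (x * x - y * y),
        0.5 * s5 * (3.0 * z * z - 1.0),
    ])

def gram_matrix(B, w):
    """HH: weighted scalar products of every pair of basis functions."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return B.T @ (w[:, None] * B)

def condition_number(HH):
    """Ratio of largest to smallest eigenvalue of a symmetric matrix."""
    ev = np.linalg.eigvalsh(HH)   # returned in ascending order
    return ev[-1] / ev[0]

# Synthetic stations, uniformly distributed over the sphere (an assumption).
rng = np.random.default_rng(0)
n = 20000
lat = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))
lon = rng.uniform(-180.0, 180.0, n)
B = real_sh_order2(lat, lon)

# Full coverage: HH is close to the unit matrix.
cond_full = condition_number(gram_matrix(B, np.ones(n)))

# Sparse coverage: drop everything south of 30S and recompute.
keep = lat > -30.0
cond_gap = condition_number(gram_matrix(B[keep], np.ones(keep.sum())))

print(f"full coverage: {cond_full:.2f}   gap south of 30S: {cond_gap:.2f}")
```

With full coverage the condition number stays near 1. Removing a region makes some eigenvalues small, and the eigenvectors that go with them are combinations of SH concentrated in the unsampled area.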

I also use fitted SH for plotting each month's temperature. I described some of the practicalities here (using different functions). Increasing the number of functions improves resolution, but when HH becomes too ill-conditioned, artefacts intrude, which are multiples of these near null eigenvectors.

In the previous post, I discussed how the condition of HH depends on the scalar product integral. Since the SH are ideally orthogonal, better integration improves HH. I have been taking advantage of that in recent TempLS to increase the order of SH to 16, which implies 289 functions, using mesh integration. That might be overdoing it - I'm checking.

In this post, I will display those troublesome eigen modes. They are of interest because they are associated with regions of sparse coverage, and give a quantification of how much they matter. Another thing quantified is how much the integration method affects the condition number for a given order of SH. I'll develop that further in another post.

I took N=12 (169 functions), and looked at TempLS stations (GHCN+ERSST) which reported in May 2017. Considerations on choice of N are that if too low, the condition number is good, and the minimum modes don't show features emphasising sparsity. If the number is too high, each region like Antarctica can have several split modes, which confuses the issue.

The integration methods I chose were mostly described here:

OLS - just the ordinary scalar product of the values
grid - integration by summing on a 5x5° latitude/longitude grid. This was the earliest TempLS method, and is used by HADCRUT.
infill - empty cells are infilled with an average of nearby values. Now the grid is a cubed sphere with 1536 cells.
mesh - my generally preferred method using an irregular triangular grid (convex hull of stations) with linear interpolation.
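To make the difference between the OLS and grid styles concrete, here is a minimal sketch of grid weighting, written as an assumption about how such a scheme can be coded rather than as the TempLS implementation. Each occupied 5x5° cell contributes its area, shared equally among the stations in it. OLS would simply give every station weight 1. The function name `grid_weights` is hypothetical.

```python
import numpy as np

def grid_weights(lat, lon, cell=5.0):
    """Station weights for integration on a regular lat/lon grid.

    Each station gets (area of its cell) / (number of stations in the cell).
    Empty cells contribute nothing, which is the weakness that infilling
    is meant to address.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    nlat = int(round(180.0 / cell))
    nlon = int(round(360.0 / cell))

    # Row and column index of the cell containing each station.
    i = np.clip(np.floor((lat + 90.0) / cell).astype(int), 0, nlat - 1)
    j = np.floor(((lon + 180.0) % 360.0) / cell).astype(int) % nlon
    cell_id = i * nlon + j

    # Cell area on the unit sphere, in steradians.
    south = np.radians(-90.0 + i * cell)
    north = south + np.radians(cell)
    area = (np.sin(north) - np.sin(south)) * np.radians(cell)

    counts = np.bincount(cell_id, minlength=nlat * nlon)
    return area / counts[cell_id]

# Two stations sharing a tropical cell, and one alone at high latitude.
w = grid_weights([2.0, 3.0, 62.0], [1.0, 2.0, 10.0])
print(w)
```

The two stations that share a cell split its area, while the isolated high-latitude station gets a whole cell to itself, although that cell is smaller because of the convergence of meridians.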

OLS sounds bad, but works quite well at moderate resolution, and was used in TempLS until very recently.

I'll show the plots of the modes as an active lat/lon plot below, and then the OLS versions in WebGL, which gives a much better idea of the shapes. But first I'll show a table of the tapering eigenvalues, numbering from smallest up. They are scaled so that the maximum is 1, so reciprocal of the lowest is the condition number.

	OLS	grid	infilled	mesh
Eigen1	0.0211	0.0147	0.0695	0.135
Eigen2	0.0369	0.0275	0.138	0.229
Eigen3	0.0423	0.0469	0.212	0.283
Eigen4	0.0572	0.0499	0.244	0.329
Eigen5	0.084	0.089	0.248	0.461
Eigen6	0.104	0.107	0.373	0.535
Eigen7	0.108	0.146	0.406	0.571
Eigen8	0.124	0.164	0.429	0.619
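Since the eigenvalues are scaled to a maximum of 1, the condition numbers follow directly from the Eigen1 row. A small sketch of that arithmetic, using only the values in the table:

```python
# Smallest scaled eigenvalue for each integration style (Eigen1 row above).
smallest = {"OLS": 0.0211, "grid": 0.0147, "infilled": 0.0695, "mesh": 0.135}

# With the largest eigenvalue scaled to 1, condition number = 1 / smallest.
cond = {style: 1.0 / ev for style, ev in smallest.items()}

for style, c in cond.items():
    print(f"{style:9s} {c:6.1f}")
```

This gives roughly 47 for OLS, 68 for grid, 14 for infilled and 7 for mesh.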

And here is a graph of the whole sequence, now largest first:

The hierarchy of condition numbers is interesting. I had expected that it would go in the order of the columns, and so it does until near the end. Then mesh drops below infilled grid, and OLS below grid, for the smallest eigenvalues. I think what determines this is the weighting of the nodes in the sparse areas. For grid, this is not high, because each just gets the area of its cell. For both infilled and mesh, the weight rises with the area, and apparently with infilled, more so.

Here is an active graph to show the errant modes. You can cycle through "Style", which means style of integration (grid, mesh etc) and mode, starting from 1 (buttons top right).

It's dominated by Antarctica, where the lowest modes focus, with some Arctic activity too. It isn't for a while that modes bob up in Africa, with some effect in S America. The weakest style (OLS) is almost all polar in the first 9 modes, while mesh starts showing Africa from about mode 4 up, and later Brazil shows up.

Here is the WebGL plot - I show just the mesh style. It gives a better proportion for the polar behaviour, and shows finer features elsewhere. It is the usual trackball, with radio buttons for the modes. Dots are the stations.

Next post will take this further. I'll do a more systematic look at which styles work best in which circumstances. The next main interest is whether I can get better resolution by restricting to a space without the problem modes. In principle, one could take a very large collection of SH, and collect the eigenfunctions, which are truly orthogonal with respect to the integration style. A subset with moderate eigenvalues would still have a large orthogonal basis.
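The idea of restricting to eigenfunctions with moderate eigenvalues can be sketched as follows. This is an illustration under assumptions, not the method of the coming post. It uses a toy basis of monomials on a line in place of SH, equal weights, and an arbitrary relative cutoff `floor`. The function names are hypothetical.

```python
import numpy as np

def stable_subspace(HH, floor=0.1):
    """Eigenvectors of HH with eigenvalue >= floor * largest, rescaled so
    that the new functions are orthonormal under the same scalar product."""
    lam, V = np.linalg.eigh(HH)             # ascending eigenvalues
    keep = lam >= floor * lam[-1]
    return V[:, keep] / np.sqrt(lam[keep])  # Q, with Q.T @ HH @ Q = I

def fit_in_subspace(B, w, y, floor=0.1):
    """Weighted least squares fit of y, restricted to the stable subspace.
    Returns coefficients expressed in the original basis (columns of B)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    HH = B.T @ (w[:, None] * B)
    Q = stable_subspace(HH, floor)
    c = Q.T @ (B.T @ (w * y))   # no inversion needed: new basis is orthonormal
    return Q @ c

# Toy basis: 1, t, t^2, t^3 sampled at 50 points (stands in for SH at stations).
t = np.linspace(-1.0, 1.0, 50)
B = np.column_stack([t**k for k in range(4)])
w = np.ones_like(t)
HH = B.T @ (B * (w / w.sum())[:, None])

a_true = np.array([1.0, -2.0, 0.5, 3.0])
y = B @ a_true

Q_cut = stable_subspace(HH, floor=0.1)        # drops the weakest modes
a_all = fit_in_subspace(B, w, y, floor=1e-12)  # keeps every mode
print("modes kept with floor=0.1:", Q_cut.shape[1], "of", B.shape[1])
```

Keeping every mode reproduces the ordinary regression. Raising the cutoff trades resolution for stability, since the dropped modes are exactly those that would have had large multipliers.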

9 comments:

@whut, August 26, 2017 at 6:00 AM
Talking about spatio-temporal relationships, here is a lone ranger research effort that demonstrates how much we can still discover with respect to geophysics

ContextEarth.com/2017/08/24/lunisolar-forcing-of-earthquakes

Had some discussion on whether he somehow mistakenly or accidentally plotted X = X, but I don't think so.

Pep, August 29, 2017 at 12:36 AM
What I'd like to know is if there is a way to assess if integration is done correctly. In general, is there a way to assess how a dataset is able to reflect the Real World?
@whut, August 29, 2017 at 3:44 AM
"What I'd like to know is if there is a way to assess if integration is done correctly. In general, is there a way to assess how a dataset is able to reflect the Real World?"

The way to do this is to run cross-validation tests. If you have a physics-based model, this works very well because you can test if your integration extrapolates from the training interval over to the test interval.

For a standing wave phenomenon such as ENSO, there are two aspects, a spatial one and a temporal one. The spatial aspect is super easy to cross-validate, as the ENSO forms an almost perfect spatial dipole that shows opposite signs at Darwin and Tahiti. So that at any one time, one can show that if you have knowledge of the temperature or atmospheric pressure at Darwin, you can accurately predict the temperature or pressure at Tahiti by reversing the value of the anomaly.

For ENSO temporal cross-validation, this has been a longstanding challenge and a problem that no one has been able to solve. But I can demonstrate it convincingly here by assuming the lunar tidal forcing:

http://contextearth.com/2017/08/08/enso-split-training-for-cross-validation/

With this lunisolar model of ENSO, one can take any time interval of the ENSO measure and predict the value at any point in time backward or forward.

Nick Stokes, August 30, 2017 at 2:19 PM
Pep,
"Nick, you may have already explained this in a precedent post but I was wondering which method you consider as the most accurate to give a global mean temperature."
Well, I still think irregular triangular mesh is best. I look for other methods partly to seek agreement, and also because, as with these modes, they can tell you something else. I don't think there is really much difference, though, provided you do something about empty cells. GISS interpolates, which should work. I think kriging is fine, but overkill. Every point which isn't measured is, or should be, estimated from local values. There is sufficient noise that looking for perfect interpolation isn't really helping much. I did a comparison of methods here. It's one approach to a quality standard.

As for BEST, I think what you have in mind is their use of least squares to avoid requiring a fixed anomaly base interval, which tends to exclude stations which don't have data there. I have used that (pre-BEST) in TempLS too, and I think it is the right thing to do.