A few days ago I posted an extensive ANOVA-type analysis of the successive reduction of variance as the spatial behaviour of global temperatures was more finely modelled. This is basically a follow-up to show how the temperature field can be partitioned into a smooth part with known reliable interpolation, and a hopefully small residue. Then the size of the residue puts a limit on the coverage uncertainty.
I wrote about coverage uncertainty in January. It's the uncertainty about what would happen if one could measure in different places, and is the main source of uncertainty in the monthly global indices. A different and useful way of seeing it is as the uncertainty that comes with interpolation. Sometimes you see sceptic articles decrying interpolation as "making up data". But it is the complement of sampling, which is how we measure. You can only measure anything at a finite number of places. You infer what happens elsewhere by interpolation; that can't be avoided. Just about everything we know about the physical world, or the economic one for that matter, is deduced from a finite number of samples.
The standard way of estimating coverage uncertainty was used by Brohan et al 2006. They took a global reanalysis and sampled it at sets of places corresponding to possible station distributions. The variability of the resulting averages was the uncertainty estimate. The weakness is that the reanalysis may have different variability to the real world.
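For concreteness, here is a minimal Python sketch of that subsampling idea. It is my own illustration, not Brohan et al's code, and the inputs (field, weights, mask) are hypothetical names for a reanalysis anomaly field, area weights and a station-coverage mask.

```python
import numpy as np

# Minimal sketch of the reanalysis-subsampling idea (not Brohan et al.'s code).
# Hypothetical inputs:
#   field   : reanalysis anomaly fields, shape (n_months, n_lat, n_lon)
#   weights : cos(latitude) area weights, shape (n_lat, n_lon)
#   mask    : boolean station-coverage mask, shape (n_lat, n_lon)

def global_mean(field, weights, mask=None):
    """Area-weighted mean of each monthly field, optionally restricted to a mask."""
    if mask is None:
        mask = np.ones(weights.shape, dtype=bool)
    w = np.where(mask, weights, 0.0)
    return (field * w).sum(axis=(1, 2)) / w.sum()

def coverage_uncertainty(field, weights, mask):
    """Spread of the error made by averaging only where the mask has stations."""
    full = global_mean(field, weights)            # complete reanalysis average
    sampled = global_mean(field, weights, mask)   # average seen through the mask
    return np.std(sampled - full)                 # variability = uncertainty estimate
```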
I think analysis of residuals gives another way. If you have a temperature anomaly field T, you can try to separate it into a smoothed part s and a residual e:
T = s + e
If s is constructed in such a way that you expect much less uncertainty of interpolation than for T, then the uncertainty has been transferred to e. That residual is more intractable to integrate, but you have an upper bound based on its amplitude, and that is an upper bound on the coverage uncertainty.
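To make the bound explicit: if w are the area weights used in the global average (normalised to sum to 1), then the size of the weighted mean of e can't exceed the weighted mean of |e|, which in turn can't exceed the largest |e|. A small sketch, with e and w as assumed inputs:

```python
import numpy as np

# Sketch of the amplitude bound on the residual's contribution (my illustration).
# Assumed inputs: e = residuals at the sample points, w = area weights of those
# points in the global average, normalised so that w.sum() == 1.

def residual_bound(e, w):
    """Two upper bounds on |weighted mean of e|: a weighted-amplitude bound and
    a cruder peak-amplitude bound. Either caps the coverage error that remains
    once the smooth part s is integrated reliably."""
    weighted = np.sum(w * np.abs(e))   # |sum(w*e)| <= sum(w*|e|)
    crude = np.max(np.abs(e))          # ... <= max|e|, since the weights sum to 1
    return weighted, crude
```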
So below the jump, I'll show how I used a LOESS-type smoothing for s. This replaces points by a low-order polynomial weighted regression, where the weighting is by a function decaying with distance, in my case exponentially, with characteristic distance r (ie exp(-|x|/r)). With r very high, one can be very sure of interpolation (of s), but the approximation will not be very good, so e will be large and will contain a lot of "signal" - ie what you want to include in the average, which will then be inaccurate. If r is very small, the residual will be small too, but there will be a lot of noise still in s. I seek a compromise where s is smooth enough, and e is small enough. I'll show the result of various r values for recent months, focussing on Jan 2017. I'll also show WebGL plots of the smooths and residuals.
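As an illustration of that kind of smoother, here is a sketch in Python. It is not the code behind the plots below; station coordinates and anomalies are assumed inputs, and the local model is just a plane in latitude and longitude.

```python
import numpy as np

# LOESS-style sketch: replace each station anomaly by a locally fitted plane,
# weighted by exp(-d/r) with d the great-circle distance in km and r the
# characteristic distance discussed above. Not the post's actual code.

def great_circle_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance between points given in degrees."""
    p1, l1, p2, l2 = map(np.radians, (lat1, lon1, lat2, lon2))
    h = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin((l2 - l1) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(h))

def loess_smooth(lats, lons, temps, r):
    """Smooth part s at each station; the residual is then e = temps - s."""
    s = np.empty_like(temps, dtype=float)
    for i in range(len(temps)):
        d = great_circle_km(lats[i], lons[i], lats, lons)
        sw = np.sqrt(np.exp(-d / r))          # sqrt of exponential weights for WLS
        # low-order local model: constant plus linear terms in lat/lon offsets
        # (ignores longitude wrap-around; fine for a sketch)
        X = np.column_stack([np.ones_like(lats), lats - lats[i], lons - lons[i]])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], temps * sw, rcond=None)
        s[i] = beta[0]                        # fitted value at the station itself
    return s

# Usage sketch: e = temps - loess_smooth(lats, lons, temps, r=1200.0)
# Repeating with several r values shows the trade-off described above.
```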
I should add that the purpose here is not to get a more accurate integral by this partition. Some of the desired integrand is bound to end up in e. The purpose is to get a handle on the error.