Friday, July 13, 2012
I thought I might be out on a limb when TempLS showed slight cooling from May to June. Both satellite measures showed significant warming. But GISS showed cooling, from 0.64 °C to 0.56 °C. Time series graphs are shown here.
Meanwhile, Arctic ice has been melting. Still a little behind last year, but keeping pace.
As usual, I compare the previously posted TempLS distribution to the GISS plot.
Tuesday, July 10, 2012
June TempLS down slightly from May
The TempLS analysis, based on GHCNV3 land temperatures and the ERSST sea temps, showed a monthly average anomaly of 0.50 °C for June, down slightly from 0.52 °C in May. UAH seems to be the only other result out, showing a 0.08 °C increase. There are more details at the latest temperature data page.
I'll show here the usual spherical harmonics plot of area distribution of temperature. But I'll also show a more elaborate interactive plot of station temperatures. This gives a good guide as to what data is currently in, as well as a basis for comparing the spherical harmonics smoothing. It's the same style I showed last November.
Above is the spherical harmonics plot done with the GISS colors and temperature intervals, and as usual I'll post a comparison when GISS comes out. And next is the interactive plot, with some brief usage guidance below.

How it works
The flat map at top right is your navigator. If you click a point in that, the sphere will rotate so that point appears in the centre. The buttons below allow modification. Set what you want, and press refresh. You can show stations, and the mesh, and magnify 2×, 4×, or 8× (by setting both). You can click again to unset (and press refresh).
Then you can click on the sphere. At the bottom right, the nearest station name and anomaly will appear. You may want to have stations displayed here. You'll see two faint numbers next to "stations". These indicate how far your click missed the station by (in pixels).
Monday, July 2, 2012
CPS proxy reconstruction - analysis and selection bias
CPS is an old and fairly simple reconstruction method for temperature proxies. The proxies are simply normalised - mean subtracted and divided by standard deviation - and averaged. Then the average is scaled to some target temperature during a calibration period - a multidecade period where the instrumental measure is available. This can be done by regression. Finally, the reconstruction is just the average in pre-calibration times, scaled by that same formula.
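The steps just described can be sketched in code. This is a minimal illustration only, not the post's actual code; the function name, data layout and numpy usage are my own assumptions:

```python
import numpy as np

def cps_recon(proxies, target, calib):
    """Minimal CPS sketch. `proxies` is (n_years, n_proxies), `target` is the
    instrumental series, `calib` is a slice selecting the calibration years."""
    # Normalise each proxy using its calibration-period mean and SD
    mu = proxies[calib].mean(axis=0)
    sd = proxies[calib].std(axis=0)
    comp = ((proxies - mu) / sd).mean(axis=1)   # composite: average of proxies
    # Regress the composite on the target over the calibration period
    t = target[calib] - target[calib].mean()
    c = comp[calib] - comp[calib].mean()
    slope = (c @ t) / (t @ t)
    # Scale the whole composite by the calibration regression factor
    return comp / slope
```

The same scale factor derived in calibration is applied to the whole record, which is what lets the pre-calibration part of the composite be read as temperature.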
It solves the problem of overfitting - having more proxies than years of measurement of target temperature. But it also has limitations. It can't make use of local temperature information. The notion of standard deviation is not really appropriate, since neither proxies nor the target temperature are stationary random variables. And, as normally implemented, proxies that show little temperature dependence are swept in, and add only noise.
It's a convenient method to analyze, so I thought I would study the effect of selecting proxies by correlation in the calibration period - recently controversial. Not too much should be made of it, since selection is motivated by avoiding overfitting, which is not a problem in CPS. But I'm developing the methods for use in more complex algorithms.
Pseudoproxies
Pseudoproxies are created by taking some computed temperature curve going back centuries. Here I use output from CSM from 850 AD to 1980, as described in Schmidt et al. Then noise is added. It could be red noise, thought to be characteristic of real proxies, but for present purposes, to focus on analytic approximation, I'll use white noise. So the N proxies are formed:$$P_i=F_i*T+W_i,\ \ i=1...N$$
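A runnable sketch of this construction (Python/numpy). The CSM series isn't reproduced here, so a hypothetical hockey-stick-shaped stand-in is used for T; the N and F range match the example later in the post:

```python
import numpy as np

# Pseudoproxy construction P_i = F_i * T + W_i, with white noise W_i.
# T here is an invented stand-in for the CSM model temperature, 850-1980 AD.
rng = np.random.default_rng(1)
years = np.arange(850, 1981)
T = 0.1 * np.sin(years / 60.0) + np.where(years > 1900, 0.01 * (years - 1900), 0.0)

N = 59
F = rng.uniform(0.03, 0.18, N)              # per-proxy S/N factors
W = rng.standard_normal((years.size, N))    # unit white noise, one column per proxy
P = F * T[:, None] + W                      # pseudoproxies
# Normalise each proxy to zero mean, unit SD, as the method requires
P = (P - P.mean(axis=0)) / P.std(axis=0)
```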
T is the target temperature, \(F_i\) is a constant factor (the S/N) for each proxy and \(W_i\) the noise. I'll assume for arithmetic simplicity that both P and T are normalized relative to the calibration period, and I'll use a symbol \(\cdot\) to indicate a dot product mean over the years of that period.
Math of CPS
So with the normalization, all that is needed is to take the mean of the proxy construction:$$p=f*T+w$$ where p, f, w are the means over the N proxies of \(P_i, F_i, W_i\). The mean p will still have zero mean, though not unit SD; the averaging will reduce the noise by a factor of about √N. Then f is estimated by regression in the calibration period:$$\hat{f}=p\cdot T/T\cdot T=p\cdot T$$ So the recon is $$\hat{T}=\frac{p}{p\cdot T}$$
Now using our special knowledge of proxy structure, $$p\cdot T=f+w\cdot T$$ so $$\hat{T}=\frac{f*T+w}{f+w\cdot T}$$ or with \(e=w/f\) $$\hat{T}=\frac{T+e}{1+e\cdot T}$$
So this is in principle a nearly unbiased estimator. e should have zero mean, and the expected value of \(e\cdot T\) should also be zero. Let's see how that works out.
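Here is a quick numerical check of that claim (a sketch; the target is randomly generated, and f = 0.1, N = 59 and a 100-year calibration period are assumed values roughly matching the example that follows):

```python
import numpy as np

# Check that E[e.T] is near zero over the calibration period, so the
# denominator 1 + e.T averages to 1 and the estimator is nearly unbiased.
rng = np.random.default_rng(2)
n_cal, n_trials = 100, 5000
T = rng.standard_normal(n_cal)
T = (T - T.mean()) / T.std()                      # normalised target, T.T = 1

f = 0.1                                           # assumed mean S/N factor
dots = []
for _ in range(n_trials):
    w = rng.standard_normal(n_cal) / np.sqrt(59)  # composite noise, reduced ~sqrt(N)
    e = w / f
    dots.append(np.mean(e * T))                   # the e.T dot product mean
dots = np.asarray(dots)
```

The mean of these dot products sits near zero, while their scatter corresponds to the apparent-bias standard error discussed in the example below.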
Example
I created 59 proxies as described. The factors \(F_i\) (which are approximately the S/N) were randomly generated in a range from 0.03 to 0.18. Once generated, these factors were kept constant through the analysis. Unit white noise was added and the result normalized. I'll show first the temperature, with calibration period in red. There's not much point in showing a bare proxy - it just looks like noise.
The next plot shows, in red, the CPS recon as described above. The green is an "unbiased" recon where T has the corresponding amount of white noise added.
Noise is dominant, but the HS (hockey stick) shape is there, reflecting the way the proxies were constructed. A smoothed curve will show any biases better (emphasising them greatly relative to the noise):
Update. I originally had the colors wrong on the legend. To fix this, I did a new run. As I say below, the apparent "bias" can go either way - in the earlier plot it was above by a large margin; this time it's below. I've changed the relevant numbers.
There is a small apparent bias in the scale factor which multiplies the green curve to get the red. This is the \(1+e\cdot T\) denominator, and it can go either way. In this case, the standard error for this is about 0.14 around a mean of 1. In fact, in this realization it was 0.83, which is on the low side.
There is an actual bias, because of the nonlinear operation of inversion. However, in this case it is only about 0.02 - ie a 2% difference.
Selection
So now what happens if a subset of proxies is chosen according to the correlation in the calibration period? The correlation, with above normalizing and notation, is:$$\rho_i=P_i\cdot T = F_i+w_i\cdot T$$
The intent of selection is to choose for high S/N - ie high F. But the selection cannot distinguish between a \(\rho\) that is high for this reason and one that is high because of the second factor - a chance alignment of noise with T in the calibration period. One of these will persist pre-calibration; the other probably will not.
Note that this effect is dependent on the distribution of F values. If they are all equal - a common choice for toy examples - then the effect is maximised, because selection is purely on the random alignments of w and T. And if the S/N is low, again randomness will dominate. But if there is a subset with high S/N, then selection for these will work as intended, and create little bias.
So in the recon, the denominator \(1+e\cdot T\) no longer has expected value 1, and so is a source of genuine bias. Let's see how that works out, choosing the top 24 of the 59 proxies by calibration-period \(\rho\):
You can see that the red now lies above the black and green, which is clearer in a smoothed plot:
In fact, the denominator \(1+e\cdot T=1.66\), which is the source of the bias.
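The selection effect can be mimicked in a short simulation (a sketch in Python/numpy with a random stand-in target, averaged over many trials rather than the single realization above; the 24-of-59 selection and the F range are taken from the example):

```python
import numpy as np

# Choosing the top 24 of 59 proxies by calibration-period correlation
# inflates w.T for the selected set, and hence the 1 + e.T denominator.
rng = np.random.default_rng(3)
n_cal, N, n_sel = 100, 59, 24
T = rng.standard_normal(n_cal)
T = (T - T.mean()) / T.std()                  # normalised stand-in target

den_all, den_sel = [], []
for _ in range(2000):
    F = rng.uniform(0.03, 0.18, N)
    W = rng.standard_normal((n_cal, N))
    rho = F + W.T @ T / n_cal                 # rho_i = F_i + w_i.T
    top = np.argsort(rho)[-n_sel:]            # top 24 by calibration rho
    for idx, out in ((slice(None), den_all), (top, den_sel)):
        f = F[idx].mean()                     # mean S/N of included proxies
        w = W[:, idx].mean(axis=1)            # mean noise of included proxies
        out.append(1 + np.mean(w * T) / f)    # the denominator 1 + e.T
den_all, den_sel = map(np.mean, (den_all, den_sel))
```

Using all 59 proxies the denominator averages close to 1; selecting the top 24 pushes it well above 1, in line with the 1.66 seen in the single realization.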
Implication
Having identified the source of the bias, it would be nice to be able to correct for it. However, the denominator involves f, which is the very thing we are after. The main thing to say is that for CPS, the only case analyzed here, the effect is not so important, because selection by correlation after normalization is not usually done. A variant which may be of some importance is where a correlation-weighted sum of proxies is used.
Code is here