moyhu: Ryan's code

Sunday, February 20, 2011

Ryan's code - testing.

I set out to do, as promised, a repetition of Ryan's sensitivity test with S09 using more AVHRR PCs. But I was distracted by what seemed to me to be a simpler and more informative test, which I'll report here. It gives a quantitative comparison and also can act as a kind of calibration.

I detrended all the raw station data. That is, I estimate and subtract the trend for each station, leaving the mean over the observation period unchanged. So we have data which should return a zero trend. What do we get?

I looked at four cases:

SO9 with 3 PC's (Eric Steig's original)
SO9 with 5 PC's
OLMC10 with RLS (regularized least squares)
OLMC10 with E-W (eigenvalue weighting)

The RLS results are the most quoted, and have been used in recent sensitivity tests. I took the time period from 1957-2006, all year.

Well, none of the methods reported a zero trend, although the magnitudes were reduced relative to real data.

Update 2. As noted below, detrending the raw data is not a good idea - it is much better to calculate the detrend slope using anomalies. I did this, and found that it had little effect on the S09 results, but a big effect on the OLMC10 method. The RLS performance was now better. Qualitatively, though, the results are similar. The RLS method still gives a negative trend, by about as much as S09 is positive.

Here are the plots, using the same color range as in my last post:

S09 3 PCs

S09 5 PCs

RLS

E-W

Here are the new numerical trend averages in C/Decade:

	East	West	Peninsula	Continent
S09_3	0.0285	0.05210	0.0357	0.0338
S09_5	0.0221	0.04810	0.0534	0.0289
RLS	-0.0282	-0.00757	0.0309	-0.0214
E-W	0.0183	0.03990	0.0627	0.0247

This suggests also that S09 tends to return high trends - RLS low, to ~~a slightly greater~~ about the same extent, and E-W also slightly high. The magnitudes of the differences are comparable to the real data discrepancies. As a comparison of the magnitude of differences, here are the actual trends from real data, taken from Table 1 of OLMC10:

	East	West	Peninsula	Continent
S09 3	0.10	0.20	0.13	0.12
RLS	0.03	0.10	0.35	0.06
E-W	0.02	0.06	0.32	0.04

The continent difference of ~~0.081~~ 0.055 between S09/3 and RLS is ~~slightly larger than~~ comparable to the much debated real data discrepancy of 0.06. ~~There may be something in Eric Steig's claim that ridge regression is reducing the trend.~~

Update. As soon as I posted this, I realised that it would have been better to detrend anomalised data rather than the raw data. Estimating trends of seasonal series with missing values and irregular endpoints is noisy. I'll post revised results in about an hour when the computing is done.

Here is the table I calculated earlier detrending the raw data:

	East	West	Peninsula	Continent
S09 3	0.0285	0.0521	0.0357	0.033
S09 5	0.0221	0.0481	0.0536	0.028
RLS	-0.0572	-0.0353	0.0463	-0.048
E-W	0.0001	0.0204	0.0520	0.0066

15 comments:

AnonymousFebruary 20, 2011 at 11:48 PM
I presume the calculations are all linear?

i.e. if you take your detrended anomalies and now add a constant trend to all of them, the resulting overall trends will all increase by exactly the added term?

Kevin C
ReplyDelete
Replies
NicLFebruary 21, 2011 at 9:13 AM
Nick

Interesting test, but I don't think that it actually proves anything, unfortunately. :)

Overall, temperatures in Antarctica rose quite strongly from 1957 to the early 1980s, and then were much more stable (in linear trend terms) from then until 2006. Our station set includes many AWS, which only started operating in the early 1980s.

Detrending their records before input to RegEM will therefore have little effect. But detrending the long-record manned stations - which are what really drive the long term trends in the reconstruction - will generally result in reduced, but still positive, uptrends from 1957 to the early 1980s, followed by downtrends from then until 2006. It is the uptrending values in the first period that are used to impute the missing values for the AWS stations.

As a result, the infilled AWS records can be expected to have uptrends until the early 1980s and then have a flat trend, giving an overall positive trend. Hence the RLS reconstruction, overall, derived as it is from the infilled records of both manned and AWS stations, can be expected to have an overall upwards trend. That is in fact an entirely logical and correct result from data with these characteristics.

So please don't think that this test tells you anything at all about bias of the OLMC10 method (or indeed about that of the S09 method). Sorry!

Have you tried the test I suggested, of comparing the trends of the infilled records of Byrd manned station and AWS? I found that most instructive. I can probably find the archived results somewhere, if you don't have the time to perform this test.
ReplyDelete
Replies
Nick StokesFebruary 21, 2011 at 11:16 AM
Kevin,
No, it isn't linear. I did a run (S09, 5PCs) with added trend; here are the results:
Trend E_____ W_____ Penin Cont
0 0.0219 0.0496 0.0549 0.0292
0.1 0.0929 0.0727 0.1180 0.0896
0.2 0.1980 0.1280 0.1880 0.1820
ReplyDelete
Replies
Nick StokesFebruary 21, 2011 at 11:26 AM
NicL
Thanks for that explanation. Yes, the fallacy seems to be that you can't say that combining two time series, trendless over different time periods, gives a trendless result.

Still, one can treat it as just another set of perturbed data, and I think it is interesting that the difference is mostly (except for peninsula) reasonably similar to real data. I'll do more testing.

I'm still working on the original Byrd test, and I do hope to do your test as well. But if you do find the archived results, I'd be interested to read them.

I was interested in that post from last year that you pointed CCE to, too.
ReplyDelete
Replies
AnonymousFebruary 21, 2011 at 8:52 PM
Thanks. I am surprised about that. Also concerned, although I'll have to think through the implications for a bit.
Kevin C
ReplyDelete
Replies
CarrickFebruary 23, 2011 at 8:35 AM
Kevin, it's my understanding that when you truncate PCs in the process of regularizing a matrix, the relationship between the inputs and outputs becomes nonlinear (though it depends on signal to noise ratios as to whether this even matters.)
ReplyDelete
Replies
AnonymousFebruary 25, 2011 at 12:34 AM
Yes, I think you're right about that.

However, do the PCs actually change? (If not, then the nonlinearity is coming from elsewhere e.g. the station weightings- do these vary?)

If I understand correctly the spatial PCs are coming from the AVHRR data?

At this point I have to sort out rows and columns for the PCA calc, my brain comes out my ears and I start saying stupid things. Be warned:

The two dimensions of our data are time and position. The PCs are spatial, so the different time values are being considered as different datasets. So the PCA is calculating the eigenvalues/vectors of a covariance matrix whose rows and columns represent the different dataset dates, and whose values are the covariances between the data at any pair of dates. Is that right?

So what happens if we add a linear trend? Then a constant offset is added to each spatial dataset, where the constant varies from one dataset to another.

But a constant offset doesn't change the covariance, and so doesn't affect the PCs at all?

Or am I confused?

Kevin C
ReplyDelete
Replies
CarrickFebruary 25, 2011 at 5:40 AM
Kevin, generally one subtracts the mean of each vector before computing the covariance matrix, so constant offsets would just get subtracted out (i.e., have no effect).
ReplyDelete
Replies
AnonymousFebruary 25, 2011 at 7:03 AM
And I'm also confused about the parameters too. Part of the problem is that I've had a paper published on PCA, so I thought I understand it. But my paper was on a very unusual application - analysing the behaviour of a system of equations. It's becoming increasingly clear that I haven't grokked the more normal statistical usage!

Kevin C
ReplyDelete
Replies
Nick StokesFebruary 25, 2011 at 10:32 AM
Kevin,
I am also on a learning curve in regard to the code. I'm pretty sure the basis functions (AVHRR EOF's) don't change. One source of nonlinearity is clearly the Tikhonov regularization parameter
as Tamino describes. There is fitting here, and it goes into the denominator. But there are likely to be others.
ReplyDelete
Replies
CarrickFebruary 25, 2011 at 5:08 PM
Nick, EOFs are typically built up the singular value decomposition of a covariance matrix of the empirical data... and since the covariance matrix is built up from the product of the column vectors of the empirical measurements (either centered or noncentered), wouldn't adding linear trends to the data be expected to be inherently nonlinear and as well to change the resulting EOFs?
ReplyDelete
Replies
Nick StokesFebruary 25, 2011 at 6:25 PM
Carrick,
I think the answer is yes and no. The E-W method of OLMC10 doesn't seem to change the PCs. There is a call to recon() in which the original AVHRR PC's are supplied, and you can trace down to where the averages are printed, and they are unmodified. I think this is true for the S09 method too. However, the RLS (iridge) method does create a set of new PC's (rls.pcs).

So I guess the operative answer is yes.
ReplyDelete
Replies
AnonymousFebruary 26, 2011 at 12:24 AM
OK, got my head round the geophys PCA terminology thanks to this paper linked from the wikipedia page on EOFs:
http://andvari.vedur.is/~halldor/TEXT/eofsvd.html

So IIUC we have 5592 grid point in the AVHRR (or 1392 in the reduced grid), and data spread over ~300 months?
So we can look at the data as 300 temperature maps or 5592 (1392) time series.

The procedure is then to calculate a covariance matrix of every grid point with every other (i.e. the covariance of the time series between two grid points). The covariance calculation implicitly removes the mean of the time series (or we can explicitly set the mean of each time series to 0 and calculate mean squared differences). This gives us a 5592x5592 (1392x1392) variance-covariance matrix.

We then diagonalise this. The eigenvectors are then the EOFs. The loadings give the contribution of each EOF to each month's map. (Confusingly, geophysicists seem to sometimes refer to the loadings as PCs).

There are 5592 (1392) eigenvectors, each of which is a 5592 (1392) point map. However, since we only have 300 months, I would expect all the eigenvalues past 300 to be ~zero, and the remaining eigenvectors to be noise.

If that's right, then adding a linear trend will change the variance-covariance matrix: the variances will increase, but the covariances won't because we are adding the same linear trend to both time series. The matrix becomes more diagonal. As a result, the EOFs will change.

Hopefully that's correct!
Kevin C
ReplyDelete
Replies
AnonymousFebruary 26, 2011 at 12:45 AM
Nope, wrong again! Both variance and covariance change. But the EOFs still change. KC
ReplyDelete
Replies
CarrickFebruary 26, 2011 at 4:21 AM
Nick, when you say the PCs aren't changes, are you referring to the singular values?

Also, I'd love to see what happens when you do area weighting.
ReplyDelete
Replies

Add comment

An interactive topic index for all Moyhu posts.
Latest Ice and Temperature data
Climate Data Portals
A gallery of Javascript-enhanced graphics
Temperature trend viewer
Google Maps and GHCN
WebGL map of past GHCN/SST station temperatures
WebGL map of GHCN/SST station temperature trends
HiRes NOAA OI SST with WebGL and Movie
Regional Hi-Res SST movies
WebGL Facility
TempLS Guide
More pages, and blog glossary

moyhu

Sunday, February 20, 2011

Ryan's code - testing.

Ryan's code - testing.

15 comments:

Search This Blog

Maintained Pages

Recent Comments

Blogroll

Blog Archive

Translate

Resources

About Me

moyhu

Sunday, February 20, 2011

Ryan's code - testing.

Ryan's code - testing.

15 comments:

Search This Blog

Maintained Pages

Recent Comments

Blogroll

Subscribe To

Blog Archive

Translate

Resources

About Me