Thursday, February 17, 2011

Ryan's code - S09 with more PCs

In my last post, I was stuck. With help from Carrick (see update), I've been able to get a working copy of the code. I've revised the HTML markup file.

A full run takes about 46 min on my 32-bit Windows setup. Most of that is iridge. If I run just the parts needed for S09, it takes about 80 sec. So I did that, and started tinkering with the number of retained PCs. Ultimately, I want to see how much of the difference between RO10 and S09 is due to this factor.

For the moment, I have just plotted all the spliced trend maps for the period 1957-2006 for retained PCs from 3 to 7, using the RO10 implementation of Steig's method. Here they are:

3 PCs retained

4 PCs retained

5 PCs retained

6 PCs retained

7 PCs retained

8 PCs retained

9 PCs retained

10 PCs retained
Here for comparison is Ryan's recent plot from the Air Vent (which seems to have its puff restored).

As soon as Steig's method retains more than 3 PCs, the trend plot comes to resemble Ryan's rather more, at least for 4 to 6 PCs retained. With 7 there is a change, which suggests that perhaps noise is becoming a factor.

Update. Here are the average trends for the various regions, in °C/decade. They are comparable to the values in Ryan's figure. Note that with higher PC numbers there is some movement towards Ryan's values, although a significant difference remains. There is also a big change at 7 PCs, suggesting that this is the noise limit.

Update 2. I have added the cases with 8 to 10 PCs retained. It seems to confirm that things get ragged once more than 6 PCs are retained - possibly noise, though Ryan hints in comments that something else is involved.

           East      West      Peninsula  Continent
 3 PCs   0.09663    0.1911    0.1253     0.1180
 4 PCs   0.09361    0.1723    0.2463     0.1167
 5 PCs   0.07966    0.1667    0.2533     0.1054
 6 PCs   0.07435    0.1632    0.2709     0.1014
 7 PCs   0.11340    0.2364    0.2988     0.1473
 8 PCs   0.1322     0.2245    0.3053     0.1590
 9 PCs   0.1719     0.2575    0.2860     0.1949
10 PCs   0.1605     0.1083    0.2100     0.1514
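For anyone who wants to experiment with the truncation step itself, here is a minimal sketch of retaining k PCs of a field via SVD and looking at the resulting trend. This is purely synthetic data with invented names, not Ryan's actual scripts (which are in R):

```python
import numpy as np

rng = np.random.default_rng(0)
n_months, n_cells = 600, 50          # 50-year synthetic monthly field
t = np.arange(n_months)

# Synthetic field: a spatially varying trend plus weather noise
true_trend = rng.normal(2e-4, 1e-4, n_cells)        # degrees per month
field = np.outer(t, true_trend) + rng.normal(0, 0.5, (n_months, n_cells))

def truncated_recon(X, k):
    """Reconstruct X keeping only its leading k principal components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k] + X.mean(axis=0)

# Mean trend of the truncated reconstruction at each retention level
for k in (3, 4, 5, 6, 7):
    slopes = np.polyfit(t, truncated_recon(field, k), 1)[0]
    print(k, slopes.mean())
```

With all components retained the reconstruction is exact; the interesting behaviour is how the per-cell trends redistribute as k changes.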


  1. Nick, these images are very helpful to the layperson in understanding the differences between the Steig'09 and O'Donnell'10 approaches.

    RomanM displayed maps for the separate trends represented by PC1, PC2, and PC3 in O'Donnell's emulation of Steig'09's algorithm. They are consistent with your map of PCs 1-3. Together, the two sets emphasize the limits of the spatial resolution of the implementation of Steig'09.

    The interesting question discussed by Carrick, SteveF, you, and others at Lucia's is "what is/are the correct method(s) for determining the number of PCs to include in the analysis?"

    Rule of thumb? Heuristic? A derived parameter that highlights the range of acceptable choices by quantitating "stability" and discriminating "signal" from "noise"?

    The Mark One eyeball would suggest a number between 4 and 6 in this case. But can that choice be shown to be objectively superior to 3 or fewer (or >6)?

  2. As in all statistical arguments there is no "correct", there might be a best and there certainly are better.

  3. Nick, thanks greatly. Eli's take home from this is that the 4, 5 and 6 PCs retained are quite consistent and tell pretty much the same story, while the OO10 pushes all the warming into the peninsula area and the OS09 smears it out. Is that reasonable?

  4. Nick,

    Very nice.

    A couple of things to think about (I will not tell you my opinions; rather, I think it will be interesting to see whether you reach the same conclusions on your own):

    1. Both the number of AVHRR PCs you retain and the "n.eof" setting in emFn() matter. Experiment with both. See what happens.

    2. It is useful to compare the "spliced" vs. "unspliced" reconstructions for S09.

    3. Try HUGE numbers of PCs. Like 200. Or 288.

    4. Rather than running our verification statistics scripts, try to figure out how you would determine what is the appropriate number of PCs to retain.

    I'm glad you got the scripts working and you are playing around with it. This is the part that was fun for me . . . once we figured out how to replicate S09, then we started to play. And that's where all the learning occurred.

  5. One last thing. You suggest 7 PCs as a noise limit because of a major change. As you play, think of whether there might be an alternative reason, what that reason might be, and how you would verify whether it is true or not.

    Also play with the differences between the "progressive" and normal settings on emFn().

    So many things to play with! Much fun . . .

  6. I said "one last thing", but I lied. If you have any questions about how something works, you can feel free to contact me via email.

    It's kind of cool to see someone else making posts with our scripts, by the way!

  7. Thanks, this is a really interesting post - it's surprising to see the difference between 3 PCs and 4+ ...

  8. By the way, Nick, do you have a copy of our paper?

    Remember, there are 3 primary issues we outline WRT S09. Number of AVHRR PCs is only part of one of them.

  9. Thanks Nick. I was thinking of doing the same thing. What got me curious was Steig's review claim (referring to the initial TTLS version of O10) that the differences between O10 and S09 were overwhelmingly due to the number of retained EOFs.

  10. Nick, one way you can look at the effect of noise is to add a known amount of random white noise to one or more temperature sensors. Ryan's trend analysis is one way of doing this, but this lets you measure the impact across all frequencies of interest.

    That disambiguates structural shifts in the PCs from noise amplification issues.

    Also, generating your own simulated data so you know what the "truth" is, is another way of measuring the spatial distortion associated with truncation of the series. Unfortunately that starts resembling a day-job type chore in a hurry.
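Carrick's noise-injection idea can be sketched in a few lines. Everything below (the synthetic station set, the truncation helper) is hypothetical and illustrative, not the actual reconstruction code:

```python
import numpy as np

rng = np.random.default_rng(1)
n_months, n_stations = 600, 8
t = np.arange(n_months)
# Hypothetical station set: common warming signal plus weather noise
base = np.outer(t, np.full(n_stations, 1e-3)) \
     + rng.normal(0.0, 0.3, (n_months, n_stations))

def pc_trend(X, k):
    """Mean trend of X after truncating to its leading k PCs."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    recon = U[:, :k] * s[:k] @ Vt[:k]
    return np.polyfit(t, recon, 1)[0].mean()

# Inject white noise of known size into one sensor; the shift in the
# truncated-PC trend measures the noise amplification at each k
noisy = base.copy()
noisy[:, 0] += rng.normal(0.0, 1.0, n_months)

for k in (1, 2, 4, 6):
    print(k, pc_trend(noisy, k) - pc_trend(base, k))
```

Because the injected noise amplitude is known, the size of the trend shift at each k directly measures how much noise the truncation admits.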

  11. Very nice Nick. Did I understand correctly that you will also add synthetic regional trends to test for fidelity as Ryan O did?


  12. Eli,
    "OO10 pushes all the warming into the peninsula area and the OS09 smears it out."
    Well, at least you got the second half right. O(10) retains the peninsula warming on the peninsula, consistent with the temperature trends reported by the manned ground stations on the peninsula.


  13. Ryan

    You have not explained why there was a need to attack Steig personally, as if you were on a holy crusade to slay the devil.

  14. If Eli looks at the maps with 4, 5 and 6 PCs retained not all of the warming is on the peninsula and most of the cooling is where OD says it should be. Wanna try again?

  15. Eli,

    Perhaps you might want to look here:

    before asking people to try again.

  16. Eli,
    "If Eli looks at the maps with 4, 5 and 6 PCs retained not all of the warming is on the peninsula and most of the cooling is where OD says it should be. Wanna try again?"
    Maybe if you can bring yourself to do it, it would be helpful to read Ryan's summary at the Air Vent. The issue is really very simple: the original S(09) reconstruction showed peninsula warming that was much lower than the well-known and documented warming on the peninsula, while the O(10) reconstruction shows warming on the peninsula in line with the measured warming there. When you said "O10 pushes all the warming into the peninsula area" that was simply not correct. But you were correct that S(09) did blend the peninsula warming with the rest of the continent, especially West Antarctica. Until/unless Nick or someone else runs a sensitivity test on the S(09) method modified to include more PCs, it is hard to say if 4, 5, or 6 PCs with S(09) will yield reasonable fidelity.


  17. Ryan,
    I do have the versions of your paper that you posted on the CA directory, thanks. I don't have your email address though. Mine is here.

    I did modify the n.eof setting to be consistent. I'll post my modified code shortly. My next plan (SteveF) is to do the sensitivity tests you have been describing.

  18. Notice that the continental warming is about the same up to 6 PCs, which is an interesting point.

  19. Nick,
    "My next plan (SteveF) is to do the sensitivity tests you have been describing."
    That will be very interesting to see; I look forward to it.


  20. I don't understand the algorithms well enough to know if this is relevant - you both almost certainly know this already, but I'll say it just in case...

    The other main way of objectively establishing how many PCs to retain is cross validation - you leave out some of your data when fitting the PCs to the data, and then see how well the resulting reconstruction matches the omitted data. Vary the number of PCs, and pick the number of PCs which gives the best prediction of the omitted data.

    You could leave out some stations completely, or you could leave out part of a station history.

    It may well be that the result is too noisy to make a reliable determination. Then you need to do it multiple times omitting different stations to build up a more accurate estimate of predictive power vs PCs.

    The limitation of this method arises when the data are correlated. For example, if there are two stations very close together and so highly correlated, leaving out one is uninformative. In this case you may need to sort stations into correlated batches and leave out a batch at a time.

    That's what we do in our field at least. Whether it applies to yours I leave to you to judge.

    Kevin C
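Kevin C's scheme can be sketched on synthetic data (the setup and names here are illustrative only): hold out part of one station's history, fit that station against the leading PCs of the remaining stations, and score the prediction on the withheld part.

```python
import numpy as np

rng = np.random.default_rng(2)
n_months, n_stations = 240, 12
t = np.arange(n_months)

# Two real modes (a trend and an annual cycle) with random station
# loadings, plus weather noise -- so the "right" answer is 2 PCs
modes = np.column_stack([t / n_months, np.sin(2 * np.pi * t / 12)])
loadings = rng.normal(size=(2, n_stations))
data = modes @ loadings + rng.normal(0.0, 0.3, (n_months, n_stations))

def cv_error(X, k):
    """Leave one station out; calibrate its fit to the leading k PCs
    of the rest on the first half of its record, verify on the second."""
    half = X.shape[0] // 2
    errs = []
    for j in range(X.shape[1]):
        rest = np.delete(X, j, axis=1)
        Xc = rest - rest.mean(axis=0)
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        basis = U[:, :k]                      # temporal PCs of the rest
        y = X[:, j] - X[:, j].mean()
        coef, *_ = np.linalg.lstsq(basis[:half], y[:half], rcond=None)
        errs.append(np.mean((y[half:] - basis[half:] @ coef) ** 2))
    return float(np.mean(errs))

for k in range(1, 7):
    print(k, cv_error(data, k))
```

One would then pick the k that minimizes the verification error, averaging over many held-out stations (or batches of correlated stations) to tame the noise.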

  21. Carrick:

    I'm interested in your suggestion of adding noise to a dataset. I might have a use for that. Can you suggest a source (preferably general and not too advanced) where I could read about this approach?

    Kevin C

  22. Oh, maybe I get it. Here's a thought experiment:

    Suppose we have a planet warming at a uniform rate, with a tilted axis giving rise to seasons. We have a load of weather stations. There is lots of noise due to weather.

    If we calculate PCs, then hopefully we will see the first two PCs containing some combination of linear increase and annual cycle, and the rest will account for the noise.

    If we add noise to any station (or to multiple stations), then the first two PCs will be unaffected, but all the noise PCs will get completely shuffled.

    Is that how it works?
    Kevin C
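That is roughly how it works, and a toy version of the thought experiment is easy to check (all numbers below are invented): build stations sharing a warming trend and an annual cycle whose amplitude varies by station, corrupt one station with extra white noise, and verify that the leading (signal) PCs barely move.

```python
import numpy as np

rng = np.random.default_rng(3)
n_months, n_stations = 480, 20
t = np.arange(n_months)

trend = 0.005 * t                               # uniform warming
season = np.sin(2 * np.pi * t / 12)             # annual cycle
amp = rng.uniform(1.0, 3.0, n_stations)         # cycle amplitude per station
data = trend[:, None] + season[:, None] * amp[None, :] \
     + rng.normal(0.0, 0.5, (n_months, n_stations))

def leading_pcs(X, k=2):
    """Leading k temporal PCs of X (left singular vectors)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k], s

U0, s0 = leading_pcs(data)
noisy = data.copy()
noisy[:, 0] += rng.normal(0.0, 0.5, n_months)   # reshuffle one station's noise
U1, s1 = leading_pcs(noisy)

# Overlap of the leading subspaces: singular values near 1 mean the
# first two PCs are essentially unchanged by the added noise
overlap = np.linalg.svd(U0.T @ U1, compute_uv=False)
print(overlap)
```

The trailing (noise) PCs, by contrast, get thoroughly shuffled by the perturbation, which is the behaviour the thought experiment predicts.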

  23. Nick,

    You are correct that I am hinting something else is involved. If you want more than hints, I'm glad to help . . . but if you prefer to arrive at your own answer, I will be quiet.

    I will only provide one additional hint at this point: kgnd and the number of AVHRR PCs to retain are two different things. For our reconstructions, we retain 135 PCs. The "7" number that has been tossed around all over the blogosphere is not the number of AVHRR PCs, but rather the "n.eof" setting in emFn().

  24. Nick, sent an email to what was (I think) your email address. If you didn't get it, let me know and I'll re-try.

  25. Kevin C, I'd probably use the AVHRR data to construct, say, a 2-d spatial Fourier expansion on a month-by-month basis. Then compute the Fourier transform from 1982-2006 on a coefficient-by-coefficient basis.

    Once you have that, throw away the phase info (keep the same Fourier amplitude) and replace the phase with uniform white noise (bounded by +/- pi). Inverse Fourier transform each coefficient (this will give rise to hopefully realistic time-varying coefficients of the 2-d spatial field), which can then be used to construct a synthetic temperature field.

    It sounds complicated, but it could probably be done in relatively few lines of Matlab code.
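For a single series, the amplitude-preserving phase scramble Carrick describes looks like this in Python (the 2-d spatial version applies the same idea coefficient by coefficient; this sketch is mine, with the bookkeeping a real-valued signal requires):

```python
import numpy as np

def phase_randomize(x, rng):
    """Surrogate with the same Fourier amplitudes as x but random phases.

    Uses rfft so the result stays real; the DC (and Nyquist, for even
    length) phases are pinned to zero, as a real signal requires.
    """
    X = np.fft.rfft(x)
    phases = rng.uniform(-np.pi, np.pi, X.size)
    phases[0] = 0.0                   # keep the DC bin real
    if x.size % 2 == 0:
        phases[-1] = 0.0              # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

rng = np.random.default_rng(4)
t = np.arange(300)
x = np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=t.size)
surrogate = phase_randomize(x, rng)

# Amplitude spectra match exactly; the time series itself does not
assert np.allclose(np.abs(np.fft.rfft(surrogate)),
                   np.abs(np.fft.rfft(x)))
```

Because only phases are randomized, the surrogate keeps the spectral (and hence autocorrelation) structure of the original while destroying any particular realization of the "signal".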

  26. Carrick,

    Nic Lewis and I have done that with much success when evaluating different regression methods for infilling instrumental data. Slick, fast, and preserves both the covariance and autocorrelation structure of the original data.

  27. Nick, Congratulations on an excellent thread. It is good to have a place for dispassionate technical discussions about the math & stats involved. And I like your hyper-linked version of Ryan's code - very user friendly!

    I have another suggested test that you might like to use for comparing methods for infilling the Antarctic ground station data. You need a station set that includes both Byrd manned station and Byrd AWS. So you could either use the OLMC10 63 station set, or just add Byrd AWS to the S09 42 station set. Then compare the trends of the infilled Byrd manned station and AWS. They should in principle be the same (although their absolute temperatures need not be: the two stations locations are 2.5km apart and they may have different microclimates, sensor heights, etc).

    I think that you will find the relationship between the two Byrd station trends at different levels of RegEM TTLS truncation levels quite interesting. Then compare those differences with the difference using iridge.

  28. Carrick, Ryan - oh, yes; that sounds like a good idea. Keeping the temporal amplitudes gives us the right sort of time variation.

    Do we maintain the spatial correlations too? Yes, I think so, because the range of amplitudes as a function of spatial frequency is also preserved.

    The only thing you might lose is any implicit phase relationships in the spatial FT corresponding to systematic features in the original spatial data. For example, large flat areas (e.g. sea?) lead to relationships between nearby phases in Fourier space. (This comes from the convolution theorem - if you can multiply your 2d dataset by a mask function without changing it, then you can convolve the FT with the FT of the mask. If the mask is low resolution, its FT is concentrated around the origin, so the convolution induces relationships between nearby phases.) These phase relationships will be lost. My gut feeling is that it won't make a significant difference though.

    Kevin C

  29. Kevin,

    I haven't found it to make a difference when infilling other temperature series, be it highly spatially coherent (like Japan) or incoherent (all of North America). It seems to work quite well.

  30. KevinC, that is a very good question. It's something I would have to look at to be sure. The short answer (really a guess) is "yes, I think they would", but it would need to be examined more carefully.

    Cool to hear that Ryan and Nic have already done this. It's the starting place for any sort of realistic evaluation of a method, its strengths and weaknesses.

    The next stage of a Monte Carlo analysis is to add realistic signal distortions, like loss of data for part of the year, instrument moves, changes of instrument, etc. Ideally, you'd like to put bounds on the error these introduce into your signal.

    Finally, you'd probably want your method to enforce causality in your data. AVHRR almost certainly is going to have an acausal component (I mean in the signal-processing sense, not the physics faster-than-light sense). I achieve this, when needed, by "wavenumber filtering" my signal and tossing out unphysical/unrealizable portions of the Fourier domain.

    How well this sort of methodology would work when applied to these sorts of met data would be a research project.

  31. NicL,
    Thanks for that. I'll look at that test. I've been looking at the test currently done for S09 with Byrd and Russkaya, and I see that Byrd AWS is not in the set usually used for S09. And Byrd manned seems to cover only about the first 15 years from 1957, and Russkaya a similar period. So just adding a trend to the existing (manned) data wouldn't be expected to make a big perturbation overall (which is what I'm finding). Then it seems the test must be done including AWS somehow.

  32. Nick,

    Exactly. This is why, for the test I did, I spliced the manned and AWS Byrd together (both for our reconstruction and S09). Otherwise, it's almost Q.E.D. that there is no difference.

    If you splice them together, then you can add trends and expect a response. But you have to splice first.

  33. This may be obvious to the smart people among us, but can't this method be applied to other parts of the world? We have AVHRR SST since 1982. Ron Broberg's GSOD data (and presumably the upcoming GHCN updates) are very dense for certain decades. Shouldn't we be able to construct a Lukewarmer approved method for interpolating the globe? I know that the NCDC analysis does something similar, but would there be benefits to this method?

    Sorry if this is somewhat off topic.

  34. CCE,
    Yes, I think it might be useful for the Arctic, and maybe places like Central Africa.

    The key idea is the use of the satellite eigenvectors (EOFs) as fitting functions for sparse station regions. That is diluted when those EOFs get to look like any other set of smooth functions that you might choose. I suspect that would happen if it were tried on a global scale. But it would be good for other specific regions.

  35. Every region becomes sparse the further back you go, so I think it would be useful just about everywhere. You could divide the world into logical regions and then assemble the results.

    Speaking of the Arctic, how would it work with sea ice? The AVHRR data would only work for areas and times of the year where it is guaranteed to be sea ice or guaranteed to be open water. I would think that (and the sea-ice covered areas down south) would be the trickiest of all.

  36. CCE,
    You might like to look at a post I did at tAV a year ago:

    I combined long record GHCN station data with satellite TLT data (available worldwide except near the poles, unlike AVHRR data) to generate a spatially complete temperature history using RegEM for station infilling and RLS for spatial reconstruction.

  37. Nic,

    That's interesting. I'm curious as to why you didn't use the Reynolds SST, which would give you a much better correlation than TLT.

    For land, there is Ron Broberg's GSOD data, which isn't 100% spatially complete, but a lot denser than GHCN after the 1970s. e.g.

    There is also the MODIS/Terra LST since March 2000 (not sure if that's long enough). It has its own various problems due to changing land surface, but it might also be an option.

    For your long temperature series, you could choose stations that are "really rural" -- ones that don't show up in or near the union of all the various Urbanity proxies (such as those that Ron Broberg or Steve Mosher have been working on).

    You could process various regions separately to avoid some of the problems you encountered. You might also choose the data source that works best for a particular area.

  38. cce,

    At the time I was doing this work I had a relatively limited knowledge of the available datasets (I still do, but less so!), and was really just experimenting. I have since concluded that using AVHRR rather than TLT satellite data would make much better sense for SST, but I haven't got around to doing so.

    The Reynolds SST data is already a composite of satellite and in-situ readings, made using a different method from mine, so it wouldn't really make sense to use it, IMO.

    I agree that the MODIS land temperature data may be preferable to the TLT data, but I doubt the length of data available is long enough yet.

    I agree that processing data regionally would improve results.

    I did try to pick truly rural stations in North America, insofar as I could. Elsewhere, in most regions I couldn't find very many long-record truly rural stations in the GHCN database. The GSOD data at The Whiteboard you refer to (posted some time after my work) seems to have very limited global coverage pre-1950, and therefore may not be of much help.

  39. Nic,

    As far as I know, the Reynolds OIv2 data is a pure satellite product (1982+). This is in contrast to ERSST (also Reynolds), which is a traditional SST analysis.

    I was offering the GSOD data as an alternative to using satellite data for the land. There are 3+ decades of relatively dense land measurements. Applied to a grid, you'd get a pretty good representation of almost all of the land surface. In other words, establish the spatial relationships with the GSOD data, and then use a "really rural" subset of GHCN to extend it backwards in time.

  40. cce,

    The reference in the readme file at the URL you gave is to a 2006/7 paper by Reynolds, which says in-situ data is also used. AMSR data is used as well as AVHRR in recent years. But the dataset may indeed provide a better spatial correlation matrix than AVHRR data alone.

    I had misunderstood what you were suggesting about the GSOD data. I will look at this when I return to working on this area.

  41. Nic,

    I think the in situ data is used to correct the AVHRR and AMSR measurements. I may be wrong as the Reynolds paper isn't completely clear, but the only time the in situ data is described, it is with regard to correcting biases in the satellite data.

  42. Nic and cce

    On your OIv2 discussion, look at the Reynolds et al. 2002 paper. It appears to be the paper describing the OIv2 method when it was first made, and might answer your questions: