moyhu: There's more to life than PC1

Friday, September 26, 2014

There's more to life than PC1

There's PC2, PC3, ...

Recent interest in PCA and paleo has got me doing some stuff I should have done a while ago. I think it is bad that Steve McIntyre and Wegman have been able to maintain focus on just the first component PC1, leading people to think they are talking about reconstructions. They aren't, and that's why, whenever someone actually looks, the tendency of Mann's decentered PCA to make PC1 the repository of HS-like behaviour has little effect on recons. I'll show why.

Steve's post showed Fig 9.2 from the NAS report as an example of an upright PC1. That's got me playing with the NAS code that generated it. It's elegant code, and easily adapted to show more eigenvectors and do a reconstruction. So I did.

Mann pointed out many years ago that M&M had used too few PC's in their recon. Tamino explained that PCA simply created a different basis, aligned to some extent with real effects, which may be physical. But there is conservation involved - if HS behaviour is collected in PC1, then it is depleted in PC2,3 etc, and in the recon, it averages out.

And it does. For the NAS example, I'll show how the other PCs do have complementary behaviour. Since the HS effect of decentering isn't really physical, but drawn from other PCs, it doesn't last when you use more PCs, as real reconstructions do.

The NAS, I should note, did not intend their simple example to have real tree-like properties. They chose an AR1 model of extremely high persistence (r=0.9) - decadal weather - to emphasise the effect of decentering. Wegman said that his Fig 4.4 was based on AR1(r=0.2), which may be more realistic, but wasn't what the code did. Anyway, DeepClimate showed the two plots thus (x axis in years):


5 PC1s generated from AR1(.9) (left) and AR1 (.2) (right) red noise null proxies.

Each curve is the result of decentered PCA (last 100 years) on a set of 50 AR1(r) 'data'. Obviously, the r=0.2 version is much less HS-like, but it is still somewhat so. One of the elegant features of the NAS code is that they calculate the theoretical PC for the model, which is just the first eigenvector of the autocorrelation matrix. I have some analytic theory here, for a coming post, but I'll just say for now that the HS-ness is about the same for low values of r; for 0.2 it is just beginning to increase.
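For readers who want to try the setup just described, here is a minimal sketch: 50 AR1(r) series, each decentered on its last 100 values, with the leading PC then extracted. It is plain Python, not the R of the NAS code, and it makes no claim to reproduce that code. The series length of 600, the seed, the use of power iteration and all function names are assumptions made for illustration.

```python
import math
import random


def ar1_series(n, r, rng):
    """Return n values of a zero-mean, unit-variance AR1 process with lag-1 coefficient r."""
    innov_sd = math.sqrt(1.0 - r * r)  # keeps the stationary variance at 1
    x = [rng.gauss(0.0, 1.0)]          # start from the stationary distribution
    for _ in range(n - 1):
        x.append(r * x[-1] + innov_sd * rng.gauss(0.0, 1.0))
    return x


def decenter(series, n_cal):
    """Subtract the mean of the last n_cal values instead of the full-period mean."""
    cal_mean = sum(series[-n_cal:]) / n_cal
    return [v - cal_mean for v in series]


def leading_pc(columns, n_iter=500):
    """Leading time-domain PC of the data matrix whose columns are the given series.

    Runs power iteration on the small m x m Gram matrix, then maps the result
    back to the time axis and normalises it to unit length.
    """
    m = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(m)]
            for i in range(m)]
    u = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(g * v for g, v in zip(row, u)) for row in gram]
        norm = math.sqrt(sum(v * v for v in w))
        u = [v / norm for v in w]
    n = len(columns[0])
    pc = [sum(columns[j][t] * u[j] for j in range(m)) for t in range(n)]
    norm = math.sqrt(sum(v * v for v in pc))
    return [v / norm for v in pc]


rng = random.Random(0)  # fixed seed (assumed) so the run is repeatable
n_years, n_series, n_cal, r = 600, 50, 100, 0.9  # n_years is an assumption
data = [decenter(ar1_series(n_years, r, rng), n_cal) for _ in range(n_series)]
pc1 = leading_pc(data)
```

Plotting pc1 against time for a few seeds, and again with r = 0.2, is one way to compare with the two panels above. The sign of pc1 is arbitrary, so some curves may come out inverted.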

I'll look at the decentered r=0.9 case for emphasis, and as an extreme worst case. The theoretical eigenvectors from the autocorrelation, calculated NAS-wise, look like this:

The NAS PC1 is in black. You can see how the other eigenvectors are picking up the variation in the HS shaft, and settling into a Sturm-Liouville pattern. The decentering concentrated the blade variability in PC1.
Update. This actually shows what decentering is doing. An S-L sequence, of orthogonal polynomials say, starts with the constant 1. So would PCA, if you didn't subtract the mean. That first vector reflects the chief common pattern, which is the offset from the mean.

But that's not interesting, and subtracting the (centered) mean promotes the second eigenvector to leading. It makes no radical difference, but saves arithmetic.

Decentered, that constant PC comes back, with a kink. Again, it makes little real difference. You may just have to use one extra PC in the recon.
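The decentering described above can be written down compactly. The following is a sketch in notation that is not in the original post: C is the AR1 autocorrelation matrix, c is the indicator vector of the last k (calibration) years, and D is the operator that subtracts the calibration-period mean. It assumes the stationary AR1 autocorrelation; the NAS code may set the calculation up differently.

```latex
C_{ij} = r^{|i-j|}, \qquad
D = I - \frac{1}{k}\,\mathbf{1}\,c^{T}, \qquad
\tilde{C} = D\,C\,D^{T}.

% The theoretical PCs are the eigenvectors v of \tilde{C}:
\tilde{C}\,v = \lambda v .

% Since c^{T}\mathbf{1} = k, we have c^{T} D = c^{T} - c^{T} = 0, so for \lambda \neq 0
c^{T} v = \frac{1}{\lambda}\, c^{T} D\,C\,D^{T} v = 0 .
```

So every decentered PC with nonzero eigenvalue has zero mean over the calibration period, rather than over the whole period. With ordinary centering, c is the vector of ones and k = n, and the same argument makes each PC orthogonal to the constant vector. That is consistent with the constant pattern returning, with a kink, once the constraint applies only to the last k years.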

Reconstruction

In reconstruction, we project the data (matrix X) onto the space spanned by some small number of eigenvectors, and calculate some average. In a real recon, it will probably be some spatially weighted average. The weights are not dependent on the PCA; they represent whatever integral you are trying to reconstruct. Here it might as well be a simple average. If L is the n×p truncated matrix of orthonormal eigenvectors (n = data, p = eigenvectors), then the projection is simply, Fourier-like:
y = L %*% (t(L) %*% X)
You can see why the sign of PCs doesn't matter. L is there twice. Then the recon is just the row mean of this matrix.
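A small worked example may make the projection and the sign argument concrete. This is a sketch in plain Python with made-up numbers: three time steps, two proxies and two orthonormal vectors. The function names are hypothetical and nothing here comes from the NAS code.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a list-of-rows matrix."""
    return [list(row) for row in zip(*a)]


def project(L, X):
    """y = L (L^T X): X projected onto the span of the columns of L."""
    return matmul(L, matmul(transpose(L), X))


def recon(L, X):
    """Row mean of the projection: one value per time step."""
    return [sum(row) / len(row) for row in project(L, X)]


s = 1.0 / math.sqrt(2.0)
# n = 3 time steps, p = 2 orthonormal columns (illustrative values)
L = [[1.0, 0.0],
     [0.0, s],
     [0.0, s]]
# The same vectors with the sign of the second one flipped
L_flipped = [[1.0, 0.0],
             [0.0, -s],
             [0.0, -s]]
# 3 time steps x 2 proxies (made-up numbers)
X = [[1.0, 2.0],
     [3.0, 5.0],
     [7.0, 3.0]]

y = recon(L, X)                  # approximately [1.5, 4.5, 4.5]
y_flipped = recon(L_flipped, X)  # same values: the flipped sign cancels
```

The raw row means of X are 1.5, 4 and 5, so truncating to two vectors does change the result, but flipping the sign of a vector does not.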

So I'll calculate the 5 recons of another set of random data using just one PC. I've used the NAS HS index to orient, and scaled by standard deviation of the whole curve. I should say that this is not justified at all in general; while PC's are sign insensitive, recons certainly aren't. However, we're reconstructing essentially zero plus noise. I'll then show what it looks like without scaling.

With one PC and the scaling, it looks HS-like, reflecting the PC itself. Without scaling it looks like this:

This reflects the fact that while the PC maximises the magnitude of the data in the new basis, the scalar product with another vector, which is the recon, can be of different size and sign. In this very simple analogue, we're reconstructing zero plus noise. Anything can happen.

OK, now we try a recon with 2 PC's, rescaling again by HS index etc. Still the same data (for all recons, not re-randomised).

Still a bit of HS, but not much. How about 3:

All gone. And three PC's would be a small number of PCs to retain in a reconstruction.

Remember, this was the extreme case of AR1(0.9), where the HS effect on PC1 was very large. But it was not creating HS effect from nowhere. It was just transferring it from other PCs to PC1.
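The transfer argument can be put in one line. The following is a sketch using the L and X of the Reconstruction section; the symbols l_k (the k-th column of L), q (the rank of X) and m (the number of proxies) are introduced here for illustration.

```latex
% With all q eigenvectors retained, the projection returns the data unchanged:
\sum_{k=1}^{q} l_k\, l_k^{T} X = X .

% So the p-PC projection is the data minus what the discarded PCs carry:
y_p = \sum_{k=1}^{p} l_k\, l_k^{T} X
    = X - \sum_{k=p+1}^{q} l_k\, l_k^{T} X ,
\qquad
\mathrm{recon}_p = \frac{1}{m}\, y_p\, \mathbf{1} .
```

Whatever HS shape decentering puts into the k = 1 term must be balanced by the remaining terms, because the full sum is just X. Each extra PC retained restores part of that balance, which is what the 2-PC and 3-PC recons above show.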

The code, adapted from the NAS code, is here.

Appendix.

McIntyre and McKitrick, in their Energy and Environment paper, 2005, showed their emulation of MBH, with and without de-centering. Brandon has alluded to this in comments. I'll show the plot here:

Top is their emulation of MBH, decentered. The bottom figure combines the effect of centering and removal of Gaspe cedars (which I think is unwarranted). It's a pity they didn't show just the effect of decentering, but even so, it isn't much. It shows that PC1 doesn't have much to do with the outcome.

63 comments:

William M. Connolley, September 26, 2014 at 9:48 PM
I struggle to remember this stuff, but:

> the other eigenvectors are picking up the variation in the HS shaft, and settling into a Sturm-Liouville pattern

Doesn't that mean you're seeing what often happens with lower EOFs, which is that there's no real physical structure in there? It's just noise. You're seeing (apart from the leading EOF, and required orthogonality to that, and perhaps for the "blade" bit) what you'd see from any old random noise?

Carrick, September 26, 2014 at 10:48 PM
Very interesting post, Nick. While I'd say you're "moving the goal posts" here, you're moving them towards practical questions. So progress.

Can you think of a way to test whether there is an effect on the signal to noise in the reconstruction from the non-centered PCA?

I can imagine this might be an issue for systems with very poor SNRs.

I'm traveling now so I'll post infrequently, but will read with interest any comments you make.

Anonymous, September 26, 2014 at 11:36 PM
Nick,

Reconstructions have a temperature history target (eg instrumental temperature record). What is the target for your reconstruction with synthetic data?
Steve Fitzpatrick

Anonymous, September 27, 2014 at 1:19 AM
Nick,

I think lining them up with a specific target series is THE key issue here. A reconstruction based on pink noise synthetic data series and a "flat" target (AKA no signal) will of course generate a flat reconstruction (how could it do otherwise?). Reconstructing based on a rising target near the end of the reconstruction period (as is actually done with real data) seems to me the only meaningful test of bias in the reconstruction method. I am not absolutely certain, but I strongly suspect that if you reconstruct a large collection of synthetic pink (eg r=0.2 or 0.3) noise series (say 100) this way, you will find that they do consistently generate artificial hockey stick reconstructions if the target is a sharp rise near the end of the period. What I am saying is that the automatic adjustment in the sign of the correlation, combined with weighting to optimize the match with the target, guarantees the reconstruction will mimic the target, whatever that target shape is. (I am for sure not the first person to suggest this!)

Steve Fitzpatrick

@whut, September 27, 2014 at 8:00 AM
Nick, If you are interested in Sturm-Liouville formulations, again consider following the work at the El Nino project. I am contributing a Mathieu differential equation formulation, which has some promise as an exploratory and potentially predictive model.

@whut, September 27, 2014 at 11:04 AM
The Sturm-Liouville class of behaviors includes the Mathieu formulation as a variant. These all share the same characteristic of being a form of second-order differential equation distinguished by how the potential well is shaped, thus leading to different oscillating responses. The model of red noise definitely fits into this category as a stochastic variant, as the red noise drag factor defines how steep the well is; so when the random walker tries to climb out of the well it will have a bias to revert to the mean. Of course this works in both directions, so the response is symmetric.

...and Then There's Physics, September 27, 2014 at 5:30 PM
Nick,
Maybe you can clarify something for me. As I understand it, by using AR1(0.9) they were producing random datasets that actually had hockey stick-like features in them because of the persistence. Is this right? So, there is an issue with de-centered versus centered, but at the end of the day, the method will only produce a hockey-stick if there is such a feature in the data being analysed (whether that data is randomly generated or not). If so, the physically motivated question then becomes whether or not such a feature could be climatic in the absence of some kind of change in external forcing (anthropogenic for example). Given that I think the answer to this is, generally speaking, no, that would seem to indicate that discovering such a feature is indicative of some kind of change in external forcing which, for the 20th century, is largely anthropogenic. Is that a fair assessment or am I missing something?

Kevin O'Neill, September 27, 2014 at 8:07 PM
Nick, do you have an opinion on why some are so hesitant to admit that the AR1 lag coefficient used to generate the PC1 'hockey stick' was ≈ .9? Consider the following exchange with Brandon Shollenberger, this is not the first time I've run into this resistance - just the most recent.
-----------------
KTO: "Brandon, as the NRC/NAS study showed, they had to use AR1(.9) to achieve the result M&M got with their ‘persistent red noise’ model or that Wegman got with his AR1(.2). So we’re comparing AR1(.2) to AR1(.9) The implied persistence is 1.5 years versus 19 years."

Brandon: "This is complete BS as anyone who reads our conversation knows. Leaving aside the fact you are simply making things up when you say": (kto)"the NRC/NAS study showed, they had to use AR1(.9) to achieve the result M&M got"
-------------------

I've provided Brandon with the following references

"Figure 9-2 shows the result of a simple simulation along the lines of McIntyre and McKitrick (2003) (the computer code appears in Appendix B)."
NAS Report Page 90

http://www.nap.edu/openbook.php?record_id=11676&page=90

phi <- 0.9;
NAS Report Appendix Page 140

http://www.nap.edu/openbook.php?record_id=11676&page=140#p200108c09970140001

Brandon Shollenberger, September 28, 2014 at 6:24 AM
Steve McIntyre and Ross McKitrick have never disputed a hockey stick can be found in the data. Their published work specifically says if you implement PCA properly, you get a hockey stick as NOAMER PC4. If you keep four PCs for the NOAMER network instead of the two MBH kept, you do get a hockey stick. This point has been well established for nearly a decade.

As such, when you say:

I think it is bad that Steve McIntyre and Wegman have been able to maintain focus on just the first component PC1, leading people to think they are talking about reconstructions. They aren't, and that's why, whenever someone actually looks, the tendency of Mann's decentered PCA to make PC1 the repository of HS-like behaviour has little effect on recons. I'll show why.

You are guilty of what you accuse others of. Nearly ten years ago, McIntyre clearly laid out the effect of MBH's de-centered PCA, explained what effects it had on the final results and showed what results one could get when using several alternatives. You haven't done anything like that. You haven't even attempted to discuss the effect various things would have on MBH's results. You haven't even discussed what other data was used by MBH. It's easy to do, so I will. Here is a post where I showed all the proxies used by MBH in their 1400 step:

http://hiizuru.wordpress.com/2014/02/18/manns-screw-up-3-statistics-is-scary/

It shows there were two proxies with a hockey stick shape. One was what is commonly known as the Gaspe series. MBH used it two times. One of those times, it was included in the NOAMER network from which the famous PC1 is calculated. The other time, it was used on its own as a standalone proxy. That time, it was artificially extended so it could be used in the 1400 step. None of this was disclosed by MBH, much less explained (Further discussion and references here). The other was NOAMER PC1.

As long as you include either of those proxies, MBH's methodology will produce a hockey stick. The other 20 proxies used along with them are practically irrelevant. If MBH had implemented its PCA properly and included only the first two PCs, it would still have gotten a hockey stick because of the (inexplicably duplicated and extended) Gaspe series. If they had implemented PCA properly and removed the Gaspe series, they could have still gotten a hockey stick by including the first four NOAMER PCs.

All of that was established a decade ago. In the end, the discussion comes back to the 22 series I plotted in the post linked to above. Maybe you think a single tree ring series can be copied out of the NOAMER network, artificially extended so it meets an inclusion criterion, then used on its own. Maybe you think we should include as many PCs as we need for the NOAMER network to produce a hockey stick. I don't think either position is justifiable, but whatever. At best, you can come up with two proxies that have a hockey stick shape. Two out of 20+ is ~5%.

MBH rescales proxies by their correlation to the temperature record. That means as long as a single proxy has a hockey stick shape, MBH's methodology will produce a hockey stick. It doesn't matter if that hockey stick comes from 5% or less of their data. People focus a lot on the PCA step, but the reality is MBH's rescaling by correlation is even more biased. It's like the screening fallacy on steroids.

Nick Stokes, September 28, 2014 at 9:03 AM
OT, but I see that Climate Audit has a new post titled "What Nick Stokes Wouldn’t Show You". I've written a response, which has gone into moderation. This can take a while at CA, especially as I believe Steve has a family visit. So I'll record it here:

"Some ClimateBallers, including commenters at Stokes’ blog, are now making the fabricated claim that MM05 results were not based on the 10,000 simulations reported in Figure 2, but on a cherry-picked subset of the top percentile."

Are you denying that Wegman's Fig 4.1 and 4.4 are showing results that had been selected by an undisclosed step wherein only the top 100 of 10000 were sampled?

Are you denying that the curve in Fig 1 of the GRL paper, described as
"Sample PC1 from Monte Carlo simulation using the procedure described in text applying MBH98 data transformation to persistent trendless red noise "
was also # 71 in your set of 100 selected from 10000 on the basis of HS index?

Are you denying that the set of 100 PC1s placed on the GRL SI, described thus
"Computer scripts used to generate simulations, figures and statistics, together with a sample of 100 simulated ‘‘hockey sticks’’ and other supplementary information, are provided in the auxiliary material "
were also the result of this 100 from 10000 selection procedure?

I think there are things you are not telling us. I hardly need mention the awesome disapproval of the commercial world for this sort of thing. Sarbanes-Oxley and all that.

Re
"Stokes knows that this is untrue, as he has replicated MM05 simulations from the script that we placed online and knows that Figure 2 is based on all the simulations;"

I said at the start of my original post
"I should first point out that Fig 4.2 is not affected by the selection, and oneuniverse correctly points out that his simulations, which do not make the HS index selection, return essentially the same results. He also argues that these are the most informative, which may well be true, although the thing plotted, HS index, is not intuitive. It was the HS-like profiles in Figs 4.1 and 4.4 that attracted attention."

SM: "In today’s post, I’ll show the panelplot that Nick Stokes has refused to show. "
No, it's not that. It's again stratified by HS index, which is your artificial creation. Why not just show, as I did, a random sample, unselected, as output by your program? You could undertake the artifice of inverting by HS index, as Brandon has been demanding. I don't think it's the right thing to do, but it won't make much difference.

Nick Stokes, September 28, 2014 at 11:23 AM
Thanks, Deep,
The issue of PC retention is one I've been trying to illustrate here. Indeed, they over-truncated.

I think the removal of Gaspe cedars in 1400-1450 is a big part of that discrepancy. Not only is it a thumb on the scales, but as I think they said, it leaves too little data to say anything safely about the period.

Anonymous, September 29, 2014 at 1:27 PM
This a-hole Blogspot erased my first comment because it claimed I did not own the Wordpress ID that I do indeed own so I am commenting as Anonymous.

I don't think that you are creating genuine Mannian proxies here for the decentered case.

The methods section at the end of MBH98 indicates that the standardization of the proxies is done within the calibration interval, i.e. the proxy should be divided by the SD of the proxy over the calibration period rather than the SD over the entire data period, so that it matches the calibration temperature data. Your R script, however, uses the incorrect SDs, thereby decentering the values but basically retaining a standard variability of the proxies.

Note that a proxy with low variability in this time period will be greatly exaggerated by this procedure.

RomanM

Anonymous, September 30, 2014 at 12:39 AM
What I would like to see are reconstructions applied to artificial data: e.g., take a model that is run with our best estimates of forcing, add in tree locations & times based on where we have data, use an algorithm to transform local temperatures into tree ring data (with whatever noise is considered appropriate), see how well the reconstruction matches the actual global temperature series.

Then, take sample periods from the model spin-up where no forcing is applied, and do the same thing. See if the reconstructions automatically insert hockey-stick nature or not.

It seems like those two experiments combined would be very informative about the value of any reconstruction approach...

-MMM

Anonymous, October 1, 2014 at 2:13 AM
Nick "Still a bit of HS, but not much. How about 3: ... All gone. And three PC's would be a small number of PCs to retain in a reconstruction."

Instead of 5, plot 300. Plot the average. Hockey sticks remain in the aggregate.

willard, October 1, 2014 at 6:03 AM
Nick,

Have you just resurrected Ron [1]?

That must be important.

[1]: http://neverendingaudit.tumblr.com/tagged/RonBroberg

Anonymous, October 1, 2014 at 11:51 AM
I think this is the figure you are trying to show at CA (AW2007 Fig 2)?

http://www.nar.ucar.edu/2008/ESSL/catalog/cgd/images/ammann1.jpg

"Fig. 2 Correction of MBH99: our emulation of the real world proxy-based MBH99 reconstruction containing full-period proxy PC-centering corrections and omission of the Gaspé-series during 1400–1449 (solid grey line) is compared to the original MBH99 reconstruction (black line)"

Also from the text: "The overall correction is minimal and averages to roughly 0.04°C over the millennium, but is somewhat larger during 1400–1449 because of the Gaspé-series removal during this time interval."

Here is a chart showing MM2005 with the other two.

https://deepclimate.files.wordpress.com/2014/09/aw2007-mbh99-mm2005-chrt.jpg

Anonymous, October 1, 2014 at 11:58 AM
Hmmm ... retrying those two links:

AW2007 Fig 1: http://www.nar.ucar.edu/2008/ESSL/catalog/cgd/images/ammann1.jpg

MBH99 AW2007 w/MM2005: https://deepclimate.files.wordpress.com/2014/09/aw2007-mbh99-mm2005-chrt.jpg

None, October 2, 2014 at 6:18 AM
Nick, I have personally had this discussion with you previously on your blog. It is incredibly disingenuous to say it's a thumb on the scales to remove Gaspe when the correct statement of affairs is that it's a thumb on the scales to INCLUDE Gaspe. MBH performed unique early period padding to make sure it was included, a fact not disclosed until subsequent to the initial MM response. To claim that the addition of a single extra strip bark proxy (which many sources recommend NOT be used as a temperature proxy in the first place) makes it possible to say anything safely about the period is a joke. Hahah. Or are you trying to be serious...?

None, October 2, 2014 at 7:10 AM
Btw, I just went back to review some of the comments from the exchange we had (it was the "mcintyre mann and gaspe ceders" post). Having refreshed my memory, I now find the situation is even worse for Mann (and your implied criticism of McIntyre). Not only, as I had remembered, did Mann include the Gaspe data in the early period by "putting his thumb on the scales" and padding the data so it was included, but the number of tree cores for that period was between 1 and 2. So essentially you are claiming that it's the removal of these two strip bark tree cores that "leaves too little data to say anything safely about the period". Or are you really claiming that the addition of 1-2 tree cores of a tree type unsuitable for use as a proxy, by specially padding the data to make sure it gets included, is good practice ? The idea that this passes as science...

Nick Stokes, October 2, 2014 at 7:23 AM
Well, we've been through all this before. The Gaspe cedars had 46 years of data out of fifty. Mann was right to make use of that data. To put it another way, it is wrong to discard it. Whether or not he was consistent, it was still the right thing to do.

But for the moment, it's a methods issue. We're comparing what the algorithm does with and without decentering. That comparison should be made on the same datasets. What M&M have done is remove Gaspe, which actually makes not a huge difference to the expected value, but greatly increases the uncertainty. Basically, the '98 recon should start whenever you deem Gaspe to be available. If you want to be pernickety, that could be 1404 rather than 1400. But scientifically, the worst option is to insist on removing it but retain the period, get a meaningless result, and then say - hey there's a deviation. Decentering must be bad...

None, October 2, 2014 at 7:41 AM
Nick you are right in that we've been through this before.
o. Mann did not include other series in periods where there were even fewer missing years.
o. No one, absolutely no one at all, would consider 1-2 tree cores a valid number of cores to provide any meaningful signal over a given period, yet that's what Gaspe had for the period
o. Many people would say strip bark tree proxies should not be included full stop

I think most people, having been made aware of those facts, would suggest that its special inclusion in the 1400-1450 step was more likely to make the result less meaningful.

I think almost everyone would agree that these problems should be mentioned, to at least let people make up their own minds.

Only climate scientists accuse McIntyre of "putting his thumb on the scales" for actually pointing out these problems and showing that its removal makes a significant difference to the reconstruction.