Recent interest in PCA and paleo has got me doing some stuff I should have done a while ago. I think it is bad that Steve McIntyre and Wegman have been able to maintain focus on just the first component PC1, leading people to think they are talking about reconstructions. They aren't, and that's why, whenever someone actually looks, the tendency of Mann's decentered PCA to make PC1 the repository of HS-like behaviour has little effect on recons. I'll show why.
Steve's post showed Fig 9.2 from the NAS report as an example of an upright PC1. That's got me playing with the NAS code that generated it. It's an elegant code, and easily adapted to show more eigenvalues, and do a reconstruction. So I did.
Mann pointed out many years ago that M&M had used too few PC's in their recon. Tamino explained that PCA simply creates a different basis, aligned to some extent with real effects, which may be physical. But there is a conservation involved - if HS behaviour is collected in PC1, then it is depleted in PC2, PC3 etc, and in the recon it averages out.
And it does. For the NAS example, I'll show how the other PC's do have complementary behaviour, and since the HS effect of decentering isn't physically real, but drawn from other PCs, it doesn't survive when you use more PCs, as real reconstructions do.
The NAS, I should note, did not intend their simple example to have real tree-like properties. They chose an AR1 model of extremely high persistence (r=0.9) - decadal weather - to emphasise the effect of decentering. Wegman said that his Fig 4.4 was based on AR1(r=0.2), which may be more realistic, but that wasn't what the code did. Anyway, DeepClimate showed the two plots thus (x axis in years):
[Figure: 5 PC1s generated from AR1(0.9) (left) and AR1(0.2) (right) red noise null proxies.]
Each curve is the result of decentered PCA (last 100 years) on a set of 50 AR1(r) 'data'. Obviously, the r=0.2 version is much less HS-like, but there is still some. One of the elegant features of the NAS code is that they calculate the theoretical PC for the model, which is just the first eigenvector of the autocorrelation matrix. I have some analytic theory here, for a coming post, but I'll just say for now that the HS-ness is about the same for low values of r; by 0.2 it is just beginning to increase.
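The decentering experiment itself fits in a few lines. Here is a minimal sketch in Python/numpy (the NAS code is in R; the function name `decentered_pc1` and the 600-year, 50-proxy, 100-year calibration setup are my choices for illustration, not the NAS values necessarily):

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, r):
    """Generate an AR1(r) red-noise series of length n."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = r * x[i - 1] + rng.standard_normal()
    return x

def decentered_pc1(nyears=600, nproxies=50, r=0.9, cal=100):
    """PC1 of decentered PCA: each proxy is centered on the mean of
    its last `cal` years only (the MBH-style calibration period)."""
    X = np.column_stack([ar1(nyears, r) for _ in range(nproxies)])
    Xd = X - X[-cal:].mean(axis=0)            # the decentering step
    U, s, Vt = np.linalg.svd(Xd, full_matrices=False)
    return U[:, 0] * s[0]                     # leading principal component

pc1 = decentered_pc1()
# Plotted against years, pc1 typically shows the shaft offset from the
# blade (calibration period) - the hockey-stick artefact of decentering.
```

Running this repeatedly (fresh random data each time) gives curves of the kind in the figure above.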
I'll look at the decentered r=0.9 case for emphasis, and as an extreme worst case. The theoretical eigenvectors from the autocorrelation matrix, calculated NAS-wise, look like this:
The NAS PC1 is in black. You can see how the other eigenvectors are picking up the variation in the HS shaft, and settling into a Sturm-Liouville pattern. The decentering concentrated the blade variability in PC1.
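The theoretical calculation is easy to sketch too (Python/numpy; my own rendering of the idea, not the NAS R code): build the AR1 autocorrelation matrix C[i,j] = r^|i-j|, apply the decentering operator, and take the leading eigenvectors:

```python
import numpy as np

def theoretical_pcs(n=600, r=0.9, cal=100, k=4):
    """Leading eigenvectors of the decentered AR1 autocorrelation matrix.
    C[i,j] = r**|i-j|; decentering is Xd = M X with M = I - (1/cal) 1 c^T,
    where c flags the last `cal` (calibration) rows."""
    idx = np.arange(n)
    C = r ** np.abs(idx[:, None] - idx[None, :])
    c = np.zeros(n)
    c[-cal:] = 1.0
    M = np.eye(n) - np.outer(np.ones(n), c) / cal
    w, V = np.linalg.eigh(M @ C @ M.T)
    order = np.argsort(w)[::-1]               # largest eigenvalue first
    return w[order[:k]], V[:, order[:k]]

w, V = theoretical_pcs()
# V[:, 0] is the theoretical decentered PC1; the later columns are the
# increasingly oscillatory, Sturm-Liouville-like patterns.
```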
Update. This actually shows what decentering is doing. A Sturm-Liouville family - orthogonal polynomials, say - starts with the constant 1. So would PCA, if you didn't subtract the mean. That constant eigenvector reflects the chief common pattern, which is offset from the mean.
But that's not interesting, and subtracting the full-period mean (centering) promotes the second eigenvector to leading. It makes no radical difference, but saves arithmetic.
Decentered, that constant PC comes back, with a kink. Again, it makes little real difference. You may just have to use one extra PC in the recon.
Reconstruction

In reconstruction, we project the data (matrix X) onto the space spanned by some small number of eigenvectors, and calculate some average. In a real recon it will probably be some spatially weighted average. The weights do not depend on the PCA; they represent whatever integral you are trying to reconstruct. Here it might as well be a simple average. If L is the nxp truncated matrix of orthonormal eigenvectors (n = data length, p = number of eigenvectors retained), then the projection is simply, Fourier-like:

L L^T X
You can see why the sign of PCs doesn't matter. L is there twice. Then the recon is just the row mean of this matrix.
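The sign invariance is easy to verify numerically (Python/numpy sketch; random data standing in for proxies): flip the sign of any column of L, and the projection, and hence the row-mean recon, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((600, 50))     # n years x 50 toy proxies

# Columns of U are the orthonormal temporal eigenvectors (the PC shapes)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U[:, :2]                           # keep the first two (n x p, p = 2)

proj = L @ L.T @ X                     # Fourier-like projection of the data
recon = proj.mean(axis=1)              # the recon: row mean over proxies

# Sign invariance: L appears twice, so flipping an eigenvector's sign
# leaves L L^T - and the recon - unchanged
Lflip = L * np.array([1.0, -1.0])
print(np.allclose(Lflip @ Lflip.T @ X, proj))  # → True
```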
So I'll calculate the 5 recons of another set of random data using just one PC. I've used the NAS HS index to orient, and scaled by standard deviation of the whole curve. I should say that this is not justified at all in general; while PC's are sign insensitive, recons certainly aren't. However, we're reconstructing essentially zero plus noise. I'll then show what it looks like without scaling.
With one PC and the scaling, it looks HS-like, reflecting the PC itself. Without scaling it looks like this:
This reflects the fact that while the PC maximises the magnitude of the data in the new basis, the scalar product with another vector (the averaging weights that give the recon) can be of a different size and sign. In this very simple analogue, we're reconstructing zero plus noise. Anything can happen.
OK, now we try a recon with 2 PC's, rescaling again by HS index etc. Still the same data (for all recons, not re-randomised).
Still a bit of HS, but not much. How about 3:
All gone. And three PC's would be a small number of PCs to retain in a reconstruction.
Remember, this was the extreme case of AR1(0.9), where the HS effect on PC1 was very large. But it was not creating HS effect from nowhere. It was just transferring it from other PCs to PC1.
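The whole experiment fits in a short sketch (Python/numpy; my `hs_index` - blade mean minus shaft mean, in standard deviations - is a stand-in for the NAS HS index, not their exact definition):

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1(n, r):
    """AR1(r) red-noise series of length n."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = r * x[i - 1] + rng.standard_normal()
    return x

n, p, cal = 600, 50, 100
X = np.column_stack([ar1(n, 0.9) for _ in range(p)])
Xd = X - X[-cal:].mean(axis=0)           # decentered AR1(0.9) 'proxies'

U, s, Vt = np.linalg.svd(Xd, full_matrices=False)

def recon(k):
    """Row-mean reconstruction retaining the first k PCs."""
    L = U[:, :k]
    return (L @ L.T @ Xd).mean(axis=1)

def hs_index(y):
    """Blade offset: |calibration mean - shaft mean| in sd units."""
    return abs(y[-cal:].mean() - y[:-cal].mean()) / y.std()

for k in (1, 2, 3, p):
    print(k, round(hs_index(recon(k)), 3))
# Retaining all p PCs makes L L^T the identity on the data's column
# space, so recon(p) is exactly the row mean of the decentered data.
```

Typically the index drops sharply as PCs are added, which is the averaging-out described above; with all PCs retained the recon is just the plain mean of the decentered data.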
The code, adapted from the NAS code, is here.
Appendix

McIntyre and McKitrick, in their 2005 Energy and Environment paper, showed their emulation of MBH, with and without decentering. Brandon has alluded to this in comments. I'll show the plot here:
The top panel is their emulation of MBH, decentered. The bottom panel combines the effect of centering with removal of the Gaspe cedars (which I think is unwarranted). It's a pity they didn't show the effect of decentering alone, but even so, it isn't much. It shows that PC1 doesn't have much to do with the outcome.