Thursday, March 11, 2010

On Polynomial Cointegration and the death of AGW

Well, I never thought I'd write a post with such a title. And I must confess that until about two weeks ago, these terms were very foreign to me. But recently, a paper by the econometricians Beenstock and Reingewertz has been circulating among sceptic blogs. It seems unpublished, but with the file called "Nature_Paper091209.pdf" there has been some jumping to conclusions. There was a brief discussion at WUWT, and more recently, David Stockwell has run a series on the topic.
Update.  Bart has commented has commented and linked an interesting discussion on his blog. Tamino has taken this up with a very thorough discussion (with sequels). The first properly sceptical discussion of the Beenstock paper was by Eli Rabett

Further update. VS has noted that the much more significant thread on Bart's blog (now much discussed in itself) is here.

Ringing phrases like:
"Therefore, greenhouse gas forcings, global temperature and solar irradiance are not polynomially cointegrated, and AGW is refuted."
ensure that those who like that sort of thing will keep it going for a while, so I thought I should try to figure it out. I'm still not an expert, but I hope what I did figure out might help.

Time series and unit roots

The theory is derived from the abstract notion of modelling a stochastic time series by a recurrence relation:
y_t + c1*y_{t-1} + ... + cn*y_{t-n} = ε_t
where ε_t is a stationary stochastic variable with zero mean (serially uncorrelated, constant variance).

This can be re-expressed symbolically using the backward difference operator
Dy_t = y_t-y_{t-1}
(a0 + a1*D + a2*D^2 +...+ D^n) y_t = ε_t
This is just binomial algebra

This is the undriven form, but a non-stochastic driver could be introduced. The corresponding non-stochastic recurrence relation (with ε_t=0) has n power solutions of the form b^t (t=0,1,2,...), and the b's are solutions of a characteristic n-th order polynomial. It is expected that these will all have magnitude not greater than 1, otherwise the process would have blown up.

The case where one or more roots are equal to 1 is called the unit root. It's important because the behaviour is qualitatively different. Instead of disturbances decaying with time, the process can just wander. It is a random walk.

Testing for unit roots

The point of the second difference form is that there is a clear criterion for unit roots. There is one root if a0=0, and the series is called I(1). There is a repeated root if a0=0 and a1=0, and the series is called I(2).

Now although people who like this sort of thing like to talk of asymptotic behaviour, there is usually no known underlying mechanics to justify this. There is just a finite period of observations. So the coefficients are never completely known (or of course, even whether it is an exact model). You can't get true asymptotics from a finite observation period.

So you can never say that a0=0. All you might be able to say is that a0 is bounded away from zero - significantly different at some level of confidence, say 95%. That's a test, and variants include the Dickey-Fuller test, and the augmented ADF test.

So when you say a series is I(1), you are saying that a test failed to yield significance. You weren't able to show, at 95% confidence, that a0 was different from 0. That's a negative conclusion, and certainly doesn't mean that you're 95% confident that the series is I(1). You're not 95% confident of anything.

In fact it's hard to quantify confidence in I(1) status in any way. Higher levels, eg I(2), likewise say that we weren't able to show that both a0 and a1 are significantly different from zero. Two failures is even harder to express as a confidence level.


This is used in causal arguments as follows. If, say, a temperature rise is believed to be caused by a CO2 rise, then there is a consistency requirement for the series. The long-term qualitative difference between a series with roots less than unity, and one with unit roots is great. So it would be odd to find that if temperature is I(1), so its a0 = 0 but a1 is not 0, while for CO2 (with coefs b0, b1 etc) b0 = b1 = 0. For that would mean that the temperature differences were stable, with disturbances decaying, even though the corresponding differences of CO2 could drift. That's the supposed significance of the failure of "polynomal cointegration". Though its still a big jump to say "AGW is refuted".

For let me say again, we can never be totally confident, on finite data, of statements about I(1), I(2) etc. But we don't seem to be able to quantify our uncertainty. Even less can we quantify compound statements like T is I(1), CO2 I(2), therefore incompatible.

Back to Beenstock

So how do B&R quantify their claim that AGW is refuted? Explicitly, not at all. I didn't see any confidence level attached to it. They did describe a number of tests on the CO2 series. That is, they quantified the extent to which they can say that the test for non-zero b0 and b1 failed. But oddly, there seemed to be no test statistics at all on temperature. They just said that it is I(1), apparently relying on papers by Kaufmann and others. But no confidence level was cited.

So with no quantitative confidence in one of the parts of a compound proposition, with what confidence can they say that "AGW is refuted".

They go on to make even more remarkable claims
Although we reject AGW, we find that greenhouse gas forcings have a temporary effect on global temperature. Because the greenhouse effect is temporary rather than permanent, predictions of significant global warming in the 21st century by IPCC are not supported by the data.
Again, with no physics but just a century or so of observations, they are able to say what is temporary and what is permanent.


  1. "No physics" being their fatal flaw, it seems to me...

  2. Why does your site crash my Firefox browser?

    I've been playing around with real and synthetic data. It looks like noisy data always has a reduced integration order and smoothed data a higher order. I can take white noise cumsum it twice and make it look like I(0) to most of the tests (ADF, PP, KPSS) by adding noise. I can also take white noise, do a 20 point moving average and have some of the tests report it as I(1). The forcing data cannot have real annual resolution. At least part of it must come from Law Dome ice core data, and that's far from annual resolution in the raw data and has been smoothed with 20 or 75 year filters. The temperature data has monthly resolution and is quite noisy, OTOH. To me, that says it all.

  3. DeWitt,
    Sorry about the crashing. I think it's not an unusual Blogger site - do you have trouble with Roger Pielke, say? It just might be those two posts I did on GHCN adjustments, both of which are close to the page limit. I can't easily modify that where I am, but I'll see if I can get them off the front page.

    On your observations, I think smoothing should create apparent unit roots (as cusum correctly does). Autocorrelation from smoothing looks like a near-unit root to the test. And noise makes the tests more likely to fail.

    That's one reason why the lack of physics is important. The underlying relationship of variables may diffre from what is measured. For example, it's likely that CO2 leads to retained atmospheric heat, as smooth response, which then appears as a noisy temperature signal because of atmospheric eddying etc.

  4. "To me, that says it all."

    Think again. I ran adf on Global and Mauna Loa CO2 annual data -- same result I(2).

    > testroots(co2)
    d Root ADF Padf
    [1,] 0 0.8998215 -1.252000 0.8600463
    [2,] 1 0.3259705 -2.255592 0.4746244
    [3,] 2 -0.4059038 -3.087004 0.1566933
    > co2ml=readCO2()
    > testroots(co2ml)
    d Root ADF Padf
    [1,] 0 0.9417591 -1.521055 0.7669126
    [2,] 1 0.3185445 -3.144009 0.1153996
    [3,] 2 -0.7227226 -4.855469 0.0100000

  5. David,
    Yes, but what does that last test really mean. It says that at level 1, the probability of getting that result if there is a unit root is 0.115. That is, you can't reject, but it's unlikely. This logic turns that around into saying that there is a unit root, so it's not compatible with temperature.

    That's my problem. You have a confidence - 95% - that there is no unit root after 2 differencing. But you have no meaningful confidence that there is a unit root at level 1. And that's what is supposed to make CO2 incompatible with temperature.

    Incidentally, do you know why the ADF test so often returns Padf=0.010000?

  6. Nick,

    I think the crashing had to do with an old bookmark for the no greenhouse effect post (I still don't agree with you. I may yet try to ask Gavin for his opinion.). Going to the home page seems to work fine.


    Which MLO data did you use and did you convert it to a forcing or just used the concentration? I just reran the calculations on the annual data (from here: as a forcing (5.3*ln(CO2(t)/CO2(to)) and the test again rejected a unit root for the first difference.

    > adf.test(mlo,k=2)

    Augmented Dickey-Fuller Test

    data: mlo
    Dickey-Fuller = -2.1515, Lag order = 2, p-value = 0.5135
    alternative hypothesis: stationary

    > mlo<-diff(mlo)
    > adf.test(mlo,k=2)

    Augmented Dickey-Fuller Test

    data: mlo
    Dickey-Fuller = -3.9535, Lag order = 2, p-value = 0.01874
    alternative hypothesis: stationary

    The PP test also rejected a unit root for the first difference. The KPSS test failed to reject for null=Level for the first difference but did reject for null=Trend.

  7. Nick,

    The tests, in R anyway, won't print a p value less than 0.01 or greater than 0.99 (0.1 for the KPSS test). You get a warning message that the p value is less than (or greater than depending) the printed value.

    Firefox crashed again when I posted, but recovered. Let's see if it happens again.

  8. The data is from ESRL. Here is the listing of all three tests. The last column is the number of tests rejecting stationary at 0.05.

    d Root ADF Padf PP Ppp KPSS Pkpss I(d)
    [1,] 0 0.9417591 -1.521055 0.7669126 -1.316249 0.8492467 2.60544306 0.01 3
    [2,] 1 0.3185445 -3.144009 0.1153996 -6.705525 0.0100000 1.44081513 0.01 2
    [3,] 2 -0.7227226 -4.855469 0.0100000 -13.962595 0.0100000 0.01861951 0.10 0

  9. David, I don't understand your last column. I thought these are tests for the presence of a unit root, and they become more prone to reject as you go down. So there is rejection when p<0.05. So shouldn't the numbers rejecting be 0 1 3 instead of 3 2 0?

    Of course, you said "rejecting stationary". I'm not sure what you mean there. But I don't think you can equate that with failing to reject a unit root.

  10. ADF and PP reject the null with an alternate of stationary. KPSS is the other way around.

  11. David,

    I ask again, which data did you get from ESRL? How many data points and what years were covered? You surely didn't get the MLO annual data from 1959 to 2009 because I've tested both the linear and log transformed data and all three tests reject a unit root for both after the first difference.

  12. Yes, it was the annual data from 1959 to 2009. You don't get the same results?

  13. David,
    I've tried to interpret your result here. In this table (with your numbers) I've written R for reject unit root, P for pass, and P' for KPSS, because I understand you to say that a rejection there is an affirmation of a unit root. I can't make it add to the last col - maybe the bottom R total should be 1?


  14. Nick the last value is 0.1>0.05 and should be R shouldnt it?

  15. Well, I was going by DeWitt's earlier comment that 0.1 was the minimum reported by KPSS. But I see I have misinterpreted - 0.1 is actually the maximum (range 0.01 to 0.1). And on looking up the R KPSS documentation and the KPSS paper, I see that the reason is that this was the range that KPSS calculated.

    So yes, it should be an R.

    So what do we have? Level 1 is critical, because that is where T and CO2 are said to differ. ADF says you can't reject the possibility of a unit root. PP says you can. And KPSS says that you can reject level-stationarity (level is the default, which I presume is what you used).

  16. "So what do we have?"

    Good review below, and suggestion on how different orders of integration may cointegrate.

    The reference in B&R on the issue of I(1) vs I(2) is quite old at 1994. Haldrup has written about the meaning of I(2) which I would like to read but all behind a firewall (even for my secret snoop altavisa). Integration of an integration? I wonder about current thinking.

    Unless I understand all the ins and outs of these tests I don't see much point in me discussing them, and I barely grasp ADF.

    Still econometrics has developed a vast facility for countering spurious regression that other sciences would benefit from.

  17. David,

    Yes I did get different results. See my post (#6 in chronological order) above for the ADF test results. I have all tests (ADF, PP and KPSS) rejecting a unit root after one difference operation. I'm using the tests from the tseries package in R version 2.10.1.

  18. I assumed the results in #6 were the transformed variable. Anyway, I think I have found the difference. When run with default lag k=3, but when run with k=2 the test rejects after first difference.

    > mlo<-readCO2()
    > mlo
    Time Series:
    Start = 1959
    End = 2009
    Frequency = 1
    [1] 315.98 316.91 317.64 318.45 318.99 319.62 320.04 321.38 322.16 323.04 324.62 325.68
    [13] 326.32 327.45 329.68 330.17 331.08 332.05 333.78 335.41 336.78 338.68 340.11 341.22
    [25] 342.84 344.41 345.87 347.19 348.98 351.45 352.90 354.16 355.48 356.27 356.95 358.64
    [37] 360.62 362.36 363.47 366.50 368.14 369.40 371.07 373.17 375.78 377.52 379.76 381.85
    [49] 383.71 385.57 387.35
    > adf.test(mlo)

    Augmented Dickey-Fuller Test

    data: mlo
    Dickey-Fuller = -1.5211, Lag order = 3, p-value = 0.7669
    alternative hypothesis: stationary

    > adf.test(mlo,k=2)

    Augmented Dickey-Fuller Test

    data: mlo
    Dickey-Fuller = -1.4393, Lag order = 2, p-value = 0.7998
    alternative hypothesis: stationary

    > adf.test(diff(mlo))

    Augmented Dickey-Fuller Test

    data: diff(mlo)
    Dickey-Fuller = -3.144, Lag order = 3, p-value = 0.1154
    alternative hypothesis: stationary

    > adf.test(diff(mlo),k=2)

    Augmented Dickey-Fuller Test

    data: diff(mlo)
    Dickey-Fuller = -4.0674, Lag order = 2, p-value = 0.01390
    alternative hypothesis: stationary


  19. No matter how interesting the statistics, on physical grounds it is clear that the increase in global avg temp is not merely a 'random walk'.

    A random increase would cause a negative energy imbalance or the extra energy would have to come from another segment of the climate system (eg the ocean, cryosphere, etc). Neither is the case: There is a positive energy imbalance and the other reservoirs are also accumulating energy. Moreover, there is a known positive radiative forcing.

    See also and a lengthy statistics discussion in the preceding thread.

    Bart Verheggen

  20. David,

    What about the PP and KPSS tests? Using default settings, PP rejects a unit root after one difference. Setting null=Trend for KPSS, fails to reject stationary after one difference.

    There's a more flexible version of the ADF test that optimizes k according to different criteria in uroot. Unfortunately, uroot has been removed from the CRAN repository and the archived versions are .tar not .zip, so I don't know how to install it.

  21. What weirdness.

    Can I suggest a different approach? See what other physics you can allegedly refute by using such number games, without bothering to do any physics.

  22. There's a thought. How about cumulative CO2 emission from 1850 to 1978 and atmospheric CO2 as measured by the Law Dome ice cores. adf.test rejects a unit root after one difference for the Law Dome CO2 data. OTOH, adf.test fails to reject a unit root for the cumulative CO2 emission data after two differences. So CO2 emissions (I(3)) cannot explain atmospheric CO2 levels (I(1)). Or maybe it's that there isn't a linear relationship between cumulative emission and concentration. Nor is there a linear relationship on a short time scale between ln(CO2) and temperature. It couldn't be that simple, could it?

  23. here is a thought, run the test on GCM output.

  24. "here is a thought, run the test on GCM output."

    MartinM over at Tamino's writes:

    "In the meantime, the annual global mean temps simulated by GISS ModelE contain a unit root, according to the same tests that VS has been using."


  25. Nick,

    you have linked to the wrong discussion. That random walk thing is nonsense (it simply served as a 'strawman'), and in fact, I have statistically rejected the null-hypothesis of a random walk, in the proper thread.

    This post gives an overview of all my test results and monte-carlo simulations regarding the instrumental GISS temperature record. CO2 forcing results are posted elsewhere in the thread (also a very long post).

    Cheers, VS

  26. Thanks, VS
    I had linked to the most recent Bart post assuming (wrongly) that the discussion would move there. I've updated the post here.