moyhu: Climate and statistical forecasting

Friday, March 25, 2016

Climate and statistical forecasting

About a month ago, there was a minor kerfuffle when the GWPF released a report from econometrics professor Terence Mills, titled

"STATISTICAL FORECASTING
How fast will future warming be?"

It got a run in the Murdoch and allied press. I wrote about it here. But it was obvious nonsense, and the fuss died down pretty quickly. I had said I would write up some more analysis, but interest subsided so quickly that I put it off. The actual warming, as Gavin quoted, had already made the forecast look silly, and there was more to say about the recent data.

But these intrusions of misplaced statistical pontification are an intermittent feature of the climate distraction. We had Beenstock and Reingewertz, which didn't exactly forecast, but claimed that time series modelling proved the death of AGW. There was Ludecke et al, which was mainly very bad Fourier analysis, but did forecast (naturally) an imminent cooling based on the periodicity of the trig functions used. And there was Keenan, pulling strings at the House of Lords to promote his ARIMA(3,1,0) model to claim great uncertainty about the trend.

So I want to talk more about the place, if any, of statistical time series forecasting here.

Mills-type forecasting of a temperature series T(t) uses a model of the general form
P(B)T = F(t,b) + Q(B)ε
B is the backshift operator; B T is the series displaced one step back, so (B T)(t) = T(t-1). ε denotes a series of iid (independent and identically distributed) random variables. P and Q are polynomials in B. Powers of B commute with one another, so polynomials in B can be multiplied and factored like ordinary polynomials, and it makes sense to do algebra with them. This goes back at least to Boole. And, critically, F(t,b) is a function of assumed form with fitted parameters b. It's often constant or linear.
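To make the backshift algebra concrete, here is a minimal sketch in Python (standard library only). A polynomial in B is represented by its list of coefficients, so [1, -1] stands for 1 - B (first differencing) and [1, -0.5] for 1 - 0.5B. The helper names apply_poly and poly_mul are illustrative, not from any package. Applying one polynomial after the other gives the same series, in either order, as applying their product, which is what licenses treating B algebraically.

```python
def apply_poly(coeffs, series):
    """Apply P(B) = coeffs[0] + coeffs[1]*B + coeffs[2]*B**2 + ... to a series.

    (B**k T)[t] = T[t - k], so output element t is sum_k coeffs[k] * series[t - k].
    Only t >= degree has every lag available, so the result is shorter by the degree.
    """
    degree = len(coeffs) - 1
    return [
        sum(c * series[t - k] for k, c in enumerate(coeffs))
        for t in range(degree, len(series))
    ]


def poly_mul(a, b):
    """Multiply two polynomials in B by convolving their coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


T = [3, 1, 4, 1, 5, 9, 2, 6]   # any short series will do
P = [1.0, -0.5]                # 1 - 0.5 B
D = [1.0, -1.0]                # 1 - B, i.e. first differencing

p_after_d = apply_poly(P, apply_poly(D, T))
d_after_p = apply_poly(D, apply_poly(P, T))
product = apply_poly(poly_mul(P, D), T)   # (1 - 1.5 B + 0.5 B**2) T
print(p_after_d == d_after_p == product)
```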

The forecasting process is that over an observed period, the coefficients (and order) of P and Q, along with the parameters b, are found by some kind of least squares fit. The fitted model is then used to compute expected values for future times. The expected value of ε is zero, so the forecast is dominated by the behaviour of F. And as Mills says:

The central aim of this report is to emphasise that, while statistical forecasting appears highly applicable to climate data, the choice of which stochastic model to fit to an observed time series largely determines the properties of forecasts of future observations and of measures of the associated forecast uncertainty, particularly as the forecast horizon increases.

Well, he said it, but it wasn't the aspect that the GWPF and press emphasised. But it's true.
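A rough sketch of the fit-then-forecast procedure, assuming the simplest case P(B) = 1 - φB, Q(B) = 1 and a linear F(t,b) = a + bt, fitted by ordinary least squares. This is not Mills' code or data: the series is synthetic, and all names and parameter values are hypothetical. The forecast just iterates the fitted equation with ε set to its expected value of zero. Setting b to zero (here crudely, by keeping the other fitted values) leaves a forecast that can only level off at a/(1-φ).

```python
import random


def solve(A, y):
    """Solve the small linear system A x = y (Gaussian elimination, partial pivoting)."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ar1_trend(T):
    """Least squares for T[t] = phi*T[t-1] + a + b*t + eps[t], i.e. (1 - phi B) T = a + b t + eps."""
    rows = [[T[t - 1], 1.0, float(t)] for t in range(1, len(T))]
    ys = T[1:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(XtX, Xty)  # [phi, a, b]


def forecast(T, phi, a, b, steps):
    """Expected future values: iterate the fitted equation with eps at its mean, zero."""
    out, last, n = [], T[-1], len(T)
    for h in range(steps):
        last = phi * last + a + b * (n + h)
        out.append(last)
    return out


# Synthetic data from hypothetical parameters (not HadCRUT or any real series).
random.seed(0)
T = [0.0]
for t in range(1, 300):
    T.append(0.5 * T[-1] + 0.0 + 0.01 * t + random.gauss(0.0, 0.1))

phi, a, b = fit_ar1_trend(T)
with_trend = forecast(T, phi, a, b, 100)
zero_trend = forecast(T, phi, a, 0.0, 100)  # b set to zero: converges to a / (1 - phi)
```

The forecast with the fitted b keeps climbing, while the b = 0 version flattens within a few steps; the form of F, not the ARMA part, decides the long-range behaviour.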

So the forecast then depends on P(B). Often its characteristic roots (for a single factor 1 - φB, the root is just φ) are all small compared with 1. Then the inverse of P is just a smoothing operation, and all the ARIMA dynamics have died away. The forecast is a smoothed, scaled F, and so depends almost entirely on what form is assumed. IOW, all the statistics is doing is estimating the parameters b of F, and smoothing a bit. Alternatively, P may have one or more roots close to 1 (a unit root). A factor 1 - B is a differencing operator, a discrete differentiation, so T behaves like an integral of F, and the forecast is the solution of a difference equation, but still critically dependent on the form of F.
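The behaviour of 1/P(B) can be seen from its expansion as a power series in B, ψ0 + ψ1 B + ψ2 B² + ..., whose weights follow the recursion ψk = φ1 ψ(k-1) + ... + φp ψ(k-p) when P(B) = 1 - φ1 B - ... - φp B^p. For a single factor 1 - φB the characteristic root is φ itself (the root of z - φ; equivalently 1/φ if one solves P(B) = 0 for B). The Python sketch below uses a hypothetical helper name. It shows weights that die off quickly for φ = 0.2 (a short smoothing filter), stay at 1 for a unit root (a running sum), and level off rather than decay when a unit root is combined with another factor.

```python
def inverse_weights(phis, n_terms):
    """Weights psi_0, psi_1, ... of 1 / P(B) for P(B) = 1 - phis[0]*B - phis[1]*B**2 - ...

    Matching coefficients in P(B) * psi(B) = 1 gives psi_0 = 1 and
    psi_k = phis[0]*psi_{k-1} + phis[1]*psi_{k-2} + ... (terms with negative index dropped).
    """
    psi = [1.0]
    for k in range(1, n_terms):
        psi.append(sum(phi * psi[k - lag]
                       for lag, phi in enumerate(phis, start=1) if k - lag >= 0))
    return psi


print(inverse_weights([0.2], 6))         # root 0.2: weights shrink by 5x each step (smoothing)
print(inverse_weights([1.0], 6))         # unit root: all weights 1, a running sum
print(inverse_weights([1.5, -0.5], 12))  # (1 - B)(1 - 0.5B): weights climb towards 2, never decay
```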

Now, you can assume various exotic forms for F. There is no constraint, unless you choose to invoke some physics (which econometricians usually don't). If you can find a straight line fit, you can always find a segment of an exponential, or a sinusoid, which will fit equally well, but with very different forecast behaviour.
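A small numerical illustration, with made-up numbers in arbitrary units (not fitted to any temperature data): a line with slope 0.01 per year, a sinusoid segment with a 400-year period, and an exponential segment with growth rate 0.005 per year, the latter two chosen to have the same slope as the line at t = 0. Over a hypothetical 50-year window they differ from the line by at most about 0.07, small compared with, say, noise of standard deviation 0.1. Yet by t = 200 the sinusoid is back to zero while the exponential is above 3.

```python
import math

C = 0.01                   # hypothetical slope over the observed window (units per year)
WINDOW = range(0, 51)      # hypothetical 51-year observed period
OMEGA = 2 * math.pi / 400  # hypothetical 400-year period for the sinusoid
K = 0.005                  # hypothetical growth rate for the exponential


def line(t):
    return C * t


def sinusoid(t):
    # Same slope as the line at t = 0: d/dt (C/OMEGA) sin(OMEGA t) = C cos(OMEGA t)
    return (C / OMEGA) * math.sin(OMEGA * t)


def exponential(t):
    # Same slope as the line at t = 0: d/dt (C/K)(exp(K t) - 1) = C exp(K t)
    return (C / K) * (math.exp(K * t) - 1.0)


# Largest in-window disagreement with the straight line
max_gap_sin = max(abs(line(t) - sinusoid(t)) for t in WINDOW)
max_gap_exp = max(abs(line(t) - exponential(t)) for t in WINDOW)

for t in (50, 100, 150, 200):
    print(t, round(line(t), 2), round(sinusoid(t), 2), round(exponential(t), 2))
```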

As I described here, Mills chose for HADCRUT temperature either an IMA form, with F constant, or an AR form, with F linear. Each involves a future trend, but with uncertainty, and being uncertain of the future trend implies a wide scatter of forecasts. But in each case, he found that the range for the trend (or drift) included zero, and so proceeded on the basis that it was zero. Thus he changed the structure of the forecast function F(t,b), with radical effects on the forecast. In fact, the forecast just had to be constant. The model allowed no other.

Now statistically, this is very unsound. Saying that zero is within the range of a distribution doesn't mean zero has any preferred status. You could choose any of a large range of other numbers within the range. All you can say is that you can't discriminate (statistically). The effect of all those other choices should be reflected in the uncertainty of the forecast. But Mills calculated his uncertainty on the basis that the trend was certainly zero, and the scatter would just depend on the future ε.
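A simplified sketch of why dropping the trend understates uncertainty, for a model that is a linear trend plus stationary noise. It treats the forecast error variance as the noise variance plus (h × standard error of the trend)², ignoring intercept uncertainty and parameter covariances. The numbers are hypothetical, chosen so that zero lies inside the ±1.96 standard error range of the trend, as in the situation described: trend 0.01 per year with standard error 0.006, noise standard deviation 0.1, horizon 85 years.

```python
import math


def forecast_interval(level, trend, trend_se, noise_sd, h, z=1.96):
    """Approximate 95% interval h steps ahead for: level + trend*h + stationary noise.

    Error variance = noise_sd**2 + (h * trend_se)**2; intercept uncertainty and
    parameter covariances are ignored. trend_se = 0 treats the trend as exactly known.
    """
    centre = level + trend * h
    half_width = z * math.sqrt(noise_sd ** 2 + (h * trend_se) ** 2)
    return centre - half_width, centre + half_width


# Hypothetical inputs: trend 0.01/yr with standard error 0.006/yr, so zero is inside
# the +/-1.96 s.e. range; noise s.d. 0.1; horizon 85 years.
zero_trend = forecast_interval(0.0, 0.0, 0.0, 0.1, 85)   # "trend is certainly zero"
carried = forecast_interval(0.0, 0.01, 0.006, 0.1, 85)   # trend uncertainty carried through
print(zero_trend, carried)
```

With these inputs the zero-trend interval is about ±0.2, while carrying the trend uncertainty through gives an interval roughly five times wider, centred on 0.85.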

Which comes back to the point, what is the basis for choosing the form of F? It does have to use some knowledge of physics. We are testing a theory that temperatures may be trending (or drifting) because of GHG. You have to test it with a form that at least allows that to happen. To the extent that you are uncertain of the trend, you are uncertain of the forecast.

I think of line fitting in Taylor series terms. If F is a secular function, it's reasonable to think that it may be a smooth function of time, corresponding to the incremental effect of the drivers. A linear assumption is a first-order Taylor approximation at a point, probably the end point. That will hold until the second and higher derivatives start to have a dominant effect. To the extent that we can establish that those are small, the forecast will work to a corresponding time into the future. That may not be very far, so linear trend extrapolation is not a very useful forecasting approach. And despite what contrarians sometimes think, it is very little used.
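The Taylor-series view can be put as simple arithmetic: extrapolating the tangent line from t0 leaves an error of roughly f''(t0) h²/2 after h steps, so the horizon over which a linear forecast stays within a tolerance e is about sqrt(2e/|f''|). In the hypothetical quadratic below (arbitrary units, second derivative 0.0002 per year², tolerance 0.1), that horizon is about 32 years; the error is 0.04 at 20 years and 0.64 at 80.

```python
import math


def f(t):
    # Hypothetical smooth "secular" curve with a small second derivative (0.0002).
    return 0.01 * t + 0.0001 * t ** 2


def f_prime(t):
    return 0.01 + 0.0002 * t


def linear_extrapolation(t0, h):
    """First-order Taylor forecast from the end point t0: f(t0) + f'(t0) h."""
    return f(t0) + f_prime(t0) * h


def useful_horizon(second_derivative, tolerance):
    """Horizon at which the neglected second-order term |f''| h^2 / 2 reaches tolerance."""
    return math.sqrt(2.0 * tolerance / abs(second_derivative))


t0 = 50.0
errors = {h: f(t0 + h) - linear_extrapolation(t0, h) for h in (10, 20, 40, 80)}
horizon = useful_horizon(0.0002, 0.1)
print(errors, horizon)
```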

Unit roots

A side story here is the use of ARIMA models with an I (integrated) term, as in Keenan's (3,1,0): a unit root. This is the difference between a model that has a random perturbation about a mean value, and a random walk. Both Mills and Keenan approach this as just a matter to be resolved by goodness of fit. Beenstock et al, cited above, make more of its significance, but their analysis is a total muddle.
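The difference between a stationary AR model and a unit-root model shows up directly in the variance of the series. For X_t = φX_(t-1) + ε_t started at zero, Var(X_t) = σ²(1 - φ^(2t))/(1 - φ²), which levels off at σ²/(1 - φ²) when |φ| < 1, while for φ = 1 (a random walk) it is tσ² and grows without bound. A minimal Python sketch with a small Monte Carlo check; the function names and the choice φ = 0.9 are illustrative only.

```python
import random


def ar1_variance(phi, sigma, t):
    """Exact variance of X_t for X_t = phi * X_{t-1} + eps_t, X_0 = 0, eps ~ N(0, sigma^2)."""
    if phi == 1.0:
        return t * sigma ** 2  # unit root (random walk): grows linearly without bound
    return sigma ** 2 * (1.0 - phi ** (2 * t)) / (1.0 - phi ** 2)  # tends to sigma^2/(1 - phi^2)


def simulated_variance(phi, sigma, t, n_paths=500, seed=42):
    """Monte Carlo check: sample variance of X_t across independent simulated paths."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(t):
            x = phi * x + rng.gauss(0.0, sigma)
        finals.append(x)
    mean = sum(finals) / n_paths
    return sum((v - mean) ** 2 for v in finals) / (n_paths - 1)


for t in (10, 100, 1000):
    print(t, ar1_variance(0.9, 1.0, t), ar1_variance(1.0, 1.0, t))
```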

But for the reasons described above, it isn't just a matter of goodness of fit. Again, there is an infinite range of possible P,F,Q functions to choose. And statistics can't decide that. The only basis for choice is that the function F (with P and Q), with parameters, is a reasonable model for the physics. And a random walk simply isn't, because:

There are physical laws to be satisfied. Temperature determines the rate of outgoing infrared energy, and this in the medium term has to balance incoming. Now there are all sorts of short term variations (weather) that allow transient deviations, but they don't allow temperature just to drift unconstrained to a new level. A random walk does.
As a variant of that, we know that the history is of temperature bounded within ranges. Seas haven't boiled, or (generally) frozen. In fact the range has been quite narrow, up to the Ice Age range at most. That is inconsistent with a random walk.
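One way to see the physical objection is a one-box energy balance, C dT/dt = F - λT + noise, where the λT term stands for outgoing infrared rising with temperature. This simplified model is introduced here for illustration; it is not from the post or from Mills. Stepping it forward one year at a time gives an AR(1) (a discretised Ornstein-Uhlenbeck process) with φ = 1 - λΔt/C, so any positive restoring λ makes the series revert to F/λ, and only λ = 0 gives the random walk. The values C = 8 W yr m^-2 K^-1, λ = 1.2 W m^-2 K^-1, F = 1.2 W m^-2 and the noise level are hypothetical.

```python
import random


def energy_balance_phi(heat_capacity, feedback, dt=1.0):
    """AR(1) coefficient from forward-Euler steps of C dT/dt = F - feedback*T + noise.

    T[n+1] = T[n] + (dt/C) * (F - feedback*T[n] + noise)
           = (1 - feedback*dt/C) * T[n] + (dt/C) * (F + noise)
    """
    return 1.0 - feedback * dt / heat_capacity


def simulate(heat_capacity, feedback, forcing, noise_sd, steps, dt=1.0, seed=1):
    """Temperature anomaly path from the stepped energy balance, starting at zero."""
    rng = random.Random(seed)
    phi = energy_balance_phi(heat_capacity, feedback, dt)
    temps, temp = [], 0.0
    for _ in range(steps):
        temp = phi * temp + (dt / heat_capacity) * (forcing + rng.gauss(0.0, noise_sd))
        temps.append(temp)
    return temps


# Hypothetical, illustrative values: C in W yr m^-2 K^-1, lambda in W m^-2 K^-1, F in W m^-2.
C_MIX, LAMBDA, FORCING = 8.0, 1.2, 1.2
restored = simulate(C_MIX, LAMBDA, FORCING, noise_sd=0.5, steps=1000)  # phi = 0.85
unrestored = simulate(C_MIX, 0.0, 0.0, noise_sd=0.5, steps=1000)       # phi = 1: random walk
print(min(restored[500:]), max(restored[500:]))   # hovers around FORCING / LAMBDA
print(min(unrestored), max(unrestored))           # wanders; spread grows like sqrt(steps)
```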

I made this objection to Keenan's model, and some said: well, why couldn't it just have been a random walk for the period of observation only? But that comes back to the requirement that the P,F,Q model be an explanation for the process, and even one that can be relied on for forecasting. Saying it is sometimes a random walk, sometimes not, just raises the questions: when, and why? It's a model that makes no progress.

20 comments:

...and Then There's Physics, March 27, 2016 at 8:10 PM
Really nice, thanks.
@whut, March 28, 2016 at 1:35 AM
The lines of evidence point toward determinism as driving the natural variability of many climate measures. It's actually misguided to apply Markov or random walk models to a behavior that is forced by a non-random physical process.

"A side story here is the use of Arima models with an I, as in Keenan's (3,1,0). A unit root. This is the difference between a model that has a random perturbation about a mean value, and a random walk. "

Lots of misconceptions about random walk. A pure random walk is a martingale process, and can wander infinitely far from the mean -- that is, given enough time. But there is a kind of random walk that has a reversion to the mean, as modeled by a potential well -- in physics this is called an Ornstein-Uhlenbeck process. And yes, this is categorized in the ARIMA class, but statisticians don't model physics and so use a generic name.

Note that this is different than a random perturbation about a mean, which is white noise jitter and not a Markov process.

There is actually no evidence that large scale processes are best modeled as a stochastic process.

So for example, a process such as ENSO is closer to a deterministic forcing than it is to the red noise of an O-U process. And for the life of me, I can't figure out what the charade is over acting as if the QBO contains any randomness.

IMO, it's just a matter of time before physical models applied to machine learning experiments will root out all the deterministic behaviors, and the stochastic crowd gets pushed to the corner on this topic.

bill h, March 28, 2016 at 4:15 AM
Nick, Thanks for the summary and for the econometrics background to all this stochastic modelling. I have often wondered why, all those years ago, McIntyre and McKittrick got into using those strange red noise inputs in their Monte Carlo modelling. Now it's clear, it's become a standard tool for economists, presumably for modelling Stock Market fluctuations.

It strikes me that economists would do well to give more account to external forcings, despite the supposed failure of deterministic models. For instance, the effect of prolonged drought in the Middle East on the Syrian economy and polity.
Anonymous, March 28, 2016 at 5:40 AM
Though I am anything but a mathematician, I took real pleasure in reading (starting from here, thanks Nick Stokes) lots of posts and comments about Mills' forecasting blind alley.

Best of all was a hint by L. Hamilton in his comment:

http://julesandjames.blogspot.com/2016/02/no-terence-mills-does-not-believe-his.html?showComment=1456358463898#c3485582121726390124

where L. Hamilton reminds us of an earlier paper written by the same Mills

http://link.springer.com/article/10.1007/s10584-008-9525-7#/page-2

' How robust is the long-run relationship between temperature and radiative forcing? '

That paper's abstract ends with

' This result is robust across the sample period of 1850 to 2000, thus providing further confirmation of the quantitative impact of radiative forcing and, in particular, CO2 forcing, on temperatures. '

Delicious...

Bindidon, March 28, 2016 at 9:00 PM
All of us commenters know that Nick Stokes and Grant Foster are quite busy with lots of things.

But when you see this spurious increase in really poor-minded guest posts, e.g. at Climate Etc, claiming that "Temperatures do not add" or that "Inappropriate use of linear regression can produce spurious and significantly low estimations" etc, you truly hope for some really scientific contribution published there!

John Mashey, March 29, 2016 at 8:06 AM
I offer another jolly example of odd "statistics", by a few ex-NASA-Apollo guys who comprise "The Right Climate Stuff."

1) See Hal Doiron in his 14-minute talk, in which he uses a simple climate model, starting with Ljungqvist (2010) and showing 1000-year and 62-year sine-wave cycles to prove global warming is no problem. Exactly where the 62-year cycle came from I'm not sure, and of course, using a 2000-year reconstruction of 30-90N (25% of Earth) seems a little chancy to know a 1000-year cycle. People will be pleased to know that the model shows flat to down temperatures for 2000-2030.

2) If for some reason that isn't enough, there's a longer version by one of his team, Jim Peacock, in a lecture for Doctors for Disaster Preparedness in 2014, using some of the same slides. The first half is Apollo stories, then he talks on their climate models to prove that old Apollo mechanical engineers can do it better than climate scientists.

Note of course, that these guys are a tiny fraction of NASA retirees, and the people at NASA, who are generally pretty competent, actually accept the science, use NOAA data, and model risks from sea level rise.
John Mashey, March 29, 2016 at 6:02 PM
Hal Doiron and Norm Page are both Houston-based signers of the "300 scientists" list Will Happer collected to help Lamar Smith harass NOAA.
For amusement, Page seems to have subscribed to the expanding Earth hypothesis, i.e., "radius of the Earth has increased by at least 33 percent since the Paleozoic".
John Mashey, March 29, 2016 at 6:06 PM
Akasofu is a sad case, and I see the paper of his that Page referenced was in a SCIRP journal, one of Jeffrey Beall's "favorites." Oh, my.
bill h, March 30, 2016 at 8:33 PM
I see the GWPF spin machine is hard at work on the Mills report. It's got a pliant "environment editor" at the London Times (Murdoch stable) to produce an article claiming that Mills predicts no global temperature rise right up to 2100, with a very clear graphic demonstrating exactly that. The GWPF then posts this article on its website:

http://www.thegwpf.com/planet-is-not-overheating-says-uk-statistician/

To think that the GWPF is run by the UK's former finance minister... (who also happens to be the father of Monckton's brother-in-law)

PG, April 2, 2016 at 9:49 PM
Nate Silver may have made stats sexy but Nick and Tamino have made it essential.