How fast will future warming be?
But these intrusions of misplaced statistical pontification are an intermittent feature of the climate distraction. We had Beenstock and Reingewertz, who didn't exactly forecast, but claimed that time series modelling proved the death of AGW. There was Lüdecke et al, which was mainly very bad Fourier analysis, but did forecast (naturally) an imminent cooling, based on the periodicity of the trig functions used. And there was Keenan, pulling strings at the House of Lords to promote his ARIMA(3,1,0) model to claim great uncertainty about the trend.
So I want to talk more about the place, if any, of statistical time series forecasting here.
Mills-type forecasting of a temperature series T(t) uses a model of the general form
P(B)T = F(t,b) + Q(B)ε
B is the backshift operator; B T is the series displaced one step back. ε denotes a series of iid (independent and identically distributed) random variables. P and Q are polynomials in B; applications of B commute, so it makes sense to do algebra with it. This goes back at least to Boole. And, critically, F(t,b) is a function of assumed form with fitted parameters b. It's often constant or linear.
The forecasting process is that, over an observed period, the coefficients (and orders) of P and Q, along with the parameters b, are found by some kind of least squares fit. With the fitted model, expected values are then computed for future times. The expected value of ε is zero, so the forecast is dominated by the behaviour of F. And as Mills says:
"The central aim of this report is to emphasise that, while statistical forecasting appears highly applicable to climate data, the choice of which stochastic model to fit to an observed time series largely determines the properties of forecasts of future observations and of measures of the associated forecast uncertainty, particularly as the forecast horizon increases."

Well, he said it, but it wasn't the aspect that the GWPF and the press emphasised. But it's true.
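A minimal sketch of that fit-then-forecast procedure, using synthetic data and an assumed AR(1)-plus-linear-trend form (all numbers here are invented for illustration, not Mills' actual fit):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "temperature": F(t,b) = b0 + b1*t plus AR(1) noise, P(B) = 1 - 0.6B
n = 200
t = np.arange(n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + rng.normal(scale=0.1)
T = 0.2 + 0.005 * t + noise

# Fit by least squares: regress T[i] on (1, t[i], T[i-1])
X = np.column_stack([np.ones(n - 1), t[1:], T[:-1]])
b0, b1, phi = np.linalg.lstsq(X, T[1:], rcond=None)[0]

# Forecast: E[eps] = 0, so just iterate the deterministic part forward
fc = [T[-1]]
for k in range(1, 51):
    fc.append(b0 + b1 * (n - 1 + k) + phi * fc[-1])
print("fitted AR coefficient:", round(phi, 3))
```

Once the fit is done, the expected forecast is entirely deterministic; the AR part only shapes the transition away from the last observation.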
So the forecast then depends on P(B). This often has roots that are all close to zero, relative to 1. Then the inverse of P is just a smoothing operation, and the ARIMA part dies away. The forecast is a smoothed, scaled F, and so depends almost entirely on the form assumed. IOW, all the statistics is doing is estimating the parameters b of F, and smoothing a bit. Alternatively, P may have one or more roots close to 1 (a unit root). Then it acts like a differencing, and the forecast is a solution of a difference equation, but still critically dependent on the form of F.
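The effect of the roots can be seen in the simplest case, an AR(1) with P(B) = 1 - φB (φ values here are just illustrative):

```python
# For T - F following an AR(1), the expected deviation h steps ahead is
# E[d_h] = phi**h * d_0: a small phi forgets fast, a unit root never forgets.
d0, h = 1.0, 20
decay = {phi: d0 * phi**h for phi in (0.3, 0.99, 1.0)}
print(decay)
```

At φ = 0.3 the deviation is effectively gone after 20 steps and the forecast is pure F; at φ = 1 the last observed deviation persists forever.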
Now, you can assume various exotic forms for F. There is no constraint, unless you choose to invoke some physics (which econometricians usually don't). If you can find a straight line fit, you can always find a segment of an exponential, or of a sinusoid, which will fit equally well, but with very different forecast behaviour.
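A hypothetical illustration of that: over a short window, a straight line and a segment of a sinusoid are nearly indistinguishable, but their forecasts diverge completely:

```python
import numpy as np

t = np.linspace(0.0, 0.5, 50)
y = np.sin(t)                        # suppose the "true" process is sinusoidal
b, a = np.polyfit(t, y, 1)           # least squares line a + b*t
max_err = np.max(np.abs(a + b * t - y))
print("max in-sample misfit of the line:", max_err)   # tiny

tf = 4.0                             # forecast time, well past the window
print("line forecast:", a + b * tf, " sinusoid:", np.sin(tf))
```

In the fitted window the line misses the sinusoid by less than 0.01, yet at t = 4 the line forecasts a large rise while the sinusoid has turned negative.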
As I described here, Mills chose for HADCRUT temperature either an IMA form, with F constant, or an AR form, with F linear. Each involves a future trend, but with uncertainty. Being uncertain of the future trend implies a wide scatter of forecasts. But in each case, he found that the range for trend (or drift) included zero, and so proceeded on the basis that it was zero. Thus he changed the structure of the forecast function F(t,b), with radical effects on the forecast. In fact, the forecast just had to be constant. The model allowed no other.
Now statistically, this is very unsound. Saying that zero is within the range of a distribution doesn't give zero any preferred status. You could choose any of a large range of other numbers within the range. All you can say is that you can't discriminate (statistically). The effect of all those other choices should be reflected in the uncertainty of the forecast. But Mills calculated his uncertainty on the basis that the trend was certainly zero, so that the scatter depended only on the future ε.
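The point can be made with a toy Monte Carlo (all numbers invented): suppose the fitted trend is 0.005 ± 0.004 per step, so zero lies inside its range. Pinning the trend at zero, versus propagating its uncertainty, gives very different forecast spreads at horizon 100:

```python
import numpy as np

rng = np.random.default_rng(1)
h, sig_eps = 100, 0.1
b1_hat, b1_se = 0.005, 0.004        # fitted trend; zero lies within its range

# (a) Trend pinned at zero: spread comes from future eps alone
pinned = rng.normal(0.0, sig_eps, 10000)

# (b) Trend uncertainty propagated: draw b1, then add future eps
b1 = rng.normal(b1_hat, b1_se, 10000)
full = b1 * h + rng.normal(0.0, sig_eps, 10000)

print("spread, trend pinned at 0:", pinned.std())
print("spread, trend uncertain:  ", full.std())
```

At this horizon the trend-uncertainty term (h·0.004 = 0.4) dwarfs the ε term (0.1), so the honest forecast interval is several times wider than the pinned one.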
Which comes back to the point, what is the basis for choosing the form of F? It does have to use some knowledge of physics. We are testing a theory that temperatures may be trending (or drifting) because of GHG. You have to test it with a form that at least allows that to happen. To the extent that you are uncertain of the trend, you are uncertain of the forecast.
I think of line fitting in Taylor series terms. If F is a secular function, it's reasonable to think that it may be a smooth function of time, corresponding to the incremental effect of the drivers. A linear assumption is a first-order Taylor approximation at a point, probably the end point. That will hold until the second and higher derivatives start to have a dominant effect. To the extent that we can establish that those are small, the forecast will work to a corresponding time into the future. That may be not very far, and linear trend extrapolation is not a very useful approach. And despite what contrarians sometimes think, it is very little used.
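In those Taylor terms, with a made-up smooth F, the extrapolation error of the linear fit grows with the curvature term F″(t₀)·h²/2:

```python
import numpy as np

F = lambda t: np.exp(0.02 * t)      # hypothetical smooth secular function
t0 = 100.0
F0, F1, F2 = F(t0), 0.02 * F(t0), 0.02**2 * F(t0)

errs = []
for h in (5.0, 25.0, 100.0):
    linear = F0 + F1 * h                       # first-order Taylor forecast
    err = abs(F(t0 + h) - linear)
    errs.append(err)
    print(h, "error:", round(err, 3), " leading term ~", round(F2 * h**2 / 2, 3))
```

The linear forecast is good for a few steps and useless at long range, which is the sense in which trend extrapolation has a limited horizon.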
Unit roots

A side story here is the use of ARIMA models with an I, as in Keenan's (3,1,0). A unit root. This is the difference between a model that has random perturbations about a mean value, and a random walk. Both Mills and Keenan approach this as just a matter to be resolved by goodness of fit. Beenstock et al, cited above, make more of its significance, but their analysis is a total muddle.
But for the reasons described above, it isn't just a matter of goodness of fit. Again, there is an infinite range of possible P,F,Q functions to choose. And statistics can't decide that. The only basis for choice is that the function F (with P and Q), with parameters, is a reasonable model for the physics. And a random walk simply isn't because:
- There are physical laws to be satisfied. Temperature determines the rate of outgoing infrared energy, and this in the medium term has to balance incoming. Now there are all sorts of short term variations (weather) that allow transient deviations, but they don't allow temperature just to drift unconstrained to a new level. A random walk does.
- As a variant of that, we know that the history is of temperature bounded within ranges. Seas haven't boiled, or (generally) frozen. In fact the range has been quite narrow, up to the Ice Age range at most. That is inconsistent with a random walk.
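The bounded-range point is easy to see in simulation: a mean-reverting AR(1) stays within a band set by its stationary variance, while a random walk driven by the same innovations wanders without limit (parameters here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sig = 10000, 0.1
eps = rng.normal(0.0, sig, n)

rw = np.cumsum(eps)                 # random walk: variance grows like n
ar = np.zeros(n)                    # AR(1), phi = 0.9: variance stays bounded
for i in range(1, n):
    ar[i] = 0.9 * ar[i - 1] + eps[i]

print("random walk excursion:", np.abs(rw).max())
print("AR(1) excursion:      ", np.abs(ar).max())
```

Over 10000 steps the AR(1) never strays far from its mean, while the random walk drifts to levels many times larger; a temperature record confined to a narrow range over a long history looks like the former, not the latter.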
I made this objection to Keenan's model, and some said: well, why couldn't it just have been a random walk for the period of observation only? But that comes back to the requirement that the P,F,Q model be an explanation of the process, and one that can be relied on for forecasting. Saying it is sometimes a random walk, sometimes not, just begs the questions: when, and why? It's a model that makes no progress.