Tuesday, January 18, 2011

On "Mannian" Smoothing

A recent thread at Climate Audit revisited the topic of "Mannian" smoothing. There's a long history here - here, for example, is a thread to which this seems to be a sequel.

The "Mannian" smoothing to which they refer is described, under the name of Minimum Roughness Criterion, by Mann GRL04 (2004), with a follow-up GRL08.

There was much more discussion of the general idea of Mannian smoothing in some threads on its use by Rahmstorf; at CA: the-secret-of-the-rahmstorf-non-linear-trend
and rahmstorf-et-al-reject-ipcc-procedure, and at Lucia's: more-fishy-how-doesdid-mann-guess-future-data-to-test-projections


In this post I'll try to describe what "Mannian" smoothing does, and why, and how it compares with other methods in the statistical literature.


Smoothing and the endpoint problem

Smoothing time series data y_t is often done using a symmetric digital filter on a finite base:

Y_t = a_0*y_t + Σ a_k*(y_{t-k} + y_{t+k})

summed over the M coefficients (or weights) a_k, k = 1:M.
A symmetric filter is favored because the result is centred - ie the weighted average it produces represents the value at the centre of the weighting, with no lag. If you apply it to straight-line data, it returns the line unchanged.
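To make that concrete, here is a minimal R sketch (mine, not from mauna.r) using stats::filter with 13 equal weights; any symmetric weights would behave the same way.

```r
# A symmetric filter applied with stats::filter is centred (sides = 2):
# it returns a straight line unchanged, but gives NA where the window
# runs off either end of the data.
M  <- 6                                # half-width: 2*M + 1 = 13 points
wt <- rep(1, 2*M + 1) / (2*M + 1)      # equal weights, just for illustration
y  <- 2 + 0.5 * (1:50)                 # straight-line "data"
Y  <- stats::filter(y, wt, sides = 2)  # centred smoothing
range(Y - y, na.rm = TRUE)             # ~0: the line comes back unchanged
sum(is.na(Y))                          # 2*M points are lost at the ends
```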

But at each end of the data, you run out of points. You could stop the smoothing as soon as this happens. But that seems excessive - the quality of the smooth will degrade as you approach the endpoints, but it does not immediately become worthless. And sometimes the most recent data is where you really want to know the best estimate of trend, even if it is unstable and likely to change as new data arrives.

This is not a new problem, of course. Alastair Gray and Peter Thomson wrote several papers in the mid-1990s in which they sought filters which would require minimum revision as new data came in. In the process they reviewed the history of one logical way to achieve that, which was to use some forecast based on existing data to "pad" the data so a symmetric filter could be used. I'll say more on the legitimacy of that, but here's what they said about its history:
A common and natural approach involves forecasting the missing values, either implicitly or explicitly, and then applying the desired central filter. The forecasting methods used range from simple extrapolation to model based methods, some based on the local trend model adopted, others based on global models for the series as a whole. The latter include the fitting of ARIMA models to produce forecasts (see Dagum (1980) in particular). The principle of using prediction at the ends of series seems a key one which goes back to DeForest (1877). See also the discussion in Cleveland (1983), Greville (1979) and Wallis (1983).

Mann's papers and method

In GRL04 Mann describes three boundary conditions that one could try to meet by extrapolating:
  1. Minimum norm - ie deviation from some target value, usually a known value ("climatology").
  2. Minimum slope - the final slope should be zero. This allows the smooth more freedom to shape itself to the end data, but is no use if you want an estimate of the end slope, because that has been prescribed (zero).
  3. Minimum roughness - the second derivative should be zero. This allows an estimate of both the final value and slope. It's the least intrusive of the three, in that the second derivative is hardest to estimate accurately, so prescribing it gives least loss of information.
If the smoother operates over 2M+1 points, then M points must be padded at each end. The conditions themselves do not uniquely determine a padding formula - Mann glides over this a bit and goes straight to formulae which will do the job. They are (sketched in code after the list):
  1. Minimum norm - pad with the constant "longterm mean".
  2. Minimum slope - reflect the M values preceding the final t value about that t - ie the extension makes the series symmetric (even) about the endpoint. 
  3. Minimum roughness - reflect the same M values about both the final t and final y, so the series makes an odd function about that point. This means any fitted model will have zero second derivative (and all even higher derivatives).
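Here is a short R sketch of those three padding rules as I read them from GRL04; the function is my own, and using the whole-series mean as the "longterm mean" is an assumption. It pads the recent end only, which is the end of interest.

```r
# Pad the right-hand end of a series by one of the three rules above.
pad_right <- function(y, M, method = c("norm", "slope", "rough")) {
  method  <- match.arg(method)
  n       <- length(y)
  tailrev <- y[(n - 1):(n - M)]         # the M values before the last, reversed
  pad <- switch(method,
    norm  = rep(mean(y), M),            # minimum norm: constant long-term mean
    slope = tailrev,                    # minimum slope: even reflection about final t
    rough = 2 * y[n] - tailrev)         # minimum roughness: odd reflection about (t_n, y_n)
  c(y, pad)
}

# Usage: pad, apply the symmetric filter, then drop the padded points.
M  <- 6
wt <- rep(1, 2*M + 1) / (2*M + 1)       # stand-in symmetric weights
y  <- cumsum(rnorm(100)) + 0.2 * (1:100)
Y  <- stats::filter(pad_right(y, M, "rough"), wt, sides = 2)[1:length(y)]
```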
Mann showed this plot of the effect of the three methods on the annual mean NH temperature series:

The smoother is a Butterworth filter. Note that the min norm deviates most, and is a poor fit near the end. The measure of fit is MSE - the ratio of the variance of residuals from the smooth to the variance about the mean.

It is important to remember that although Mann's methods invoke a forecast, or symmetry, when you combine the linear forecast formula with the symmetric filter, you end up with an asymmetric filter on the known data set.

Pegging the endpoint and centering

A necessary feature of the minimum roughness method is that the smooth passes through the final point - "pegged" to it. This has also been a source of criticism - in fact the most recent CA thread seems to focus on it.

It is indeed a clear acknowledgement that the smoothing has effectively failed right at the end. But something does have to give there. At least the final estimate is unbiased. It will be subject to perhaps more vigorous change as new data comes in than, say, the Gray/Thomson value. But that is not a huge problem.

A virtue of the Mann method is that it is automatically centered. That is, the weighted average used for the smooth is indeed appropriate for the centre value. There isn't a lag. A test of this is whether a line would be returned unchanged. One would hope so. This is not just an issue of "prettiness"; the point of a filter is that it should shake off the noise and return the underlying process, and if it shifts even a line, then it will shift any trend you are using it to find.

It is also desirable, generally, to use positive weights. If the symmetric filter is positive, then so are the modified weights as you approach the end. And it is then easy to show that if you have a centred, positive, possibly asymmetric filter at the final point, that filter must reduce to a one-point filter - ie, pegging.
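Here is a sketch of that argument, in my notation, just spelling out the "easy to show" step:

```latex
% At the final point t = n the filter can use only current and past
% values, with non-negative weights w_k:
\[
  Y_n = \sum_{k=0}^{M} w_k \, y_{n-k}, \qquad w_k \ge 0 .
\]
% Centering means a line y_t = a + b t is returned unchanged at t = n,
% for any a and b, which requires
\[
  \sum_{k=0}^{M} w_k = 1 \quad\text{and}\quad \sum_{k=0}^{M} k\, w_k = 0 .
\]
% Every term k*w_k is non-negative, so the second condition forces
% w_k = 0 for all k >= 1, leaving w_0 = 1: the smooth equals y_n - pegging.
```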

Comparison with a method used in economic statistics.

There was an undercurrent of scepticism about the use of forecast values. Lucia was especially vehement. But it is a well established technique, as the above quote from Gray et al showed. In fact smoothing and forecasting are closely related. Both rely on some model (eg polynomial) fitted to the data, and if you use the model to generate forecast values which are then used to calculate coefficients, then there is no extra assumption.

Mann doesn't quite do that - he uses a symmetry property of the underlying model. Suppose you were forming the smoother by polynomial fitting. The "forecast" for minimum roughness is then just a constraint that the polynomials use odd powers of t (with the t origin at the final endpoint).

The most explicit formulation of a method with forecasting seems to have been a technical report by Musgrave (Musgrave, J. C. (1964). A set of end weights to end all end weights. Working paper, Bureau of the Census, US Department of Commerce, Washington, D.C.). This wasn't published, and doesn't seem to be online, but it has been frequently cited. It is summarised in this 2003 paper by Quenneville et al. Unfortunately, I couldn't find a free version. It invokes the Henderson smoother, which is popular with census folk, and common in economics. In fact, the 25-point Henderson filter with Musgrave end smoothing makes up the widely used X-11 smoothing (or seasonal adjustment) procedure.

The Musgrave endpoint process works thus (Quenneville). With a symmetric filter, for each point where there are insufficient data points:
  1. Fit an OLS regression y = a + b*t to the points that are available
  2. Pad the data with the line y=a+b*t+g*b*(t-mean(t))
    where g=var(b)/(b^2+var(b)) and the mean is over the needed padding interval
  3. Apply the symmetric filter to the region with padding
Note that different padding is used for each smooth point (a sketch in R follows below). The modified line for extrapolation, with the factor g, ensures that it is the expected adjustment on new data that is minimised, rather than the sum of squared residuals.
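Here is a rough R sketch of those steps - my reading of the summary above, not Quenneville's code. For brevity it builds the pad for the final smooth point only, and fits the line to the last 2M+1 values (the post just says "the points that are available").

```r
# Musgrave-style padding for the final smooth point, following the
# three steps above.  vcov() supplies var(b); g is the factor quoted
# in the post; the padding interval is the M steps past the end.
musgrave_pad <- function(y, M) {
  n   <- length(y)
  t   <- (n - 2*M):n                    # available points used for the fit
  yw  <- y[t]
  fit <- lm(yw ~ t)                     # step 1: OLS line y = a + b*t
  a   <- coef(fit)[1]
  b   <- coef(fit)[2]
  vb  <- vcov(fit)["t", "t"]            # var(b) from the regression
  g   <- vb / (b^2 + vb)
  tp  <- (n + 1):(n + M)                # the needed padding interval
  pad <- a + b * tp + g * b * (tp - mean(tp))   # step 2: modified line
  c(y, pad)                             # step 3: now apply the symmetric filter
}

M  <- 6
wt <- rep(1, 2*M + 1) / (2*M + 1)
y  <- cumsum(rnorm(120)) + 0.1 * (1:120)
Y  <- stats::filter(musgrave_pad(y, M), wt, sides = 2)[1:length(y)]
```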


Examples

The code for these examples is called mauna.r, at the Document Store

First I'll calculate a data set where there is a clear linear trend. This is the set of annual average CO2 readings at Mauna Loa. I'll use the AR4 13-point filter, as referenced here. It's similar to a binomial filter, and the AR4 used it with minimum slope, describing this as a conservative option. Here are the last few years, showing minimum slope, Mannian and Musgrave endpoint smoothing.


Note that in this case minimum slope is the one out of line. The attempt to make the end slope zero is just inappropriate, and it shows. The fault is even clearer if I superimpose the results that you would have got starting in 2008, and then updating as each year's data come in. Not only does minimum slope deviate, but it needs a big correction each year.
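The effect is easy to reproduce without the real CO2 data. The sketch below uses a 13-point binomial filter as a stand-in for the AR4 filter, applied to a noiseless linear trend chosen only to mimic the steady CO2 rise (the numbers are hypothetical): minimum-roughness padding leaves the line alone, while minimum-slope padding bends the smooth flat near the end.

```r
# A caricature of the Mauna Loa case: a clean linear trend smoothed
# with a 13-point binomial filter, comparing minimum-slope and
# minimum-roughness padding at the recent end.
M   <- 6
wt  <- choose(2*M, 0:(2*M)) / 2^(2*M)   # binomial weights, sum to 1
y   <- 350 + 1.8 * (1:40)               # hypothetical "CO2-like" linear rise
n   <- length(y)
tailrev   <- y[(n - 1):(n - M)]
pad_slope <- c(y, tailrev)              # even reflection: minimum slope
pad_rough <- c(y, 2 * y[n] - tailrev)   # odd reflection: minimum roughness
Y_slope <- stats::filter(pad_slope, wt, sides = 2)[1:n]
Y_rough <- stats::filter(pad_rough, wt, sides = 2)[1:n]
tail(cbind(data      = y,
           min_slope = as.numeric(Y_slope),
           min_rough = as.numeric(Y_rough)))
# min_rough tracks the line exactly; min_slope bends away near the end.
```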

Now, for something less regular. Here are the NOAA monthly global temperature anomalies. First the most recent plot:

This shows more clearly how Mannian (MRC) pegs to the endpoint, so there is no smoothing there. Musgrave seems to do better. Minimum slope looks better here, mainly because the slope is indeed closer to zero.

Now here is the same plot with three successive months superimposed, as if we had started in October, then plotted for November and December. Now minimum slope looks good, but again, only because there isn't much slope. Musgrave oscillates a bit more, and Mannian much more, because of the pegging.


Conclusion

Firstly, there's nothing wrong in principle with Mannian smoothing. The padding process is similar to the widely used Musgrave method, and it doesn't seem to show any particular bias. But the pegging to the endpoint value does make for greater variation.

Minimum slope is, as the AR4 said, more conservative. But it pushes for a zero end slope when that may be clearly inappropriate.

I think the Musgrave process, which explicitly aims to minimise the adjustment as new data arrive, is probably the best choice.

The code for these examples is called mauna.r, at the Document Store.



12 comments:

  1. I think there is a copy-paste error in the section 'Mann's paper and methods', as #3 point is identical in both numbered lists. And the #2 in the second list seems to end mid-sentence...

  2. are you sure that you have emulated Mann's routine properly? I did a quick test with his matlab lowpass routine and it does not peg the final point when using the min roughness criteria.

  3. Anon,
    I followed the method in GRL04:


    Finally, to approximate the ‘minimum roughness’ constraint, one pads the series with the values within one filter width of the boundary reflected about the time boundary, and reflected vertically (i.e., about the ‘‘y’’ axis) relative to the final value.

    It seems to me that that reflection means that the data becomes an odd function about the final point treated as the t- and y-origin, so any smoothing approx would have to be odd too, and pass through that origin.

    However, it's true that if you look very closely at his graphs, the MRC smooth passes close to, but not exactly on, the final point. I'm not very familiar with Matlab, but I'll try to see what is going on.

  4. Anon,
    I checked lowpass.m and it seems to do exactly the same thing that I did, and is as described in GRL04. I don't have Matlab, so I can't run it.

  5. try Octave - it's an open-source Matlab emulator

  6. Anon,
    I think I see it. It's only pinned if the filter coefficients are symmetric. The Butterworth filter isn't. I don't know why Mann chose a Butterworth here - I can't see that causality is an issue.

  7. Can you show what the actual Mann filter will do in your comparison? Or add a figure pointing to the differences associated with different choices for the filter?

  8. Anon,
    As I understand it, Mannian smoothing refers to the extrapolation method I described and implemented. You can use it with any digital filter. Mann in his 2004 paper chose a Butterworth filter - I'm not sure why. Symmetric filters make more sense in smoothing time series where you have data on both sides of the smoothing point.

    I chose the symmetric filter used in the AR4 because it has been discussed at CA in the context of mannian smoothing. My first alternative choice would have been the Henderson filter. But I don't think that makes much difference.

    It's possible that the unpinning of the endpoint achieved with a Butterworth filter improves the performance of the Mann process. I'm travelling at the moment and can't test that, but maybe when I get back.

  9. Nick, I meant to comment on this earlier... You can implement a symmetric version of the Butterworth filter...it's "just" acausal. For analyses of this type, the acausality doesn't matter.

    (Matlab's "filtfilt" function does something like this:

    y = filtfilt(b,a,x) performs zero-phase digital filtering by processing the input data, x, in both the forward and reverse directions [...].)

    I implement my "symmetric" butterworth filter (I call it cleverly my "butterworth2" filter) in the same way, run the data forward and backwards through the same butterworth filter.

    My preferred way of treating endpoints is truncating the series when the various assumptions for the end points "begin to matter". At the moment, there is no way to know the current long-term slope, because it depends on future climate change. (You could also set bounds on the error over time as you approach the endpoint of the data using a Monte Carlo approach, but I digress...)

  10. Carrick,
    I agree acausality doesn't matter, so I'm wondering why the Butterworth is used at all. Maybe your two-pass version has the flat top and smooth roll-off that a 1-pass does (I expect it would), but it's not obvious that that is optimal for a time-series smoothing filter. Leaving, say, a low-order polynomial invariant, as a Henderson filter does, seems more natural.

  11. Nick, I agree with you here too: The Butterworth is not the most natural choice for smoothing data that contain secular trends.

    The transfer function is the magnitude squared of the "single-pass" version of the filter so the roll-off is twice the rate of the single-pass...this is something one needs to keep in mind when using forward-backward filtering. (On the other hand, it is free of phase distortion, and has zero delay.)
