moyhu: Derivatives and regression again

Tuesday, March 10, 2015

Derivatives and regression again

I've been writing about how a "sliding" trend may function as an estimate of the derivative (and what might be better) here, here, here and here. There has been discussion, particularly with commenter Greg. This post just catches up on some things that arose.

Does it differentiate?

Greg is not so sure. I'll give some examples. First, here is a graph of some of the windows. I've added a step function, to show that in fact virtually any odd function will differentiate. The one called Welch^2 is the linear window multiplied by the square of a Welch taper. There is information about the Welch taper, and why it arises naturally, here and here.

These are scaled to give the correct derivative for a line. On the legend, I have written beside each the RMS value, that is, the square root of the integral of the square of the window. That's the factor that you would expect to multiply white noise, since the variance of a weighted sum of independent values is the noise variance times the sum of the squared weights. Ordinary regression has the least value. That is an aspect of its optimal quality. Better smoothing for HF comes at a cost.
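A minimal sketch of the scaling and the RMS factor, under assumptions that are not in the post: a discrete footprint of points t = -n..n, a sum of squares standing in for the integral, and a Welch taper of half-width n + 1. The function names are hypothetical. Each window is scaled so that sum(w*t) = 1, which makes it return the exact slope of a straight line.

```python
import math

def scaled_window(shape, n):
    """Odd weights on t = -n..n, scaled so that sum(w * t) == 1."""
    ts = list(range(-n, n + 1))
    raw = [shape(t, n) for t in ts]
    norm = sum(w * t for w, t in zip(raw, ts))
    return [w / norm for w in raw]

def ols(t, n):
    return float(t)                      # linear ramp: the regression slope weights

def step(t, n):
    return float((t > 0) - (t < 0))      # sign function

def welch2(t, n):
    # ramp times the square of a Welch taper; half-width n + 1 is an assumption
    return t * (1.0 - (t / (n + 1)) ** 2) ** 2

def rms(w):
    """Square root of the sum of squared weights (discrete stand-in for the integral)."""
    return math.sqrt(sum(x * x for x in w))

def slope_at_centre(w, y):
    """Weighted sum of the data under the footprint."""
    return sum(wi * yi for wi, yi in zip(w, y))

n = 10
windows = {name: scaled_window(f, n)
           for name, f in [("OLS", ols), ("step", step), ("Welch^2", welch2)]}
line = [3.0 + 2.0 * t for t in range(-n, n + 1)]   # a line with slope 2

for name, w in windows.items():
    print(name, round(slope_at_centre(w, line), 6), round(rms(w), 4))
```

All three recover the slope of the line, and the OLS window has the smallest RMS. That follows from the Cauchy-Schwarz inequality: among all weights with sum(w*t) = 1, the sum of squares is least when w is proportional to t.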

So now to apply these to a sinusoid. In the following plot the yellow is the sine to be differentiated, the continuous curve "Deriv" is the exact derivative, and I have applied the three windows with three different footprints (line types). At each end, a region equal to half a footprint is lost - I have marked those parts with a horizontal line, which tells you how long each footprint was.

The first thing to note is that each gives a sinusoid, with the correct 90° phase shift. This is a consequence of being an odd function. Second is that as the footprint broadens, the amplitude shrinks. The inaccuracy is because a wider footprint is more affected by higher derivatives of the sine. In effect, it smoothes the result, to no good purpose here. But when there is noise, the smoothing is needed. That is the basic trade-off that we will encounter.

You'll notice, too, that both the Welch and step do better than regression. This is basically because they are weighted to favor central values, rather than any more subtle merit.
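The amplitude shrinkage can be checked directly. For an odd window applied to sin(ωt), the cosine part of the weighted sum cancels, so the output is the exact derivative ω·cos(ωt) times a scalar gain, sum(w_k·sin(ωk))/ω. The sketch below computes that gain for three footprints. The period of 60 samples, the half-widths and the Welch half-width n + 1 are illustrative assumptions, not values from the post.

```python
import math

def scaled_window(shape, n):
    """Odd weights on t = -n..n, scaled so that sum(w * t) == 1."""
    ts = range(-n, n + 1)
    raw = [shape(t, n) for t in ts]
    norm = sum(w * t for w, t in zip(raw, ts))
    return [w / norm for w in raw]

shapes = {
    "OLS": lambda t, n: float(t),
    "step": lambda t, n: float((t > 0) - (t < 0)),
    "Welch^2": lambda t, n: t * (1.0 - (t / (n + 1)) ** 2) ** 2,
}

def gain(w, omega):
    """Ratio of estimated to exact derivative amplitude for sin(omega * t)."""
    n = len(w) // 2
    return sum(wk * math.sin(omega * k)
               for wk, k in zip(w, range(-n, n + 1))) / omega

period = 60.0                       # samples per cycle (illustrative)
omega = 2.0 * math.pi / period
half_widths = (5, 10, 15)           # footprint is 2n + 1 points

gains = {name: [gain(scaled_window(f, n), omega) for n in half_widths]
         for name, f in shapes.items()}
for name, g in gains.items():
    print(name, [round(x, 3) for x in g])

# Direct check at one centre: the windowed sum equals gain * omega * cos(omega * t0)
t0, n = 7, 10
w = scaled_window(shapes["OLS"], n)
estimate = sum(wk * math.sin(omega * (t0 + k))
               for wk, k in zip(w, range(-n, n + 1)))
```

In this setup every gain is below 1 and falls as the footprint widens, and the step and Welch^2 gains stay closer to 1 than the OLS gain, since they put relatively less weight on the ends of the footprint.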

Now for something different, to differentiate white noise. There should be no real trend. I've dropped the step filter. Again the horizontal end sections indicate the half-footprint.

Now what shows is a marked oscillation, with period about equal to the footprint. The Welch filter is good at damping frequencies beyond this range; however, the actual amplitude of the response is much higher. That is associated with the higher RMS value noted on the first figure, and is commensurate with it.
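A rough simulation of the link between the RMS value and the noise response, assuming unit-variance Gaussian white noise, a footprint of 21 points and a Welch half-width of n + 1 (all illustrative choices). If the noise values are independent, the standard deviation of the windowed output should be close to the RMS of the weights.

```python
import math
import random

def scaled_window(shape, n):
    """Odd weights on t = -n..n, scaled so that sum(w * t) == 1."""
    ts = range(-n, n + 1)
    raw = [shape(t, n) for t in ts]
    norm = sum(w * t for w, t in zip(raw, ts))
    return [w / norm for w in raw]

def ols(t, n):
    return float(t)

def welch2(t, n):
    return t * (1.0 - (t / (n + 1)) ** 2) ** 2   # half-width n + 1 is an assumption

def sliding_derivative(w, y):
    """Apply the window at every centre where the full footprint fits."""
    m = len(w)
    return [sum(wk * yk for wk, yk in zip(w, y[i:i + m]))
            for i in range(len(y) - m + 1)]

def std(x):
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))

random.seed(0)                                    # fixed seed for a repeatable run
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]

n = 10
results = {}
for name, shape in [("OLS", ols), ("Welch^2", welch2)]:
    w = scaled_window(shape, n)
    predicted = math.sqrt(sum(x * x for x in w))  # RMS of the weights
    observed = std(sliding_derivative(w, noise))
    results[name] = (predicted, observed)
    print(name, round(predicted, 4), round(observed, 4))
```

The observed spread should sit near the predicted value for each window, with the Welch^2 output noticeably larger than the OLS output at the same footprint.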

Integrating with noise

OK, so what happens if we estimate the derivative of a sinusoid with noise? The next fig has a sine with Gaussian noise of equal amplitude added. Can we recover the derivative?

The pale grey is the original, sine and total. The blue is the derivative of sine. The red OLS regression tracks better than the smoother purple, but has more residual HF. Again the half-footprint is shown by the level sections at each end.

Improving the derivative formula

Pekka, on another thread, recommended taking pairs of points, differencing, and forming some kind of optimal derivative as a weighted sum. Odd function windows automatically give such a weighted sum. The idea of improving the derivative (higher order) is attractive, because it allows the footprint to be expanded, with better noise damping, without loss of derivative accuracy.

A way to do this is to take a family of windows, and make them orthogonal to higher powers by Gram-Schmidt orthogonalisation. I did that using OLS, OLS with Welch taper, and OLS with Welch^2 (W0, W1, W2). Here are the resulting windows. W1 is guaranteed to be accurate for a cubic, and W2 for fifth order:

Again, they are scaled to give the same derivative. We see a more extreme version of the RMS inflation of the first fig. The higher order accuracy comes at the cost of much larger coefficients. They do indeed differentiate better, as the next plot shows:

W2 seems suspiciously perfect. But each result is just the correct sinusoid multiplied by a scalar, and I think that scalar just approaches 1 in an oscillatory way, so W2 just happens to land on a zero of the error. Anyway, here higher order certainly works. What about with white noise?

Here is the downside. The higher orders amplify noise, in line with their RMS integrals.

Higher order may sometimes work. If you have an exact cubic, then you can expand the footprint as far as data permits without loss of derivative accuracy, and thus maybe overcome the RMS loss. But generally not.
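The construction in this section can be sketched as follows. This is one possible reading of the Gram-Schmidt step, not necessarily the exact procedure used for the plots: start from the ramp, ramp times Welch, and ramp times Welch^2, subtract multiples of the earlier windows so that the cubic moment (and then the fifth moment) vanishes, and rescale so the response to a unit slope is 1. The discrete grid, the half-width n + 1 and all function names are assumptions.

```python
import math

n = 10
ts = list(range(-n, n + 1))
h = n + 1.0                                   # Welch half-width (assumption)

def moment(w, p):
    """Discrete moment sum(w * t**p); p = 1 is the response to a unit slope."""
    return sum(wi * t ** p for wi, t in zip(w, ts))

def subtract_multiple(a, x, y):
    """Return y - a * x, elementwise."""
    return [yi - a * xi for xi, yi in zip(x, y)]

def normalise(w):
    m1 = moment(w, 1)
    return [wi / m1 for wi in w]

f0 = [float(t) for t in ts]                           # OLS ramp
f1 = [t * (1.0 - (t / h) ** 2) for t in ts]           # ramp * Welch
f2 = [t * (1.0 - (t / h) ** 2) ** 2 for t in ts]      # ramp * Welch^2

g1 = subtract_multiple(moment(f1, 3) / moment(f0, 3), f0, f1)   # cubic moment -> 0
g2 = subtract_multiple(moment(f2, 3) / moment(f0, 3), f0, f2)   # cubic moment -> 0
g2 = subtract_multiple(moment(g2, 5) / moment(g1, 5), g1, g2)   # fifth moment -> 0

W0, W1, W2 = normalise(f0), normalise(g1), normalise(g2)

def rms(w):
    return math.sqrt(sum(x * x for x in w))

def gain(w, omega):
    """Ratio of estimated to exact derivative amplitude for sin(omega * t)."""
    return sum(wi * math.sin(omega * t) for wi, t in zip(w, ts)) / omega

omega = 2.0 * math.pi / 60.0                  # illustrative test frequency
for name, w in [("W0", W0), ("W1", W1), ("W2", W2)]:
    print(name, "rms", round(rms(w), 4), "gain", round(gain(w, omega), 5))
```

In this sketch W1 returns the exact central derivative of any cubic and W2 of any fifth-order polynomial, while the RMS values rise from W0 to W1 to W2, which is the noise penalty described above.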

13 comments:

Pekka Pirilä, March 11, 2015 at 9:37 AM
Nick,

Your examples make it easier to explain, what I meant by an optimal choice.

One more example could be formed by adding to the linear and quadratic terms that allow for an unbiased estimate of the derivative a third order term of unknown strength. Now we assume that the coefficient of the third order term is normally distributed with a known standard deviation and expectation value zero. We assume also that noise is white and also normally distributed. We assume furthermore that all higher order odd terms are negligible.

Based on the above assumptions it's surely possible to determine a filter that leads to the smallest uncertainty for the derivative.

Greg Goodman, March 18, 2015 at 3:02 AM
"Now what shows is a marked oscillation, with period about equal to the footprint. The Welch filter is good at damping frequencies beyond this range; however, the actual amplitude of the response is much higher."

This is not an oscillation, it is the band of frequencies that you chose to let through. It is not amplifying, it is attenuating less. All the elements of the kernel should sum to unity. So everything gets attenuated, some things less than others. Nothing should be amplified.

As Carrick pointed out, the average is the best linear solution to remove normally distributed *random* noise. The problem is that it is not the best way to design a filter in the presence of non-random variability.

Thus again, the need to define the objective before choosing the solution.

If you have truly random noise, a mean is usually the best way to remove it. But this is where climatologists usually fail. They want to find CO2 forcing and dismiss the rest of climate variability as random "internal variability". They may say "stochastic" if they want to stop people arguing back.

Random becomes an excuse for anything that cannot be explained by CO2.

Then linear trends and running means are applied across the board.

The trouble is, as we saw with the volcanoes and 15y sliding trends, it messes up badly. Volcanoes may be random in the sense that we do not have enough knowledge to predict them, but three major stratospheric events with climatic impacts lasting about 4 or 5 years each, in 1963, 1982 and 1991, do not look that random to a filter.

In fact the sample is not sufficiently large to represent the random distribution of major volcanoes. At this point all bets are off, the nice noise-reducing properties Nick shows above fall apart, and we find a well-chosen low-pass filter would be better than some averaging technique.

Greg, March 18, 2015 at 3:12 AM
BTW, it seems that you have some spurious zeros in the first element of your kernels. For example the ramp and the rect in your first graph should start at their finite values, not zero. Same in 5th plot.

You should be able to diff the ramp kernel and get a flat line, not a line with a massive spike at the start.

I suspect W2 (in particular) in the final plot should be a fair bit smoother.

Greg Goodman, March 18, 2015 at 3:22 AM
It seems that your test uses a kernel 7 units long and your test sinusoid is of period 7. This is a bit of a special case where you know what the signal is. If you are looking for a method to produce the derivative, it needs to be more general.

How well do these functions differentiate if you have a test signal period of 4.6 units, for example? That may be more interesting. ;)

@whut, March 19, 2015 at 1:44 AM
Careful about invoking randomness. In terms of climate it is an indicator of something that you do not understand. A signal such as ENSO is clearly not random and is likely driven by other known factors that themselves have some predictability, such as QBO, TSI, and wobble. Volcanic effects are said to be random only in terms of our inability to accurately predict their occurrence. Man-made GHGs are clearly deterministic. These are the factors on the shorter time-scale, and those on the much longer time-scale such as Milankovitch cycles look to be somewhat predictable as well. That leaves only the mild multi-decadal factors that underlie gradual shifts in PDO and AMO as still mysterious -- and therefore red-noise-random in our mind. But that behavior is also so gradual as to be actually accounted for with a good model.
