moyhu: Derivatives, filters, odds and ends

Monday, March 2, 2015

Derivatives, filters, odds and ends

I've been writing about how a "sliding" trend may function as a estimate of derivative (and what might be better) here, here and here. There has been discussion, particularly commenter Greg. This post just catches up on some things that arose.

Welch smoothing and spherical Bessel functions.

In the first post, I gave this plot of the spectra of the 10-year OLS trend operator, and what results if multiplied by a Welch taper W, and then multiply again. I'll explain why those operations below. T the spectra are actually spherical Bessel functions. Or, more exactly,
(π²/3)*(w/2)^(.5-n)*Γ(n+1.5)*J_n+.5(x)
where x=f*π*T, f=freq in 1/Cen, T=trend period,n=1,2,3...

I've given a detailed derivation here. You might think that Bessel functions is overkill. But in fact, j₀(x)= sqrt(π/2x)J_1/2(x) is just the sinc function sin(x)/x, and as j_n increase in order, they are just trig functions multiplied by polynomials in 1/x, in such a was that they are O(x_n) near 0, and decay as 1/x for large x. So x^1-nj_n has the appropriate asymptotics here, and for the order of the polynomials, uniquely so.

General remarks on filters

Greg has been commenting on the merits of other filters, including Gaussian and Lanczos. They are of course not strictly differentiating filters, but could be converted to one. In fact, any of the usual symmetric windows/filters can be converted by differentiating to a differentiating filter (the reason is not quite as obvious as it sounds).

Properties that you are likely to want in a lowish pass convolution filter are:

In the time domain, a bounded domain (compact support). In fact, the domain should about match the period corresponding to the upper limit of frequencies that you want to start attenuating.
In the frequency domain, eventual rapid attenuation with high frequency, which often means controlling side lobes
Sometimes overlooked, a contrary wish for some specific behaviour in the pass band. Possibly a flat response for low pass, or in our case, an accurate derivative representation.

I say contrary, because whatever is wanted for low frequencies is unlikely to include attentuation. So there is an uncomfortable intermediate region which is neither accurate nor attenuated. This is worse for differentiation filters which want to increase amplitude with frequency right up to the transition to attenuation.

The ideal filter with compact support and bounded domain is that gate or boxcar filter H(t). But in the frequency domain is a sinc function sin(ω)/ω, which attenuates very slowly. There is a rule about rates of attenuation. The power |ω|^-n of attenuation is determined by n, the order of the first derivative that fails to exist. This is usually determined by the cut-off at the ends of the compact support of the filter. H(t) is discontinuous, to the first derivative fails.

There are filters, like Welch, or the triangle filters (see Wiki for a list) which terminate with a discontinuous derivative. These attenuate O(|ω|^-2). It is fairly easy to extend to a zero derivative at the end, as with Hann, Blackman, and so increase n to 3 or beyond. These satisfy 1 and 2. For Welch W, raising it to the n^th power improves the termination by a power of n, and the corresponding HF attenuation.

The ultimate in this direction is the Gaussian. It gives best roll-off in the freq domain (FD). It does not strictly have bounded support, but approaches zero so fast that it can be truncated, with only a small amplitude HF tail (though slowly fading). This can be ameliorated. But again, it does not have a flat top.

The Lanczos window tries more to honor requirement 3 by reintroducing aspects of the sinc function in the time domain. It is a product of a narrow and a wider sinc, generally arranged so that the zeroes coincide at the cut-off. It gives a flatter top in the FD, at the cost of being non-positive in the TD.

I find that Welch functions are flexible. You can raise them to any power to give faster attenuation, at the expense of a broader footprint in the FD relative to the support in the TD (so attenuation is eventually faster, but starts later).

Plots

That was all a longish preamble to fulfilling a request from Greg to plot spectra of the Gaussian (truncated at 3 σ), and Lanczos, along with the Welch functions I have been using.

So here are the plots of the windows. I've normalised to area=1, except for Lanczos, shown with area=0.5 (for the sake of the y-axis). But it is restored for the spectrum.

W stands for Welch. You can see that the higher powers are looking quite like the Gaussian. So here are the spectra:

You can see the trade-offs. W decays relatively slowly, but is narrowest in the FD - for its TD footprint, it has a narrower low-pass. Higher powers approach the Gaussian. The Lanczos does indeed have a flatter top, as intended, and also a good attenuation. But it is broadest of all - it lets through relatively high frequencies before the cut-off.

13 comments:

GregMarch 3, 2015 at 7:40 AM
Nice. Good comparative plot. It would have been good to have the gaussian in there too for comparison.

While your first order is steeper in roll off than the Lanczos the higher ones seem quite close. See below. Your comment that L is letting through more HF is not really accurate. This is only because it actually *has* a pass-band ! The others attenuate right from zero.

To make a realistic comparison you need to scale the zero of all the filters to the same point. eg align all the filters to have a zero at 0.01

You will then see a better comparison of the neg. lobes and also that L has an unattenuated pass-band then drops to the same zero. This should show that it has a faster transition band.

I'm a little surprised at the passband ripple there, could you put a figure on the peak?
That's a defect that needs to be quantified.

ReplyDelete
Replies
GregMarch 3, 2015 at 7:49 AM
Sorry, looks like you did add the gaussian, black line.
ReplyDelete
Replies
GregMarch 3, 2015 at 8:05 AM
Here is my plot of Lanczos vs a three pole running mean ( which is very close to the gaussian but actually does pass through zero ). They have the same zero pass frequency.
http://climategrog.wordpress.com/2013/11/28/lanczos-filter-script/

http://climategrog.files.wordpress.com/2013/11/lanc_r3m_comp.png

Looking at that plot again I wonder if I didn't use a 5-lobe window. The strong ripple suggests you may also have used that. ( a=3 in the parlance of the WonkyPedia article is 5 lobe, their a=2 is the three lobe that I usually prefer. Go figure out why they present it like that ).

Again it's a trade-off. 5-lobe has faster transition : closer to ideal step in FD, but the cost is a bit more ringing. This can give the negative 'costs' that you referred to in your last post. I generally avoid it because of this overshoot.

ReplyDelete
Replies
CarrickMarch 5, 2015 at 4:01 AM
Nick: I scale them to have the same footprint in the time domain. I think that is what counts.

Well it depends on what you are trying to preserve. More typically, when we are comparing filters, we adjust them to either have the same equivalent rectangular bandwidth (ERB) or we fix the 3-db corner frequency (that's what I usually do, even when I'm working with time-domain filter implementations).
ReplyDelete
Replies
CarrickMarch 5, 2015 at 4:09 AM
Great observation about the relationship between Welch^n and the spherical Bessel functions by the away.

For Welch filters, this gives us an analytic method for choosing to preserve any criterion that we wish, be it the location of the first null, the 3-dB roll-off point, or fixing the ERB.

Anyway, I think this is a nice illustration of how using a analytically tractable function often more than offsets the
advantages of more complex filter choices.

This is a nice piece of work.
ReplyDelete
Replies
GregMarch 5, 2015 at 8:05 AM
Here is an alternative presentation:

Let * represent convolution, W = Welch; R=lin. ramp. Since differentiation and convolution are both linear operations and thus commutative:

D ( W * T(t) ) = DW * T(t) = R * T(t) = W * D ( T(t) )

ie. sliding-trends ( T(t) ) = R * T(t) = W * dT/dt

That fits better with what is 'obvious' ( ie familiar ) to me and finally agrees with the result you originally posted. Thanks for your patient explanations.

So now we see that sliding trend is low-pass filtered rate of change with a fairly lumpy freq. resp with a nasty 50% negative lobe, we may seek a window function with a more orderly FT:

eg. D ( gauss * T(t) ) = Dgauss * T(t) = gauss * dT/dt

ReplyDelete
Replies
GregMarch 5, 2015 at 6:45 PM
I said I'd come back to Pekka's two point first-and-last approximations.

Nick has shown the convolution of dT/dt with parabola ( Welch ) is the same as convolution of T(t) with a ramp ( OLS slope )

by similar logic convolution of dT/dt with a ramp ( OLS slope ) is the same as convolution of T(t) with a rectangle ( boxcar, running average )

one step further, convolution of dT/dt with a rectangle ( boxcar, running average ) is the same as convolution of T(t) with a derivative of the rectangle : Pekka's two point slope estimation.

We have already seen that FT of d/dt of rect is f*sinc(f) , ie a pure sine with zero roll-off. A repetitive notch filter with repetitive, unattenuated inversion of the signal. A linear combination of any number of them will have similar defects. It would be quite simple using odd or even harmonics to turn the combined FT into a square or a triangle wave but it would still have the same zeros, same repetition and same inversion pattern.

Interesting but not much help as an estimation of slope.

However, convolution of the kernels corresponds to multiplication of FTs where the different spacing of the zeros in FD will enforce more zeros in the response.

This is essentially what I show as a three pole running mean approximation to gaussian that does have a zero in FD.
http://climategrog.wordpress.com/2014/07/13/989/
http://climategrog.files.wordpress.com/2013/12/gauss_r3m_comp.png

However, this discussion highlights a danger in applying this to derivatives ( eg dT/dt ) since the stop band ripple that dies away nicely in that plot will be revived by the *f effect of the derivative.

At this stage we'd have to invoke Carrick's 1/f nature of the data to justify continued use.

Many useful ideas tying together here, thanks to all for the varied range of expertise and experience.

ReplyDelete
Replies

Add comment

An interactive topic index for all Moyhu posts.
Latest Ice and Temperature data
Climate Data Portals
A gallery of Javascript-enhanced graphics
Temperature trend viewer
Google Maps and GHCN
WebGL map of past GHCN/SST station temperatures
WebGL map of GHCN/SST station temperature trends
HiRes NOAA OI SST with WebGL and Movie
Regional Hi-Res SST movies
WebGL Facility
TempLS Guide
More pages, and blog glossary

moyhu

Monday, March 2, 2015

Derivatives, filters, odds and ends

Derivatives, filters, odds and ends

Welch smoothing and spherical Bessel functions.

General remarks on filters

Plots

13 comments:

Maintained Pages

Search This Blog

Recent Comments

Blogroll

Blog Archive

Translate

Resources

About Me

moyhu

Monday, March 2, 2015

Derivatives, filters, odds and ends

Derivatives, filters, odds and ends

Welch smoothing and spherical Bessel functions.

General remarks on filters

Plots

13 comments:

Maintained Pages

Search This Blog

Recent Comments

Blogroll

Subscribe To

Blog Archive

Translate

Resources

About Me