Monday, March 2, 2015

Derivatives, filters, odds and ends

I've been writing about how a "sliding" trend may function as an estimate of a derivative (and what might be better) here, here and here. There has been discussion, particularly from commenter Greg. This post just catches up on some things that arose.

Welch smoothing and spherical Bessel functions.

In the first post, I gave this plot of the spectra of the 10-year OLS trend operator, and of what results if it is multiplied by a Welch taper W, and then multiplied again. I'll explain the reason for those operations below. It turns out the spectra are actually spherical Bessel functions. Or, more exactly, they are proportional to x^(1-n)*j_n(x),
where x=f*π*T, f=freq in 1/Cen, T=trend period, n=1,2,3...

I've given a detailed derivation here. You might think that Bessel functions are overkill. But in fact, j_0(x) = sqrt(π/(2x))*J_(1/2)(x) is just the sinc function sin(x)/x, and as the j_n increase in order, they are just trig functions multiplied by polynomials in 1/x, in such a way that they are O(x^n) near 0 and decay as 1/x for large x. So x^(1-n)*j_n(x) has the appropriate asymptotics here, and for the order of the polynomials, uniquely so.
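As a numerical check on that claim (my own sketch, not part of the derivation linked above), the frequency response of the continuous OLS slope kernel w(t) = 12t/T³ on [-T/2, T/2] can be computed by direct quadrature and compared with (6/T)·j₁(x), x = πfT:

```python
import numpy as np
from scipy.special import spherical_jn

T = 10.0                        # trend period (10 years)
t = np.linspace(-T / 2, T / 2, 20001)
dt = t[1] - t[0]
w = 12 * t / T**3               # continuous OLS slope kernel

f = np.linspace(0.01, 2.0, 50)  # frequency, cycles per year
# magnitude of the kernel's Fourier transform, by direct quadrature
H = np.array([abs(np.sum(w * np.exp(-2j * np.pi * fi * t)) * dt) for fi in f])
H_theory = (6 / T) * np.abs(spherical_jn(1, np.pi * f * T))

assert np.allclose(H, H_theory, atol=1e-3)
```

The n=1 case is the plain 10-year trend operator; the Welch-tapered versions pick up the higher-order j_n in the same way.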

General remarks on filters

Greg has been commenting on the merits of other filters, including Gaussian and Lanczos. They are of course not strictly differentiating filters, but could be converted to one. In fact, any of the usual symmetric windows/filters can be converted by differentiating to a differentiating filter (the reason is not quite as obvious as it sounds).

Properties that you are likely to want in a lowish pass convolution filter are:
  1. In the time domain, a bounded domain (compact support). In fact, the domain should roughly match the period corresponding to the upper limit of frequencies that you want to start attenuating.
  2. In the frequency domain, eventual rapid attenuation with high frequency, which often means controlling side lobes
  3. Sometimes overlooked, a contrary wish for some specific behaviour in the pass band. Possibly a flat response for low pass, or in our case, an accurate derivative representation.
I say contrary, because whatever is wanted for low frequencies is unlikely to include attenuation. So there is an uncomfortable intermediate region which is neither accurate nor attenuated. This is worse for differentiation filters, which want to increase amplitude with frequency right up to the transition to attenuation.

The ideal filter with compact support in the time domain is the gate or boxcar filter H(t). But in the frequency domain it is a sinc function sin(ω)/ω, which attenuates very slowly. There is a rule about rates of attenuation: the power |ω|^(-n) of attenuation is determined by n, the order of the first derivative that fails to exist. This is usually determined by the cut-off at the ends of the compact support of the filter. H(t) is discontinuous, so the first derivative fails.

There are filters, like Welch, or the triangle filters (see Wiki for a list), which terminate with a discontinuous derivative. These attenuate O(|ω|^(-2)). It is fairly easy to extend to a zero derivative at the end, as with Hann, Blackman, and so increase n to 3 or beyond. These satisfy 1 and 2. For Welch W, raising it to the nth power improves the termination by a power of n, and the corresponding HF attenuation.
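The attenuation rule can be checked numerically. A sketch (my own code; the window length, FFT size and fitting range are arbitrary choices): zero-pad each window, take the FFT, pick out the side-lobe peaks, and fit the log-log slope of their envelope:

```python
import numpy as np

def sidelobe_slope(w, nfft=1 << 16, lo=2000, hi=8000):
    """Log-log slope of the side-lobe peak envelope of the spectrum."""
    H = np.abs(np.fft.rfft(w, nfft))
    H /= H[0]
    # local maxima of |H| are the side-lobe peaks
    pk = np.where((H[1:-1] > H[:-2]) & (H[1:-1] > H[2:]))[0] + 1
    pk = pk[(pk >= lo) & (pk <= hi)]        # well past the main lobe
    return np.polyfit(np.log(pk), np.log(H[pk]), 1)[0]

N = 101
t = np.linspace(-1, 1, N)
slopes = [sidelobe_slope(np.ones(N)),       # boxcar: roughly -1
          sidelobe_slope(1 - t**2),         # Welch:  roughly -2
          sidelobe_slope((1 - t**2)**2)]    # Welch^2: roughly -3
print(np.round(slopes, 2))
```

Each extra power of the Welch window adds one order of smoothness at the ends, and hence one extra power of high-frequency decay.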

The ultimate in this direction is the Gaussian. It gives best roll-off in the freq domain (FD). It does not strictly have bounded support, but approaches zero so fast that it can be truncated, with only a small amplitude HF tail (though slowly fading). This can be ameliorated. But again, it does not have a flat top.

The Lanczos window tries more to honor requirement 3 by reintroducing aspects of the sinc function in the time domain. It is a product of a narrow and a wider sinc, generally arranged so that the zeroes coincide at the cut-off. It gives a flatter top in the FD, at the cost of being non-positive in the TD.
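For concreteness, here is a minimal sketch of that product construction (my own code, using the Wikipedia a=3 parametrisation; np.sinc is the normalised sinc sin(πx)/(πx)):

```python
import numpy as np

def lanczos(t, a=3):
    # Product of the narrow sinc (the ideal low-pass kernel) and a sinc
    # a times wider (the taper); both vanish at t = ±a, so their zeroes
    # coincide at the cut-off of the support.
    return np.where(np.abs(t) < a, np.sinc(t) * np.sinc(t / a), 0.0)

t = np.linspace(-3, 3, 601)
w = lanczos(t)

assert w[0] == 0 and w[-1] == 0   # compact support
assert w.min() < 0                # non-positive lobes in the TD
assert np.argmax(w) == 300        # peak at the centre
```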

I find that Welch functions are flexible. You can raise them to any power to give faster attenuation, at the expense of a broader footprint in the FD relative to the support in the TD (so attenuation is eventually faster, but starts later).


That was all a longish preamble to fulfilling a request from Greg to plot spectra of the Gaussian (truncated at 3 σ), and Lanczos, along with the Welch functions I have been using.

So here are the plots of the windows. I've normalised to area=1, except for Lanczos, shown with area=0.5 (for the sake of the y-axis). But it is restored for the spectrum.

W stands for Welch. You can see that the higher powers are looking quite like the Gaussian. So here are the spectra:

You can see the trade-offs. W decays relatively slowly, but is narrowest in the FD - for its TD footprint, it has a narrower low-pass. Higher powers approach the Gaussian. The Lanczos does indeed have a flatter top, as intended, and also a good attenuation. But it is broadest of all - it lets through relatively high frequencies before the cut-off.


  1. Nice. Good comparative plot. It would have been good to have the gaussian in there too for comparison.

    While your first order is steeper in roll off than the Lanczos the higher ones seem quite close. See below. Your comment that L is letting through more HF is not really accurate. This is only because it actually *has* a pass-band ! The others attenuate right from zero.

    To make a realistic comparison you need to scale the zero of all the filters to the same point. eg align all the filters to have a zero at 0.01

    You will then see a better comparison of the neg. lobes and also that L has an unattenuated pass-band then drops to the same zero. This should show that it has a faster transition band.

    I'm a little surprised at the passband ripple there, could you put a figure on the peak?
    That's a defect that needs to be quantified.

    1. Greg,
      "To make a realistic comparison you need to scale the zero of all the filters to the same point. eg align all the filters to have a zero at 0.01"

      I scale them to have the same footprint in the time domain. I think that is what counts. The Gaussian, for example, doesn't have a zero for a long way out.

      That's why I comment on the wideness of the Lanczos. It's starting the cut-off at fairly high frequencies. That's fine, but it has to observe a longer period relative to those frequencies (more cycles) to get the cut-off.

    2. "I scale them to have the same footprint in the time domain. I think that is what counts. "

      Huh? That's a strange criterion for filter design. Surely it's the frequency response that is the primary metric, not the implementation practicalities.

      There are situations where the footprint weighs heavily in the choices made, but it is not a measure of a filter's behaviour. You need to display the behaviour of all of them in comparable form, and then F.resp. becomes something you can balance against other factors (like kernel length) in making the design choices.

    3. "Huh? That's a strange criterion for filter design. Surely it's the frequency response that is primary metric, not the implementation practicalities."

      Practicalities matter. Most climate series are relatively short. After smoothing they are one footprint shorter. Plus, as Pekka says, if you want to estimate a derivative, you want it to be as local as possible.

  2. Sorry, looks like you did add the gaussian, black line.

  3. Here is my plot of Lanczos vs a three pole running mean ( which is very close to the gaussian but actually does pass through zero ). They have the same zero pass frequency.

    Looking at that plot again I wonder if I didn't use a 5-lobe window. The strong ripple suggests you may also have used that. ( a=3 in the parlance of the WonkyPedia article is 5 lobe, their a=2 is the three lobe that I usually prefer. Go figure out why they present it like that ).

    Again it's a trade-off. 5-lobe has faster transition : closer to ideal step in FD, but the cost is a bit more ringing. This can give the negative 'costs' that you referred to in your last post. I generally avoid it because of this overshoot.

    1. Greg,
      I used the Lanczos shown in the top plot; in Wiki terms, a=3. Counting both sides, it's a 5-lobe window.

    2. Thanks, that was what I guessed from the passband ripple. That's what you get for using WonkyPedia.

      That article is primarily about Lanczos interpolation ( and written by someone who has never actually done it ). If you use 5-lobe for image interpolation you will get horrible ringing which is jarring to look at. It's similar to what jpeg does to block graphics.

      Gimp uses a three-lobe window for Lanczos interpolation in its scaling options. L5 was tested as a possible additional option and dumped.

      I think that is a visual indication that it's probably a bad choice for data too.

  4. Nick: I scale them to have the same footprint in the time domain. I think that is what counts.

    Well it depends on what you are trying to preserve. More typically, when we are comparing filters, we adjust them to either have the same equivalent rectangular bandwidth (ERB) or we fix the 3-db corner frequency (that's what I usually do, even when I'm working with time-domain filter implementations).
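    A sketch of the 3-dB alignment (my own code, not the commenter's; the interpolation of the corner frequency between FFT bins is a detail I've added):

```python
import numpy as np

def corner_3db(w, dt=1.0, nfft=1 << 14):
    """First frequency where |H(f)| falls to 1/sqrt(2) of its DC value."""
    H = np.abs(np.fft.rfft(w, nfft))
    H /= H[0]
    f = np.fft.rfftfreq(nfft, dt)
    k = np.argmax(H < 2**-0.5)            # first bin below -3 dB
    # linear interpolation between the bracketing bins
    return f[k-1] + (H[k-1] - 2**-0.5) / (H[k-1] - H[k]) * (f[k] - f[k-1])

t = np.linspace(-1, 1, 201)
fc = corner_3db(1 - t**2, dt=t[1] - t[0])
# for a Welch window of half-width 1, this lands near 0.29 cycles/unit time
assert 0.25 < fc < 0.33
```

    Rescaling each window's support by the ratio of its corner frequency to the target would then put them on a common footing.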

  5. Great observation about the relationship between Welch^n and the spherical Bessel functions, by the way.

    For Welch filters, this gives us an analytic method for choosing to preserve any criterion that we wish, be it the location of the first null, the 3-dB roll-off point, or fixing the ERB.

    Anyway, I think this is a nice illustration of how using an analytically tractable function often more than offsets the advantages of more complex filter choices.

    This is a nice piece of work.

  6. Here is an alternative presentation:

    Let * represent convolution, W = Welch; R=lin. ramp. Since differentiation and convolution are both linear operations and thus commutative:

    D ( W * T(t) ) = DW * T(t) = R * T(t) = W * D ( T(t) )

    ie. sliding-trend( T(t) ) = R * T(t) = W * dT/dt

    That fits better with what is 'obvious' ( ie familiar ) to me and finally agrees with the result you originally posted. Thanks for your patient explanations.

    So now that we see the sliding trend is low-pass filtered rate of change, with a fairly lumpy freq. resp. and a nasty 50% negative lobe, we may seek a window function with a more orderly FT:

    eg. D ( gauss * T(t) ) = Dgauss * T(t) = gauss * dT/dt
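    That commutativity is easy to check numerically. A sketch (my construction; np.gradient uses central differences in the interior and one-sided differences at the ends, so the edges are trimmed before comparing):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 101
t = np.linspace(-1, 1, m)
dt = t[1] - t[0]
W = 1 - t**2
W /= W.sum()                    # normalised Welch kernel
R = np.gradient(W, dt)          # DW: a linear ramp (the OLS slope kernel,
                                # up to sign and normalisation)
assert np.allclose(np.diff(R[1:-1], 2), 0, atol=1e-12)  # interior is affine

T_series = np.cumsum(rng.standard_normal(500))   # a random-walk "T(t)"
dT = np.gradient(T_series)

lhs = np.gradient(np.convolve(T_series, W, mode='valid'))  # D( W * T(t) )
rhs = np.convolve(dT, W, mode='valid')                     # W * dT/dt
assert np.allclose(lhs[2:-2], rhs[2:-2])
```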

    1. BTW the reason you had problems with negation was that OLS is the correlation of a +ve slope with the sample, but convolution requires that the kernel is reversed in time. This is often ignored, and not mentioned, since with a symmetric kernel it has no effect.

      However, with an asymmetric kernel like an exponential ( relaxation process ) or your slope, it does.

      In fact, it was not the y of the kernel that needed inverting, it was x.
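      A minimal illustration of that flip (my own example): NumPy's correlate slides the kernel as given, while convolve time-reverses it first, so the two only agree for symmetric kernels:

```python
import numpy as np

x = np.arange(6, dtype=float)
ramp = np.array([1.0, 2.0, 3.0])          # asymmetric kernel

conv = np.convolve(x, ramp, mode='valid')
corr = np.correlate(x, ramp, mode='valid')
conv_flipped = np.convolve(x, ramp[::-1], mode='valid')

assert not np.allclose(conv, corr)        # flip matters for a ramp
assert np.allclose(corr, conv_flipped)    # reversing x of the kernel fixes it
```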

  7. I said I'd come back to Pekka's two point first-and-last approximations.

    Nick has shown the convolution of dT/dt with parabola ( Welch ) is the same as convolution of T(t) with a ramp ( OLS slope )

    by similar logic convolution of dT/dt with a ramp ( OLS slope ) is the same as convolution of T(t) with a rectangle ( boxcar, running average )

    one step further, convolution of dT/dt with a rectangle ( boxcar, running average ) is the same as convolution of T(t) with a derivative of the rectangle : Pekka's two point slope estimation.

    We have already seen that FT of d/dt of rect is f*sinc(f) , ie a pure sine with zero roll-off. A repetitive notch filter with repetitive, unattenuated inversion of the signal. A linear combination of any number of them will have similar defects. It would be quite simple using odd or even harmonics to turn the combined FT into a square or a triangle wave but it would still have the same zeros, same repetition and same inversion pattern.
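    That frequency response is easy to verify directly. A sketch (the window length is an arbitrary choice of mine): the two-point slope kernel is +1/T and -1/T at the ends of a T-sample span, and its spectrum has magnitude (2/T)|sin(πfT)| — periodic notches with no roll-off:

```python
import numpy as np

T = 10                                   # window length in samples
kernel = np.zeros(T + 1)
kernel[0], kernel[-1] = -1 / T, 1 / T    # (last - first) / T

nfft = 4096
H = np.abs(np.fft.rfft(kernel, nfft))
f = np.fft.rfftfreq(nfft)                # cycles per sample
H_theory = (2 / T) * np.abs(np.sin(np.pi * f * T))
assert np.allclose(H, H_theory, atol=1e-12)
```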

    Interesting but not much help as an estimation of slope.

    However, convolution of the kernels corresponds to multiplication of FTs where the different spacing of the zeros in FD will enforce more zeros in the response.

    This is essentially what I show as a three pole running mean approximation to gaussian that does have a zero in FD.

    However, this discussion highlights a danger in applying this to derivatives ( eg dT/dt ) since the stop band ripple that dies away nicely in that plot will be revived by the *f effect of the derivative.

    At this stage we'd have to invoke Carrick's 1/f nature of the data to justify continued use.

    Many useful ideas tying together here, thanks to all for the varied range of expertise and experience.