Thursday, May 29, 2014

Detecting periodicity


I've been following, and occasionally joining in, a series of threads by Willis Eschenbach at WUWT. Here is the latest, and you can follow links back.

Willis is trying to detect periodicity in various segments (century or so) of monthly or annual data. Lately he has been trying to decide whether sunspot (SSN) periodicities can be observed in climate data. This involves first establishing the major periodicities in SSN, and then looking at periodicities in signals which might be responding.

In the early part, Willis was fitting sinusoids using a least squares fit by optimisation. In this comment, I showed that the quadratic form could be expressed as a fairly orthodox discrete Fourier Transform expression, and wrote code to show how the same result could be derived. Willis then recharacterised what he was doing as a Slow Fourier Transform; here I try to put that in context.

I was not, BTW trying to suggest that what Willis was doing was wrong; only to say that it could be done more efficiently by FFT methods (which involved padding), and there was a long history. In fact someone characterised it as a Lomb periodogram, and it is certainly of a class of Least-squares spectral analysis (LSSA).

In the most recent version, he is using regression of the data onto sinusoids, and investigating the effect of using monthly or annual time divisions. Tamino chimed in in a positive spirit (the thread was called "Well, Color Me Gobsmacked!"), and said the method was known as the Date-Compensated Discrete Fourier Transform.

I think Willis now regards the chief advantage of the method as allowing for irregularly spaced data, in his case missing months, and that is what the DCDFT is for.

The method is sometimes being criticised as not being a truly orthogonal decomposition. I think that is misplaced. You can look for periodicities in a Fourier decomposition, but it is not the only way. I think Willis is basically filtering. He creates the equivalent of a perfectly tuned circuit, which takes the data as input, and identifies the amplitude of the response as the measure of periodicity. I think that is fine, and so I'd like in this post to say how one might assess and maybe improve the merit of that measure.

It's like what you did with the old AM radio receiver. By frequency shifting with a knob, you listened to the sound level to find stations.

What I want to do in this post is to show how you can get past all the issues of missing data and orthogonality, and rephrase in terms of Fourier integrals, which exposes the sources of error. I'm expanding on this comment.


Willis' method as a filter

For any given frequency ω, Willis uses the R lm() function to fit a+b*cos(ωt)+c*sin(ωt). He then returns √(b²+c²) as the amplitude of the fitted sinusoid. He usually plots this against period, not frequency, and looks for peaks.

This is a linear filter on the input. Normally we think of a filter as modifying one time series to another. This is a perfectly tuned filter, in that it produces a pure sinusoid as output. That is rather boring, in that it can be characterised by two numbers, amplitude and phase, and Willis drops the phase. But it is still a filter. You can think of it as a tuned circuit (or bell or cavity etc) which responds to frequencies near its resonance.

Expression as an integral

I'll assume that the data represents samples from a continuous function f(t). This assumption is basic to any continuum analysis. You have to assume that the data you have is representative of the values in between. f(t) could, for example, be a linear interpolate.

The sum Willis is evaluating in lm() can then be expressed as an integral:
∫ Ψ(t) cos(ωt+θ) f(t) dt
ω is angular frequency, θ phase.  All integrals here are from -∞ to ∞, but Ψ(t) is sum of a finite number of delta functions. I've chosen Ψ to suggest a shah (Ш) function, also called a Dirac comb of delta functions. Generally it will here be mostly equally spaced, but that is not a requirement. It is truncated.

So now we can see what kind of filter it is, by looking at the Fourier Transform. The FT of cos is just a paired delta function that shifts the spectrum of Ψ(t) to center on ±ω. So if Ψ has, say, N=200 delta functions with s=1 annual spacing, the Fourier transform (t origin at the middle) is
sinc(Nω)/sinc(sω)
where sinc(x)=sin(x)/x

It looks like this in the frequency domain:

and the amplitude looks like this:


It is symmetric about 0. When combined with the sinusoid, it is displaced. Willis is looking for periods around 10 years; using that as a test frequency gives:



I think this shows a number of things. There is a peak at 1/10 yr-1. It has side lobes, which represent the effect of the finite (200 yr) interval. These affect the ability to discriminate peaks.  It is actually periodic overall, with period 0.5 yr-1 (the FT alternates in sign and  has period 1 yr-1). That shows problems when you get up to the Nyquist frequency. But the peaks are fairly sharp, and the Nyquist troubles far away, at least if you are looking at periods of 10 years or so.

Irregular points

The sinc function ratio is actually the sum of sinusoids evaluated at the sample points. You can evaluate at a set less regularly spaced, with missing values say, and the function won't be radically different. And of course, the method still works.

Other ideas

In recent discussion, I've put some other analogies. The method is actually like the way analogue spectral analysis was done. Tuned circuits, acting as filters, responded to the signals in a tuned way. If you aren't familiar with circuits, think of a bank of lossless tuning forks, mounted on a sounding board, say. You buzz the burst of signal, and see which ones pick up the most energy.

The mathematics of the regression Willis uses is in fact the math of heterodyning. You multiply by a sinusoid and add. If the signal can reasonably be fitted with a nearby sinusoid, then the trig formula for products will yiekd a beat frequency, lower in pitch as the two frequencies are closer. The addition is a low pass filter, passing with amplitude inversely proportional to frequency. So its response is approximately proportional to both the strength of the periodic component, and to its closeness.

What next

I'll stop here for the moment. I'd like to deal with the effect of annual averaging, which fits in nicely, and was one of Willis' topics. I'd also like to deal with the effect of windowing, with the idea of making the side lobes diminish faster. I think it will help, but the current situation isn't bad.

26 comments:

  1. Very interesting Nick.

    The sign of all this tends to get overlooked because everything is presented in terms amplitude.The negation of the sinc/sinc at Nyquist could be important.

    Some posted last year on Curry's site about the possibility that hadSST or hadCRUT showing signs of aliasing. ( I cant find it in site search just now ).

    What is the _sign_ of the side lobes at 0.4 and 0.6 ?

    The will not be 0 or 180 phase , I know but have an equal but opposite phase shift about the central value.

    But since all this is totally predictable mathematically , isn't there a clear means to test for the presence of aliasing in a precessed dataset ?

    Greg.

    ReplyDelete
    Replies
    1. Greg,
      The denominator sinc doesn't affect anything much until toward Nyquist. So it's basically just the sign of the sinc numerator. I've done the arithmetic signed, plotted as abs. I'll see if I can add a good plot.

      As for testing aliasing, I don't know how to do that.

      Delete
    2. Thanks Nick. What I meant was that in your first graph above, the second peak at 0.5 is inverted. That presumably means that side-bands are deviations from an inverted signal not form an in-phase copy.

      I have a feeling the must be a way to detect that sort of pattern in the FT.

      I'm annoyed I cant find the article at Judith's site, it was quite good.

      Delete
    3. Greg,
      Well, 0.5 is the Nyquist frequency. It's saying in general that you'd bring in (negatively) a part of that, except that those frequencies can't be properly represented on the sampled grid anyway.

      It may be significant when the filter is translated; then you do alias in some near-Nyquist frequencies as low freq. Again, I don't know how to detect that.

      Delete
    4. Greg,
      I think you were thinking of this:
      http://judithcurry.com/2011/10/18/does-the-aliasing-beast-feed-the-uncertainty-monster/

      Delete
    5. Greg,
      I looked again at Saumarez article. I thought then and think now that it was absurdly theoretical. I gave up on RS when he seemed to think that HADCRUT was used to drive a model. I queried with no response. But he's looking, in this context, at the possibility that there might be a sharp peak with period 30.3 days, which after 30-day averaging, would contribute to an apparent 10-year cycle. By sharp I mean that it stands apart from, say, 30 day period.

      Apart from the implausibility of that, it would be hopelessly confounded with the different month lengths.

      Delete
    6. Thanks NIck, it was Saumarez and I think this is the article (IIRC he did a similar one earlier). He makes some good points and some silly mistakes (like we all do). I believe no one till I've checked.

      I think he presents a good case that there is evidence of aliasing being present. Tough it would be better to look at hadSST and CRUT separately.

      There will not be a sharp "peak" at 30.3 but there will be the 12th harmonic of the very strong annual cycle. Looking at FT of daily arctic ice, for example, you can see all these higher harmonics quite clearly.


      The problem is that climate science has yet to discover d.p. basics like the need for anti-alias filtering before resampling data.

      Greg.

      Delete
    7. Actually he refers to looking at individual "stations" so I think like so many others he seems to be confusing Had and CRUT. AFAICT he is looking at CRUTEM.

      Delete
    8. Greg,
      "There will not be a sharp "peak" at 30.3 but there will be the 12th harmonic of the very strong annual cycle."

      No, that can't alias with the monthly cycle. The reason is that there are exactly 12 months in the year. So you can't have a beat frequency over years.

      Delete
    9. The "annual cycle" is a fictitious simplification, it is not a "cycle".

      There is roughly repetitive variability each year, which has significant energy at 12th and below. This is not a fixed cycle it changes from year to year. This implies the magnitude of this 12th changes each years.

      But when does it change?? The 1st Jan each year? No, it is constantly changing. You could recalculate the "annual cycle" each month (Jan-Jan; Feb-Feb.....) and get a different value for the 12th.

      Now, monthly sampling is an irregular mix of two sampling frequencies, each of which could cause aliasing.

      E&OE this gives 8.3y and 3.65 y

      3.7y is something that I come across very often in a wide range of data.

      Maybe this merits a closer look.

      Greg.

      Delete
    10. Greg, "fictitious"
      I disagree; the annual cycle is the real physical thing. Another real cycle could provide beat frequencies, but monthly can't; it is locked to 12/yr. If a beat freq is to be significant it has to be maintained, in phase, over several years. But even where months vary, after one year it's in sync. OK except for leap years, but that's a tiny effect.

      Why tiny? Leap years represent the difference between the daily and annual cycle, both real. 365 + 0.25. But if you're looking for beats there, you are starting at the 365th harmonic of annual.

      Delete
    11. You are correct. The 12th harmonic can not slip out of phase with the ( albeit irregular ) 12 per year sampling.


      I think his fig 4 and fig 8 which he labels as showing that the data IS aliased is a misinterpretation. It is simple that part of the spectrum below Nyquist is meaningless and should be ignored.


      His figure 7 does show the leakage and distortion of the running mean filter below the sampling frequency. This WILL cause aliasing. Though it is relatively small.

      What he does not seem to say explicitly is that taking monthly means is identical to monthly re-sampling of the running mean, and as seen the running mean is not an adequate anti-alias filter.

      A subsequent point is that the leakage is in fact inverted as well.

      I have not tried to reproduce his histogram but he seems to have a point about the possibility of spurious decadal scale trends being created.

      His synthetic data seems to shows a magnitude that is significant. His description of his method seems reasonable. This should be reproduced and verified.

      Delete
    12. PS the distorted leakage is the lesser part of the problem. Any aliased trends will come from the part between 2mo (required filter cut-off) and 1mo (first zero in RM).

      This is a significant signal.

      Delete
  2. My impression of all this is that WE (and its a general failure at WUWT, and similar) are doomed because they don't know any of the prior art.

    ReplyDelete
    Replies
    1. Well, yes, but he's currently being sensibly sceptical about solar cycles etc, and his methods are adequate for the task. And well, I guess he would say that he's having fun.

      Actually, I was quite surprised to see Nick Lomb's name turn up on a periodogram. I've been aware of him for a long time as a fairly practical astronomer (and a very good curator). I was actually quite surprised that the prior art of the 70's was seen as new at the time.

      Delete
    2. > his methods are adequate for the task

      I only skimmed. I wouldn't be too surprised to find that they aren't; that they won't find more subtle signals. But its all so pointless; there is so much work on this topic. For exampe:

      * http://www.cawcr.gov.au/staff/jma/vanloon_solar.pdf
      * http://depts.washington.edu/amath/old_website/research/articles/Tung/journals/coughlin-tung04strat.solar.pdf
      * http://strat-www.met.fu-berlin.de/labitzke/signal/

      Well, you know how this stuff goes.

      Delete
  3. The CSALT model that I developed uses a similar least-squares spectral analysis method of estimating Fourier components.
    http://contextearth.com/context_salt_model/

    I add an oscillating factor with frequency w as an unknown amplitude A*cos(wt) + B*sin(wt) to a multivariate regression analysis and include the A, B parameters to the model if the R2 error is reduced markedly.

    Of course this has the problem of over-fitting unless one is committed to restricting to only those sinusoidal factors that make physical sense.

    Wondering Willis became very upset with me when I pointed out how I applied his "discovery". What a nimrod.

    btw, the outcome of the model is that CSALT does pick up the TSI oscillations, but they are at the 0.05 C level in amplitude, which is well below the secular trend of 1C due o global warming.

    ReplyDelete
  4. William Connolley,
    Is your critique of Willis so hostile because you disagree with him politically or because what he is doing is incorrect? My guess is the former. Nick, to his credit, at least acknowledges that what Willis is doing is giving accurate results. Your approach toward all 'skeptics' is symptomatic of the poisoned pool of climate science; a pool in which you seem to so very comfortably swim. I do wonder if your goal is to advance understanding or to force arrival at your preferred political outcome. All evidence indicates the policy outcome is all that matters to you; sad.

    ReplyDelete
    Replies
    1. I think pointing out that Willis is doing Fourier analysis with a limited knowledge of prior art is fair comment. It was made plentifully on the WUWT threads, and Willis generally acknowledges that. In fact, he was quite chuffed when Tamino identified his method as DCDFT. If you check the latest thread, I'm mainly defending him against accusations of imperfect Fourier decomposition. And earlier, trying to put what he's doing in context. He didn't see that as hostile.

      I'm actually very sympathetic to anyone who wants to work out their own mathematics, and I think Willis does that very well. I bristle when inadequate maths is used to attack what other scientists are doing.

      Delete
    2. Sure Nick, and Willis was pleased to find that someone had apparently done (in the 1980's) exactly what he has been doing himself. I do not dispute that what Willis is doing could, at least in theory, be learned from various earlier publications (eg. Tamino's references to 1980's astronomical research). But come on Nick, Willis is working on his own, without funding, and more to the point, without free unlimited access to existing published work. I have plenty of my own experience 'discovering' something that was already (somewhere) known long before I developed it. That is OK. What is not OK is stuff like "My impression of all this is that WE (and its a general failure at WUWT, and similar) are doomed because they don't know any of the prior art." Doomed? Really? Doomed to what? I'll take Willis every day over a jaundiced and unpleasant individual who's only goal is to force his green tinted vision of the future on everyone else. I'll take Willis on a general intelligence test over 99% of his critics, including WC... and I'll make money for sure.

      Delete
  5. Nick,
    You need an edit function. Really.

    ReplyDelete
    Replies
    1. Stephen,
      Edit functions are rare on blogs. I think Lucia's may be the only one. Blogger (this host) does allow you to obliterate your comment completely. I've done that with the ones you removed.

      Delete
  6. Nick,
    Thanks for the obliteration of the multiple deleted comments. I will try to type much more carefully on your blog.

    I was in Turkey for several days until yesterday morning. 18 million people in Istanbul, and all wanting to increase their carbon footprint. Any approach to control fossil fuel use is going to have to address the desires of the billions who are now poor (or semi-poor), but who want to be rich.

    Steve

    ReplyDelete
    Replies
    1. I probably didn't say clearly, but you can actually obliterate it your self - there is an option at the bottom of the window. And you can edit while you are writing.

      But yes, it's not a great writing environment. I always write in a local editor, then paste. I think Blogger may restrict pasting unless you log in with an identity, eg Google or Wordpress.

      Delete
  7. Webby, you seem to have some good maths knowledge so I hope you can see the following point.

    A change in radiative forcing , whatever its origin, will produce an in-phase response that is a change in dT/dt, not a change in T. Flux is a power term not energy. That means that changes in F and T are orthogonal.

    Once the system has had time to settle, at least nearly, to equilibrium you can look for a final change of T (delta T) that matches the new level of radiative forcing.

    Now since F and T are orthogonal the correlation between them is ZERO.

    What any regression analysis seeks to do is to find the correlation between the variables. That is not possible if the two quantities are orthogonal. Indeed this is sometimes used in an attempt to separate the two responses. eg Forster & Gregory.

    Once surface temperature starts to change ( the integral of the induced dT/dt change ) there will be a radiative feedback. This will be in-phase with the instantaneous T.

    So if you get a finite result when regressing T and Rad terms what it reflects is the climate feedback responses NOT the long term temperature increase delta T induced by changes in the forcing.

    Do a multivariate regression and you have the same problem in spades.


    Now not all your CSALT terms are rad forcing. ENSO is temperature but likely has non zero phase with global averages so you will not get the correct magnitude there either. Anyway, most are rad terms, so the whole exercise is doomed from the start.

    Yes, it sort of looks OK because you have lots of variables with about the right mix of frequencies but that does not mean it's physically meaningful.It's just like regressing a bunch of suitable AR1 series would also "explain" most of the temp. record.

    You are not alone, there are several published papers based on equally incorrect correlation analyses.

    Lagged regression is similarly flawed because it fits neither one nor the other and as soon as we depart form orthogonality with one or the other, the correlation gets diluted and the derived regression coeff is an under-estimation.

    Spencer & Braswell note the problem without finding a solution for the correct result.

    I attempt to derive a more correct correlation by anticipating a simple "relaxation the mean" response here:

    http://climategrog.wordpress.com/?attachment_id=884

    Perhaps you could adapt that to CSALT

    There is also the much overlooked problem of regression dilution due significant error and non linear variation in the x-variable. This means most cases of regressing one climate variable against another one will be fundamentally wrong also. Again under-estimating the correlation:

    http://climategrog.wordpress.com/2014/03/08/on-inappropriate-use-of-ols/

    Once you get into multivariate regression you have the problem in spades,again. The regression will be incorrectly fitting one variable and falsely attributing correlations to one or more others to minimise the overall residual.

    Any resemblance of the result to the physical reality is purely fortuitous.


    ===========

    Best regards, Greg.

    ReplyDelete
  8. Goodman, Your criticism is only precautionary. Here is a very simple proof. If somehow the sun doubled the radiative forcing from what it currently is, I would EASILY be able to pick up that correlation in CSALT. However, it wouldn't be accurate because the resultant temperature is a nonlinear function of radiative forcing -- S-B law. Now consider that since the actual radiative forcing is incremental, I can always linearize the result as a Taylor series approximation. Add in a simple lag and voila, we have an estimate of the transient effect. We all realize that the fat tails of the equilibrium response are buried in the huge heat sink of the ocean, so we can safely ignore that for the time being.

    So the proof is that the limiting case makes sense and the incremental view for transient response makes sense.

    The physical view is that this is no different than a variational thermodynamics approach, which is well accepted.

    I do realize that you are creating a "just-so" story about how volcanic eruptions have long term effects on the climate but I ain't buying that. The effects of the volcano condense out within a few years. CSALT picks that up amazingly well.

    The fact that CSALT is a transient analysis should be pretty clear and it thus shows a lower-bound estimate to how much warming has occurred immediately due to the thermodynamic factors.

    ReplyDelete