Sunday, April 27, 2014

A weekend paradox

There's a post by Willis Eschenbach at WUWT, titled Extreme Times. It notes that with an autocorrelated signal, for any fixed observation period, the max for the period is more likely to be at the ends than in the middle. That's not easily intuitive.

He goes on to argue that statements like "the millennium ended with its warmest decade" do not reinforce global warming. That's what millennia do.

The argument he's countering says that there's only about a 1 in 100 chance of that happening by chance. No, says Willis, it's more like 1 in 50. True, but that's still only 1 in 50. It doesn't change much.

Anyway, I thought of a more homely version of the paradox. Counting weeks as starting on Sundays, on what day are weekly temperature maxima most likely to occur?

My argument went thus. It's like with TOBS. Warm days often come in spells. A warm spell midweek will probably yield a max for one week. But a warm spell at the weekend may well make a weekly max on both Sat and Sun. So over a year, say, Sundays will show up more in the statistics. In fact, up to twice as often as mid-week.

Anyway, the argument at WUWT went on, so I checked. I have a file of daily max for Melbourne from this post. It's from May 1855 to Nov 2013. I counted. Results below, with an error corrected:

Update: I had made a mistake in transposing matrices, which had the result of somewhat exaggerating the effect. It is still there though. I have posted the Melbourne max data as a 7 col array here. It starts Sun 7 May 1855.

Day of Max   Number
Sunday       1273
Monday       1138
Tuesday      1100
Wednesday    1071
Thursday     1101
Friday       1057
Saturday     1534

This doesn't mean, alas, that Nature gives us specially warm weekends in Melbourne. You'd get the same result for minima. Or, if you start the week on Wednesday:

Day of Max   Number
Wednesday    1306
Thursday     1102
Friday       1067
Saturday     1126
Sunday       1038
Monday       1150
Tuesday      1485
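
The same end effect can be seen in purely synthetic data. What follows is a minimal sketch, not the calculation behind the tables above: it uses an AR(1) series as a stand-in for autocorrelated daily maxima and counts which position in each 7-day block holds the block maximum. The function names `ar1_series` and `count_max_position`, the coefficient 0.8, the seed and the series length are all assumptions chosen for illustration.

```python
import random

def ar1_series(n, phi=0.8, seed=0):
    """Generate n values of an AR(1) process x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def count_max_position(series, period=7, offset=0):
    """Count how often each position within a block holds the block maximum."""
    counts = [0] * period
    for start in range(offset, len(series) - period + 1, period):
        block = series[start:start + period]
        counts[block.index(max(block))] += 1
    return counts

series = ar1_series(7 * 20000)                    # 20000 synthetic "weeks"
counts = count_max_position(series)               # week boundary at offset 0
shifted = count_max_position(series, offset=3)    # boundary moved by 3 days
print(counts)
print(shifted)
```

Moving the offset moves the week boundary, and the excess should move with it to whichever positions sit at the ends of the block. A Gaussian AR(1) series is symmetric in time, so it is not expected to reproduce the Saturday/Sunday asymmetry seen in the Melbourne counts.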

TOBS

This logic lies behind the TOBS adjustment for changes in the resetting time of min/max thermometers. There you divide into 24-hour periods. The difference is that Nature does distinguish between hours of the day. So if you make the split at 5pm, it does increase the occurrence of both maxima and minima there, but maxima are far more likely to occur at that time, so they are the ones that get counted twice. That's a warm bias. If you shift to 9am, minima will be favored. In the USHCN, there was a trend to move from 5pm reading to 9am reading of min/max thermometers (which sets the start of the "day"). That needs to be corrected. And yes, since the bias moved from warm to cold, correcting it increases trends.
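
A toy simulation can illustrate the direction of the effect. This is only a sketch under stated assumptions, not the USHCN adjustment procedure: hourly temperature is modelled as a cosine diurnal cycle peaking at 3pm plus AR(1) "weather" noise, and the min/max thermometer is taken to register the temperature at both the opening and the closing reset. The names `hourly_temps` and `mean_minmax_temperature` and every parameter value are hypothetical.

```python
import math
import random

def hourly_temps(days, phi=0.98, sigma=0.6, amplitude=5.0, seed=1):
    """Toy hourly temperatures: diurnal cosine (peak 15:00) plus AR(1) noise."""
    rng = random.Random(seed)
    anomaly = 0.0
    temps = []
    for h in range(days * 24):
        anomaly = phi * anomaly + rng.gauss(0.0, sigma)
        diurnal = amplitude * math.cos(2 * math.pi * ((h % 24) - 15) / 24)
        temps.append(anomaly + diurnal)
    return temps

def mean_minmax_temperature(temps, reset_hour):
    """Average of (max + min) / 2 over 24-hour windows starting at reset_hour."""
    daily = []
    for start in range(reset_hour, len(temps) - 24, 24):
        window = temps[start:start + 25]  # includes the reading at both resets
        daily.append((max(window) + min(window)) / 2)
    return sum(daily) / len(daily)

temps = hourly_temps(days=3000)
mean_5pm = mean_minmax_temperature(temps, reset_hour=17)
mean_9am = mean_minmax_temperature(temps, reset_hour=9)
print(f"5pm reset: {mean_5pm:.2f}, 9am reset: {mean_9am:.2f}")
```

In this toy, a hot afternoon is still near its peak at the 5pm reset, so it can set the maximum for two consecutive windows, and the 5pm mean should come out warmer than the 9am mean. The symmetric cosine is a simplification, so the sketch says nothing reliable about the size of the real bias.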





18 comments:

  1. I'm guessing next he will go and prove the non-existence of God, since there's autocorrelation between proven miracles and the amount of religious people present at the time of observation.

  2. Nick,

    The double counting at the ends can be removed by doing what is called a moving max (or min). However, I've only done this for an odd integer window size, so that it is centered on the middle of the window. The trick is to match the moving max/min time series with the original time series and select all exact matches (for limited precision data, I add random noise at the end of each data point, that removes repeating numbers that might occur due to limited precision).
    I did find the algorithm (on my own, but perhaps just a reinvention that someone else has already done before) completely removes adjacent maxima (or minima). I did this about 3 years ago when working on historic Mississippi River stage data during the flood of 2011. I know this algorithm works for selecting, on average, a max/min, per annum, when N = 365 (one data point/stage per day).
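
The moving-max selection described in this comment might be sketched as follows. This is one reading of the description, not the commenter's spreadsheet: a centered window of odd length slides along the series, and a point is kept only if it equals the maximum of its own window. The function name `moving_max_peaks` is hypothetical.

```python
def moving_max_peaks(series, window):
    """Return indices whose value equals the max of the centered odd window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered")
    half = window // 2
    peaks = []
    for i in range(half, len(series) - half):
        # A point is selected only if nothing within `half` steps exceeds it.
        # Exact ties would select every tied point, which is why the comment
        # suggests adding a little random noise to limited-precision data.
        if series[i] == max(series[i - half:i + half + 1]):
            peaks.append(i)
    return peaks

print(moving_max_peaks([0, 1, 2, 3, 2, 1, 0, 1, 5, 1, 0], 3))  # -> [3, 8]
print(moving_max_peaks(list(range(100)), 7))                   # -> []
```

With distinct values, two selected points can never lie within half a window of each other, since each would sit in the other's window. A pure linear trend yields no matches at all, because every interior point has a larger neighbour.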

    And I've just applied it to the Newport, RI (NOAA 8452660) predicted hourly tide data circa 1930-2012 (again a small project for the USACE). NOAA in their predicted tide series uses a maximum frequency of one year, the data are definitely stationary (zero mean, zero slope, you know a bunch of tidal harmonics). I took the window size as N = 25, selected all matching pairs, and obtained an averaged period of 25.72 hours, not exactly 25 but then again there's the O1 harmonic sitting out there at 25.819 hr, so perhaps no real surprise there. Anyways, then I tried M = 24 hours (or rather 24 bins) since that seems rather obvious, but no go: a distinct semi-diurnal distro. So then I tried M = 26 (or rather 26 bins) since that was very close to the average period of 25.72 hours and, voilà, a uniform distro (Excel 2013 x64, not that it matters):

    Mean = 1088
    Median = 1092
    Mode = 1095
    Min = 1057
    Max = 1113
    Stdevp = 13.08551754
    Skew = -0.580302101
    Kurt = 0.051937353

    I also have some thoughts on what at first glance looks like an elliptical distro that WE and you have generated, but perhaps later.

    What would be really good is if WE would post his two million point time series (the RI tide time series I mentioned above has ~730,000 data points), I'm banned over there, so I don't want to do the asking.

    As to your data set, N = 7, but I would need to get the data into two columns from the array format it is in now (rather rusty at both Excel (and Fortran) at the moment, so I don't remember how to convert tabular time series into a linear array).

    Or I could send you the two spreadsheets, that I mentioned above. As usual YMMV.

    Replies
    1. It seems to me that the effect follows from the definition of max, so I was doubtful that it can easily be removed. Here's an analytic analogy. Suppose you have idealized weather, where the temperature is a sinusoid with max at 3pm, min at 3am. Actually it doesn't have to be a sine - just periodic and only one local max per period.

      Take random 12-hour samples. What's the frequency of the hours in scoring a max?

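      If the sample starts between 3am and 3pm, the max (at 3pm) will be interior, and all hours are equally likely. But if it starts between 3pm and 9pm, the max is at the start, and if it starts between 9pm and 3am, the max is at the end. So there is a 7/24 chance each for the first and last hour (6/24 from the boundary cases plus 1/24 from the interior case), against 1/24 for each of the other ten hours.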
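
The 7/24 figure can be checked numerically. The sketch below assumes a cosine for the idealized temperature, steps the start of a 12-hour window through the day on a fine grid, and records which hour of the window contains the maximum. The names `temperature` and `max_hour_frequencies` and the grid resolution are assumptions made for illustration.

```python
import math

def temperature(hour):
    """Idealized diurnal cycle: peak at 15:00 (3pm), trough at 03:00 (3am)."""
    return math.cos(2 * math.pi * (hour - 15.0) / 24.0)

def max_hour_frequencies(window_hours=12, steps_per_hour=30):
    """Fraction of window start times for which each hour holds the maximum."""
    counts = [0] * window_hours
    n_starts = 24 * steps_per_hour
    for i in range(n_starts):
        # Offset the start by a quarter step so the 3pm peak never falls
        # exactly midway between two sample points (avoids ties).
        start = (i + 0.25) / steps_per_hour
        best_j = max(
            range(window_hours * steps_per_hour + 1),
            key=lambda j: temperature(start + j / steps_per_hour),
        )
        # Map the sample index of the maximum to an hour of the window.
        hour_bin = min(best_j // steps_per_hour, window_hours - 1)
        counts[hour_bin] += 1
    return [c / n_starts for c in counts]

freqs = max_hour_frequencies()
print([round(f, 3) for f in freqs])
print(round(7 / 24, 3), round(1 / 24, 3))
```

The first and last entries should come out close to 7/24 and the ten interior entries close to 1/24, up to the resolution of the grid.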

      Tomorrow I'll post the Melbourne data as a 7-row array.

    2. Nick,

      I found a linear copy at this link for Melbourne (circa 1855-2014);

      http://climexp.knmi.nl/selectdailyseries.cgi?id=someone@somewhere

      For some reason the time series for max/min are fully populated. I don't even know if it's adjusted data or not.

      BTW, even a seven column time series may pose problems for me as all time series that I have are linear one column data + start time + increment.

      I generate the dates based on 146097 days/400 years = 365.2425 days/year, months are uniform at 365.2425/12 (I almost always look for at least daily time series, if available).
      I do this to get around the Excel dating issues: there are no negative values; 12:00 AM 1/1/1900 has a value of 1 (and 12:00 AM 12/31/1899 = 0) when it should be 0; and Excel also carries a leap day of 2/29/1900 when none exists.

      Currently working with the max time series with N = 7, N = 15 and N =365 (however, no 365 bin distro on that one). For N = 7 & 15 the distros are uniform.

      Also, I did the linear trend line from the time series, my method finds no matches (as it should), while the fixed window method would find a max at the end of every window (positive slope).

    3. Everett,
      I have posted the data, linked above, arranged in weeks (in tenths °C). I don't find much advantage in formulating it as a time series; it's just a long string of daily maxes, and all I really do is step through 7 at a time to count where each weekly max falls. Still, I made a mistake which a ts formulation probably would have prevented. I've corrected the results, which show a somewhat diminished effect.

  3. Nick,
    First, do you find that Willis gets in way over his head on these matters? Or do you think he is intentionally deceptive?

    On this matter, here is my take. I looked at the red noise time series he is using and it looks more like an unbounded random walk than the bounded, reversion-to-the-mean random walk that a red noise process should have as a characteristic.

    One of the properties of a classical random walk is that it should act as a martingale (or gambler's ruin) process, which means that it will eventually walk to plus or minus infinity. This means that all states are equally populated and the AC has a spike at 0 only. The implication of this is that one would normally see the walker near an end-point as it makes its journey from its starting point to +/- infinity. In other words, it will be near an endpoint the longer it runs. That is the gambler's ruin outcome.

    With that as a boundary condition, a red noise walker can be configured by the Ornstein-Uhlenbeck coefficients to assume a character of anything from a tightly bounded random walker that bounces between two states (like a random telegraph signal), to something that looks like a classic unbounded random walker. The issue is that if Willis chose weakly bound O-U coefficients, it will start to look like an unbounded walker, especially if he does not let it run long enough. That is the catch. He has a finite run on a weakly bound red noise walker, which means that it will not have visited all the states. If he did a histogram on a tightly-bound red noise walker, the profile would have been uniform. And that is what a temperature profile looks more like. Unless AGW is in effect, which means that there is a secular trend and of course it will be near an end-point.

    I really do laugh at Willis for what he does in promoting FUD in climate science. He takes on this persona of an everyman working stiff who claims to have this great scientific intuition and then passes on his "discoveries" and the gullible fools at WUWT lap it up.

    ----

    BTW, you also may be on to something with the day-of-the-week analysis. There is an interesting stat study on heavy weather vs workweek days and it looks pretty conclusive -- yet I wonder if they have considered your "wrap-around" effect?
    http://www.agu.org/pubs/crossref/2008/2007JD008623.shtml




    Replies
    1. WHS,
      Willis is over his head - maybe we all are. He has skills and a feel for numbers, and I thought this was an interesting post, though spun a bit too much. He describes himself as a passionate man, and that can certainly obscure the vision. The indignation is a problem.

      I thought here the idea was right, but the example probably wasn't the best choice.

      The study does mention a weekend effect, which may include something of the above. It would be certainly worth checking with different week cutoffs.


    2. More proof that Willis is in over his head.
      Your response to his "slow Fourier Transform" was much too kind.

      His indignant ego is through the roof.

  4. Well, he does convey the impression that his methods are special, and they aren't. But they are sound enough.

  5. Nick,
    The Fourier analysis ain't going to work on something as complex as decoding ENSO.
    These are not stationary waveforms, nor are they composed of simple sines and cosines.

    We are going to have to make an all-out effort to educate people on how we can unroll the physics behind ENSO and climate variability in general:
    http://contextearth.com/2014/05/02/the-soim-substantiating-the-chandler-wobble-and-tidal-connection-to-enso/

    Willis isn't the go-to guy on this, you are, Nick. The WUWT crowd is terrified of your knowledge and breadth, and that's why you get beaten down .. and then with no shame, the WUWTers turn around and use artifacts from your server. Hilarious watching them get spun around like that.

    Well played.

  6. You don't need stationary waveforms before you can use Fourier analysis. There is in general a one-to-one correspondence between linear time-domain manipulation and frequency domain ones, a statement that does not need stationarity before it is valid.

    So what you said is just nonsense.

    You can use Fourier analysis (in the form of spectral periodograms) to study ENSO without any problems. In fact, it is quite conventional and useful to do so in climate studies.

    Here is a usage by the IPCC AR4 for example.

  7. Sure Carrick, go ahead and use Fourier transforms on every time-series problem. No skin off my nose to watch you struggle. :)

    Probably as entertaining as watching wonderin willis make a fool of himself.




  8. Actually, I have software that performs real-time linear (and nonlinear) filtering of signals. I happen to use the DFT to implement this because of its relative efficiency. Even things like log-frequency sweeps can be efficiently and accurately computed in the frequency domain using a DFT. I've tested and compared it against time-domain code and it works. [tm] It happens that the DFT code has (nearly) fixed computational costs, for a broad variety of filter designs. And since we can benchmark the DFT code, we know up front how much overhead it's going to use (for real-time data filtering this is important to know).

    But that's neither here nor there. You claimed that "The Fourier analysis ain't going to work on something as complex as decoding ENSO." Not only is this false, Fourier analysis is commonly used by people in climate science to study ENSO.

    What Willard is doing is correct, even if he doesn't know the right name for what he's doing. The formula for the DFT is after all conventionally derived using an OLS formulation.

    So I'm not sure what you are actually finding entertaining here, but hubris is becoming of nobody.

    Replies
    1. Well, I am decoding ENSO without needing to use an FFT that's for certain, and I am cautioning against using the tool like a hammer where every problem you see is a nail.

      In the case of ENSO, the underlying mechanism does change with time
      contextearth.com/2014/05/02/the-soim-substantiating-the-chandler-wobble-and-tidal-connection-to-enso

      Also check this out. I tried Eureqa on the ENSO SOI signal, and take a look at the weird chirp time series it generates as a best fit so far:
      soi = -0.0121906051619981*cos(2.16080385292669*Time) - 0.0222527128064939*cos(1.09507734655359*Time + 1.48316186388863*cos(0.631267903227871*Time - cos(0.413159383099342*Time) - 0.413159383099342*cos(2.16080385292669*Time)) - cos(0.10810821132414*Time + cos(0.631267903227871*Time - cos(0.413159383099342*Time) - 0.413159383099342*cos(2.16080385292669*Time))^2*cos(0.413159383099342*Time)))

      There is the well known 2.9 year cycle in there, but the rest is a recursive sin(sin(sin())) frequency modulation. That's a tricky one to extract from a Fourier analysis.

      BTW: Willard? Who's Willard? You're losing it I am afraid.

  9. This is a repeat of a comment I made that apparently went into the bit bucket. It is paraphrased because I didn't save the other one before publishing.

    I am not arguing that you should only use Fourier analysis, just correcting your statement that it can not be used.

    Fourier analysis is not an ideal method for the study of transient phenomena such as the known wintertime phase entrainment of the ENSO. Time-domain based methods are better for that IMO.

    Willard ≠ Willis. But typos aren't a sign of losing it, so don't be afraid.

    Anyway, I maintain that I haven't lost it because I never had it.

  10. "Willard? Who's Willard?"
    More than you might expect.

  11. I know Willard -- the Climate Ball guy.

    "Fourier analysis is not an ideal method for the study of transient phenomena such as the known wintertime phase entrainment of the ENSO. Time-domain based methods are better for that IMO."

    The SOI of ENSO is an almost ideal dynamic sloshing mechanism nicely modeled as a periodic perturbation applied to the wave equation (the Mathieu equation). The wintertime phase entrainment is barely evident in contrast to the 6 to 6.5 year periodic forcing which creates the peaks and valleys.

  12. Nick says that with Willis that "The indignation is a problem."


    I ventured over to WUWT to stake a claim on what Willis wrote recently:

    http://wattsupwiththat.com/2014/05/08/cycling-in-central-england/#comment-1632760
    "
    "BTW… Peeking at the code, it looks like Willis is fitting a sine wave using linear regression. Kewl ! "

    I was quite proud when I dreamed that one up. Before that I was optimizing a sine wave, a very slow process. Instead, I just created a sine wave and a cosine wave, and used linear regression to give the optimum results using the two waves as the independent variable and the data as the dependent variable. Then I could take the peak-to-peak amplitude of the resulting fitted sine wave.
    "

    So I responded with " Not too original, I am afraid. " and a description of how my CSALT model works.

    And of course, Willis responded with:

    "
    Oh, piss off, you nasty little man. Your jealousy is overwhelming your good sense. I came up with the idea myself, and I was proud of it. So sue me. Was I the first man to come up with the idea? Of course not … but I did come up with it independently myself. You are great at trying to tear down something someone else has built, but you never seem to build anything yourself … funny how that works.

    w.
    "

    Willis tends to do that. He latches on to an idea and claims it is his own while claiming that he is self-taught. The idea of applying the Quenouille significance measure is something that you have been doing, Nick, and I am certain Willis picked it up from your discussions. Funny to watch that behavior in the WUWT thread.
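
For reference, fitting a sine wave by linear regression, as in the exchange quoted above, amounts to ordinary least squares with a sine and a cosine of known period as the two regressors. The sketch below is not the code discussed at WUWT, which is not reproduced here; the function name `fit_sinusoid` is hypothetical, and it assumes the period is known and the data have had any mean removed.

```python
import math

def fit_sinusoid(t, y, period):
    """Least-squares fit of y ~ a*sin(w*t) + b*cos(w*t) for a known period."""
    w = 2 * math.pi / period
    s = [math.sin(w * ti) for ti in t]
    c = [math.cos(w * ti) for ti in t]
    # Entries of the 2x2 normal equations.
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(u * v for u, v in zip(s, c))
    sy = sum(u * v for u, v in zip(s, y))
    cy = sum(u * v for u, v in zip(c, y))
    det = ss * cc - sc * sc
    a = (sy * cc - cy * sc) / det
    b = (cy * ss - sy * sc) / det
    return a, b

# Synthetic check: a sinusoid of amplitude 2.5 with an arbitrary phase.
period = 37.0
t = list(range(200))
y = [2.5 * math.sin(2 * math.pi * ti / period + 0.7) for ti in t]
a, b = fit_sinusoid(t, y, period)
amplitude = math.hypot(a, b)
print(round(amplitude, 6), round(2 * amplitude, 6))  # amplitude, peak-to-peak
```

The fitted amplitude is the square root of a² + b², and the peak-to-peak value is twice that. Repeating the fit over a range of trial periods gives a periodogram-like curve.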


