Wednesday, November 16, 2011

A picture of statistically significant warming.

This is the third in the series of plots showing color maps of all possible trends that can be derived from a dataset. The first post was designed to show how noisy short-term trends were, and how you could pick almost any color, representing a trend value, and find a period where it applied. But with some Javascript enhancement, it's also a good way of visualizing trends on a graph.

The scientific damper to choice of trends in a noisy signal is the significance test. So I've adapted the figures to show significance. I'm using the device of transparency - the colors just fade away as significance is lost. There is a small change at 99%, a big drop at 95% and a small further fade at 90%. The small changes are hard to see. The test is whether the trend is significantly different from zero. Colors fade when either the period is short or the estimated trend is in fact close to zero.

The data here are monthly temperature anomalies, so there is correlation, which affects significance. I've used the Quenouille correction for loss of dof. It gives results very close to AR(1) modelling. I'll give details.

I have included two new series - the NOAA land only index and the HADSST2 sea surface temperature. You can choose the series and time intervals by using the radio buttons on the right. I have redesigned the plot to make full use of the screen space. Because it overwrites the sidebars, I'll keep it below the jump.

Update. Sometimes the pictures don't appear - it seems to depend on how you get to the page. I've found that going to the home page and then clicking on the "read more" always seems to work. I'll try to find the reasons. Seems better now.

So here is the plot. On the left, each small colored square represents a temperature trend, and the legend gives the center of each color range. I'm using more colors to give a more continuous gradation. But again, the colors have a gray band near zero and a brown band at about 1.7 C/century, a roughly typical figure for end 20C. If in doubt, just click a region - the numerical value will appear on the right, and the blue and red balls will jump to mark the endpoints of the trend line.

You can also control the range from the graph at the right. You can click on the red and blue lines to move the corresponding balls to that position. There are also the nudge controls; the red and blue ones control the corresponding balls. The further from the center you click, the bigger the jumps.

I have added two other nudgers. The reason is that I found it quite informative to make a kind of movie by moving the range along by clicking. The purple nudger, top right, moves both balls keeping the separation constant, so you can see, say, how a 10-year trend varies. The gold one moves them apart, but keeps the mean constant. This lets you see if there is any kind of derivative at a point that makes sense.

I'll discuss in detail the color plot for Hadcrut3 for the period from 1989. The reason is the famous question that was asked of Phil Jones as to whether there had been significant warming since 1995. Presumably someone had worked out that that was about as far back as you could go before warming became significant. Anyway, PJ said no, and of course there was then much chatter of "no warming since 1995". But then later he said that it had become significant, and was criticised for changing his mind.

But you can see from the plot here what is happening. If you follow up the right vertical axis (now), the numbers mark the period of trend. Start 1995 is now nearly 17 years ago, and that is indeed the boundary of significance relative to zero trend. You can follow the white diagonal line to see other recent periods with 16 years of trend. They are all significant, generally with a greater warming trend. So "since 1995" is borderline significant partly because it is a fairly short period, but also because among such periods, the estimated slope, though positive and only a little below the recent average (brown), was small enough to tip the balance.

Conversely, there are periods even on the four year boundary where the estimated trend was different enough from zero to be significant. These tend to associate with unusual years. 2003 and 2005 were warm, so there is a period where short-term warming was significant, while 2008-9 was cold, and significant cooling could be observed. There is also a patch of significant cooling bottom left about the time of Pinatubo.

Calculating significance

Just a word on how the significance was calculated. Calculation time is important. There are about 5 million dots on these plots, and each one represents a regression over up to a thousand data points. So you can't just use a standard regression package (unless you are blest with much greater patience than I am). Fortunately you can do simple regressions by just working on a few vectors of cumulated sums, so instead of a sum of many you work with just a few differences of two. But that complicates the programming.
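As a rough sketch of the cumulated-sums idea (this is not the code behind the plots; the name `all_trends` and the array layout are assumptions made for illustration), the slope for every window can be found from four running sums, so each window costs only a handful of subtractions:

```python
import numpy as np

def all_trends(y, min_len=48):
    """OLS slope (per time step) for every window y[i:j] with j - i >= min_len.

    Returns an (n+1) x (n+1) array; entry [i, j] is the slope over y[i:j],
    and NaN where the window is shorter than min_len.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)

    # Cumulated sums with a leading zero, so sum over [i, j) is C[j] - C[i].
    def csum(v):
        return np.concatenate(([0.0], np.cumsum(v)))

    Ct, Ctt, Cy, Cty = csum(t), csum(t * t), csum(y), csum(t * y)

    trends = np.full((n + 1, n + 1), np.nan)
    for i in range(n - min_len + 1):
        j = np.arange(i + min_len, n + 1)      # all admissible end points
        N = j - i                              # window lengths
        St, Stt = Ct[j] - Ct[i], Ctt[j] - Ctt[i]
        Sy, Sty = Cy[j] - Cy[i], Cty[j] - Cty[i]
        # Usual closed form for the slope of a simple regression.
        trends[i, j] = (N * Sty - St * Sy) / (N * Stt - St * St)
    return trends
```

With monthly data, multiplying the result by 1200 would convert a slope per month into a trend per century.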

In a simple regression the model is:

$$y_t = a\,t + b + \epsilon_t$$

and if you have a vector y of n observations, J is a vector of times and I is a vector of 1's, then the least squares solution of y=aJ+bI is needed, which satisfies the normal equations

$$a\,J^*J + b\,J^*I = J^*y, \qquad a\,I^*J + b\,I^*I = I^*y$$

or if X is the n×2 matrix (J,I), then

$$\begin{pmatrix} a \\ b \end{pmatrix} = (X^*X)^{-1}X^*y$$

In fact, $S=(X^*X)^{-1}\sigma^2$ is the covariance matrix of (a,b), where $\sigma^2$ is the variance of the residuals d=(y-aJ-bI).

$\sigma^2$ is estimated by $s^2=(d^*d)/(n-1)$, so the standard error of the trend a becomes $\sqrt{[(X^*X)^{-1}]_{11}\,s^2}$.

That assumes the residuals are independent, ie not correlated. But for monthly anomalies this is not generally true. Effectively the se is larger, and the Quenouille correction is to multiply the se by $\sqrt{(1+\rho)/(1-\rho)}$ where the correlation $\rho = \sum_t (d_t d_{t-1})/(n-2)/s^2$.

The numbers of months (dof) are large enough (min 48) that the t-distribution reduces to normal, so I used the conventional 2-sided se measures (abs(trend)/se>1.96 = 95% etc).
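To make the recipe concrete, here is a minimal sketch for a single period, following the formulas as stated in this section (including the n-1 and n-2 divisors). The function name `trend_significance` and its return layout are assumptions for illustration, and it uses a direct matrix solve rather than the cumulated-sums shortcut:

```python
import numpy as np

def trend_significance(y):
    """Trend, Quenouille-adjusted se, lag-1 correlation and significance level.

    Time is measured in steps of the series (months for monthly data).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)

    X = np.column_stack((t, np.ones(n)))        # X = (J, I)
    XtX_inv = np.linalg.inv(X.T @ X)
    a, b = XtX_inv @ (X.T @ y)                  # least squares trend, intercept

    d = y - a * t - b                           # residuals
    s2 = d @ d / (n - 1)                        # residual variance estimate
    se = np.sqrt(XtX_inv[0, 0] * s2)            # se of trend, independent case

    rho = (d[1:] @ d[:-1]) / (n - 2) / s2       # lag-1 correlation of residuals
    se_adj = se * np.sqrt((1 + rho) / (1 - rho))  # Quenouille correction

    # Two-sided normal critical values for 99%, 95% and 90%.
    z = abs(a) / se_adj
    level = next((lab for lab, crit in ((99, 2.576), (95, 1.960), (90, 1.645))
                  if z > crit), None)
    return a, se_adj, rho, level
```

The returned `level` is the highest of 99, 95 or 90 that the trend passes, or `None` if it passes none, which corresponds to the stepped fade in the plots.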


  1. So I'm guessing Tamino would say you've underestimated the autocorrelation by using an AR1 model. See: 1, 2, 3, 4.

    However, I don't know how to resolve Tamino's results with Kelly's computation of the acf, which shows significant negative correlation around 25 months: here

    I tried two ad-hoc methods: One involved globbing months into groups and calculating the trend on the globs, increasing the glob size until the uncertainty stabilises. It gave roughly the same results, but was a bit unpredictable, possibly because of the anticorrelation issue.

    Kevin C

  2. Thanks for those pointers, Kevin. I'm reassured that T agrees that it's the standard approach, and I think it's mainly what he used for the BEST analysis, though he did try ARMA(1,1) as well.

    Kelly's post is interesting, as it looks at some of the same data. I'm not sure what to make of the negative correlation at long lags, but I don't think it bears much on the real problem, which is the loss of d.o.f. Basically, is there a good predictor based on AR(1)? Apparently so, and therefore the data has less information than we thought.

    I have to be conscious of computational feasibility. This series of plots involves about 5 million regressions, each of a few hundred points. So the fact that there is an efficient way of doing the standard approach is important.

    I'd also say that here I'm trying to show the pattern of significance, and I think that would not change much with further refinement.

  3. Really neat!

    Goddard now knows where to go for all of his all too many cherry picks. Just kidding.

    As to the part where you say;

    "I'm using the device of transparency - the colors just fade away as significance is lost. There is a small change at 99%, a big drop at 95% and a small further fade at 90%. The small changes are hard to see. The test is whether the trend is significantly different from zero. Colors fade when either the period is short or the estimated trend is in fact close to zero."

    Would it also be possible to show the R^2 value? As to the color fading, or the confidence value (e. g. 99%, 95%, 90%)?

    Or conversely, what does the range (From X to Y) shown on the right mean in terms of % significance?

    I have not read the entire post, so I may have missed something.

    As to the method of testing for significance, keep on, keeping on (It's a wee bit above my pay grade).

  4. EFS,
    The range is just a guide to help follow how many years are in the regression. If you follow the faint diagonal white lines, you're looking at regressions over a fixed period - the right axis marks tell you how long.

    I'm not sure that more info can be put into this style of pic, but I can do separate color plots of significance levels, and maybe r2.

  5. I think the computationally efficient approach to ARMA(1,1) estimation is to estimate r0 = c(t+1) and r1 = c(t+2)/c(t+1)
    The correlations for ARMA(1,1) are then
    1, r0, r0*r1, r0*r1^2, r0*r1^3...
    whereas for AR(1) they are
    1, r0, r0^2, r0^3...

    You can use the sum of the series and the eqn from the start of Tamino's Hurst post to calculate Neff. For AR1, this gives the Quenouille correction. For ARMA, N/Neff will be 1+2r0/(1-r1)

    I can't find a source for that equation, although it is clearly right for the theoretical cases I throw at it.

    Should the autocorrelations be calculated from the data itself, or from a longer (possibly out-of-sample) period? The same question applies to R2. If both come from a reference period, the uncertainty of the gradient depends only on the length of the period, so no calculation is required.

    Kevin C
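A quick numerical check of the N/Neff expressions in the comment above, assuming the usual large-sample form N/Neff = 1 + 2 × (sum of the lag correlations) and ignoring any finite-length taper. The helper name `n_over_neff` is hypothetical:

```python
import numpy as np

def n_over_neff(r0, r1=None, terms=None):
    """Large-N ratio N/Neff = 1 + 2 * (sum of lag correlations).

    AR(1):     lag correlations r0, r0**2, r0**3, ...   (leave r1 as None)
    ARMA(1,1): lag correlations r0, r0*r1, r0*r1**2, ...
    With terms=None the closed form is returned; otherwise the series is
    summed explicitly over that many lags.
    """
    if r1 is None:
        r1 = r0
    if terms is None:
        return 1 + 2 * r0 / (1 - r1)
    k = np.arange(terms)
    return 1 + 2 * np.sum(r0 * r1 ** k)
```

For the AR(1) case the closed form reduces to (1+r0)/(1-r0), whose square root is the Quenouille factor applied to the se.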

  6. Kevin,
    I think the Quenouille correction is just the row-sums of the power-law correlation matrix. So your ARMA(1,1) formula would follow in the same way
    ... r1*r0,r0,1,r0,r0*r1....

    I'm not sure how much it is worth using the Quenouille approach for higher order. In FEM terms it's a lumped mass approx, and the limits of that may eventually negate any high order improvements.

  7. I love the elegant graphics, but I think they tell us more about the limitations of statistical analysis of trends than they do about the global temperature record. Some examples:

    1) The green/brown area representing rapid warming in the late 20th century is statistically different from zero (no trend). The more interesting question is whether this green/brown patch is statistically different from the blue areas that dominated the first 70 years of the 20th century. Saying that there is a statistically significant trend of 1.7 degC/century for one time period and 0.7 degC/century for an earlier time period doesn't tell us that the rate of warming has increased (due to GHG's). I think the proper way to frame this question is to analyze the DIFFERENCE between the trends; what is the confidence interval for the difference in trends and what is the likelihood that the difference could be =<0? The adjacent color changes on your triangle don't represent statistically significant changes in the rate of warming, but they certainly give the illusion of important change.

    2) Given that the trend that was observed from 1975 to 2000 agrees well with the predictions of climate models, what is the likelihood that the current "pause"/low trend could be due to chance or natural variation? To what extent does this "pause" suggest that we should change our interpretation of the previous agreement between models and observation?

    3) At the 95% confidence level, 5% of the trends that one calls statistically significant are present only by chance. When one looks at a large number of statistical analyses at the same time (like this triangle) and then focuses their attention on a subset, one runs a substantial risk of cherry-picking. There are certainly short periods in the record where natural variability/noise was unusually low (or better-behaved) and we will calculate narrower ci's for those trends simply BY CHANCE. When the trend is high during those chance periods, it will be "statistically significant"; but when the trend is low, it won't be.

    4) The whole idea of doing a linear fit to the temperature record is highly suspect. One can fit any function (power series, logarithmic, trig, etc) one desires to any data set with an independent and dependent variable and hunt for a function with the best fit to the data - there is nothing special about a straight line except for simplicity. An n-th degree polynomial will fit n data points perfectly. The parameters one obtains from fitting any arbitrary function are usually meaningless - unless there is a theoretical cause/effect reason why that particular function is appropriate. The trend in global temperature has no important scientific meaning unless there is a good reason for temperature to vary linearly with time. However, theory says that temperature increases in direct proportion to radiative forcing and we call the constant of proportionality the climate sensitivity. For the most part*, time plays a role in temperature change only in the lag between transient and equilibrium climate sensitivity. Otherwise, time itself doesn't cause ANY change in temperature.

    * ENSO obviously plays a role in global temperature; but, since the period of the oscillation is irregular, it is usually treated as autocorrelated noise. There are other ways temperature actually does or might change with time: the annual peak in global temperature (which disappears when we calculate anomalies) that occurs two months after the perihelion, the 11-year solar cycle, Milankovitch cycles, and possibly 60-year cycles with obscure origins and unknown regularity (which are too long to be properly detected in the instrumental temperature record). Too often we forget that multiple CAUSES of global warming are changing with time and causation becomes noise in a linear trend.


  8. Frank,
    I agree that one should look at whether trends are significantly different from trends other than zero - there's nothing special about zero. It's what people talk about, though.

    On 2, there is a brief period on Hadcrut, and to some extent on GISS and others, associated with the 2008 chill which is a significant negative trend. Of course that would be very significantly below the model trend. That doesn't necessarily mean that warming has paused or whatever; just that something unusual happened.

    There's a big danger of post hoc reasoning here. If you look at just one instance at random and it turns out to be significant, that means something. But if you scan the plot looking for "significance" then even if it is quite random, you'll find some. But I think it's very hard to quantify that effect.

    There's another oddity here, which has cropped up in discussions at Lucia's blog, where she was keen on testing trend differences for significance. You get a cold spell, and find trends that are "significant". Then it warms up, and they no longer show as significant. But you'd think that significance can't go away.

    That's another effect of the post hoc reasoning.

    On linear fit, I think it is less arbitrary than you say. In a way a trend is just an estimate of the derivative in the presence of noise. That's significant if you believe there is an underlying function. You can also see it as the limit of low frequency filtering - fitting a sinusoid of very long period.

    I agree that the distinction between true noise and unidentified recurrent processes is unsatisfactorily resolved.

  9. Nick wrote: "I agree that one should look at whether trends are significantly different from trends other than zero - there's nothing special about zero. It's what people talk about, though."

    Why is the (in)significance of the linear trend what people talk about? You've provided a tool that anyone can use to visually cherry-pick "significant" trends or pauses, but cherry-picking invalidates the statistical analysis. The brief (meaningless) 2008 chill is a perfect example.

    Statistical significance certainly can "disappear" and "re-appear". Create a large artificial population of boys' heights and girls' heights whose means differ by the standard deviation of each population: boys 61+/-2 inches; girls 59+/-2 inches. Now try taking several samples of various sizes from these artificial known populations and determine (two-tailed t-test) if there is a statistically insignificant chance (<5%) that the difference in means could be =<0, so you can conclude that boys are taller than girls. Some experiments comparing 16 boys to 16 girls will show a statistically "significant" difference and some won't. Now define taller as at least one inch taller and repeat the process. In some areas of science, we can simply increase the size of the test group until we reach a satisfying conclusion. Unfortunately, the data set for climate science is limited.

    "The trend is just an estimate of the derivative in the presence of noise." Who says the derivative must be constant? Only those unwilling to think about other functional relationships. Time doesn't cause temperature change, something else changing with time does.
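The boys-and-girls experiment in the comment above is easy to simulate. This is only a sketch under stated assumptions: normally distributed heights, a pooled-variance t statistic, and a hard-coded two-sided 5% critical value for 30 degrees of freedom (so it is tied to 16 per group). The function name `fraction_significant` is hypothetical:

```python
import numpy as np

N_PER_GROUP = 16
T_CRIT_DF30 = 2.042   # two-sided 5% critical value of Student's t, 30 dof

def fraction_significant(n_trials=2000, margin=0.0, seed=0):
    """Fraction of simulated 16-vs-16 experiments in which boys come out
    significantly more than `margin` inches taller than girls."""
    rng = np.random.default_rng(seed)
    boys = rng.normal(61.0, 2.0, size=(n_trials, N_PER_GROUP))
    girls = rng.normal(59.0, 2.0, size=(n_trials, N_PER_GROUP))

    # Difference in sample means, measured against the chosen margin.
    diff = boys.mean(axis=1) - girls.mean(axis=1) - margin
    # Pooled variance for two equal-sized groups.
    pooled_var = (boys.var(axis=1, ddof=1) + girls.var(axis=1, ddof=1)) / 2
    t = diff / np.sqrt(pooled_var * 2 / N_PER_GROUP)
    return np.mean(t > T_CRIT_DF30)
```

Calling `fraction_significant()` and then `fraction_significant(margin=1.0)` shows how often the same populations yield a "significant" result under each definition of taller; the second fraction is noticeably smaller.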


  10. Frank,
    I agree that the tool would facilitate cherry-picking, and I am uneasy about that. The counter is that people cherry-pick anyway, and the tool shows what is being done. That's why I thought the analysis of the Jones 1995 issue was useful.

    I also think the tool goes some way toward a legitimate use of significance, since it is so hard to shake off post hoc fallacies. You can see, by eye, the actual area of significance and insignificance. That's a sort of measure of how likely a random choice would be to come up with a significant result. It's not a complete answer - obviously duration, for example, is a factor one has to think about. But it may help.

    My comment about significance related to a particular fixed proposition. In your case, if you have established that boys are "significantly" taller than girls (in some population), then that conclusion should be robust to further testing. It should be a statement that you believe to be true about the real world.

    In climate, if with a model in mind (say underlying trend) you say there is significant warming, then if you later find it isn't significant, that really means you need to change your model. The earlier significance statement was falsified. Maybe the trend changed.

    That's where the derivative idea comes in. I think the model people have in mind when regressing and talking about significance has two aspects:
    1. There is an underlying function, varying smoothly with time, that determines temperature, subject to added noise
    2. That function varies slowly enough in time that a low-order polynomial approx will work.

    In that case, regression is the way to estimate the underlying gradient, using different intervals to account for curvature. Of course, the ability to account for higher derivatives is severely limited by noise, and may be very small, in which case the best estimator of derivative would be full-period trend.

    I made that nudger that fixes the centre point of the range as a way of testing whether the estimate of derivative at a point shows approx invariance with range.

  11. I have trouble seeing the transparency differences associated with different degrees of confidence. Each color in your key could be shown with different confidence intervals.

    Your triangle plot shows only trends that are statistically different from zero. If the computation is reasonable, why not give the user the option of seeing periods when trend is statistically different from a given trend (which has no inherent uncertainty) or the trend from a given period (which has inherent uncertainty). I could pick the 1975-2000 trend of strong warming and see what periods are different, such as 2000-2011. (It might be useful to adjust the confidence interval to suit the question. I hate the IPCC's use of "likely", but one might want to ask if it is likely that 1975-2000 and 2000-2011 trends are "likely" different.)

    My comments about significance relate to conclusions about the whole population (or "true situation") that can be drawn from particular experiments or sets of observations. The idea that we choose a preset required degree of confidence before doing an experiment and drawing a conclusion is excellent for experiments where we can reduce the confidence interval by repeating the experiment or improving the precision of data collection. We often don't have this luxury in climate science. You say that a "conclusion" should be a "statement of what [one] believes to be true about the real world". For me (personally and heretically), there is nothing magical about any particular p value. For example, how likely is it that the difference between 1975-2000 and 2000-2011 trends could be zero (or negative)? How likely is it that the difference could be substantial, 50% of the 1975-2000 trend (or more)? Conclusions don't suddenly spring into existence when a certain threshold is crossed; conclusions include uncertainty. Different data sets allow us to conclude that boys are taller than girls with different degrees of confidence. If a statistically significant conclusion is defined as p<5%, it comes and goes with the experiment. Heresy? Perhaps. Better interpretation? Perhaps.


  12. Frank,
    I share your heresy there. I was hoping to show grades of confidence, but I found that the transparency, while appropriate, is a blunt device; a small change is not noticed.

    The reason for not immediately showing different null trend levels is not so much computing time, but images to handle - if I do all data sets, there are 44 more images per trend level. I am planning another post (real soon), and I've been working on another approach, where I just color the triangle according to the ratio trend/sd. So a value of 2 is significantly different from zero at 95% etc. It shows the continuum of p-values (with that transform), and also gives an idea of significant difference between other values - a value of 3 is, to an approximation, significantly higher than a value of 1. Then I just need a scheme for making it easy to see what trend a value of 1 corresponds to.

  13. Hiya, I'm having a discussion with someone who claims that HadSST2 shows a falling trend since 2004. Are you able to tell me the confidence level at which this becomes significant? I'm hoping you have this data at your finger tips as I'm really just being too lazy to work it out for myself. Thanks in advance for any help.

  14. Nick,
    Yes, the gadget in this post can help here. It shows significant trends - HADSST2 is the last button you can click, and the period 1999-2011 is right for your query. If you look along the right axis, none of the trends are significant. If you slide down to about where it says 8 years (duration) and click, it tells you the trend for specific periods. For example, Jan 2004 to Nov 2011 -0.509 °C/century. But not significant.

    You can also look at this different way of looking at it. If you look at the second plot down, again near the right axis where it marks 8 years (level with 2004 on the left), the color tells you how negative the trend would have to be to be significant. It looks to me like about -3°C. With short periods, it takes a steep slope to be significant.

    It's a pity that second plot doesn't tell the color value on clicking - I'll see if that can be done.

  15. Nick, I got that second interpretation muddled. In fact you should look at the top plot. Around the 2004-2011 point, the color is yellow. This tells you that you can say that the trend is significantly less than about 2.73°C. So it misses significance at 0C by a long way.

  16. That is great, much appreciated. You have created a good resource here, thanks a lot!