Comments on moyhu: A picture of statistically significant warming.

That is great, much appreciated. You have created ...

2012-02-04T19:38:28.008+11:00

That is great, much appreciated. You have created a good resource here, thanks a lot!

Nick, I got that second interpretation muddled. In...

2012-02-03T23:33:00.992+11:00

Nick, I got that second interpretation muddled. In fact you should look at the top plot. Around the 2004-2011 point, the color is yellow. This tells you that you can say that the trend is significantly less than about 2.73°C. So it misses significance at 0°C by a long way.

Nick, Yes, the gadget in this post can help here. ...

2012-02-03T23:28:08.723+11:00

Nick,
Yes, the gadget in this post can help here. It shows significant trends - HADSST2 is the last button you can click, and the period 1999-2011 is right for your query. If you look along the right axis, none of the trends are significant. If you slide down to about where it says 8 years (duration) and click, it tells you the trend for specific periods. For example, Jan 2004 to Nov 2011 gives -0.509 °C/century, but that is not significant.

You can also look at it a different way. If you look at the second plot down, again near the right axis where it marks 8 years (level with 2004 on the left), the color tells you how negative the trend would have to be to be significant. It looks to me like about -3°C. With short periods, it takes a steep slope to be significant.

It's a pity that second plot doesn't tell the color value on clicking - I'll see if that can be done.

Hiya, I'm having a discussion (http://www.clim...

2012-02-03T20:05:04.506+11:00

Hiya, I'm having a discussion (http://www.climateconversation.wordshine.co.nz/2012/01/sceptics-query-our-truth-we-shall-besmirch-and-slander-them/#comment-79080) with someone who claims that HadSST2 shows a falling trend since 2004. Are you able to tell me the confidence level at which this becomes significant? I'm hoping you have this data at your finger tips as I'm really just being too lazy to work it out for myself. Thanks in advance for any help.

Frank, I share your heresy there. I was hoping to ...

2011-12-12T11:50:54.333+11:00

Frank,
I share your heresy there. I was hoping to show grades of confidence, but I found that the transparency, while appropriate, is a blunt device; a small change is not noticed.

The reason for not immediately showing different null trend levels is not so much computing time, but images to handle - if I do all data sets, there are 44 more images per trend level. I am planning another post (real soon), and I've been working on another approach, where I just color the triangle according to the ratio trend/sd. So a value of 2 is significantly different from zero at 95% etc. It shows the continuum of p-values (with that transform), and also gives an idea of significant difference between other values - a value of 3 is, to an approximation, significantly higher than a value of 1. Then I just need a scheme for making it easy to see what trend a value of 1 corresponds to.
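
As a rough illustration of the trend/sd ratio described above, here is a minimal sketch using an ordinary least-squares fit. It assumes independent residuals, so the standard error is the plain OLS one with no autocorrelation correction; the function name `trend_and_ratio` and the toy data are hypothetical, not taken from the post.

```python
import math

def trend_and_ratio(t, y):
    """Return (slope, standard error, slope/se) for an OLS line fit of y on t.

    Assumption: residuals are independent (no AR(1) or other correction).
    """
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares, with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

if __name__ == "__main__":
    # Toy data: a steady rise plus a small alternating wiggle
    t = list(range(10))
    y = [2.0 * ti + 0.1 * (-1) ** ti for ti in t]
    slope, se, ratio = trend_and_ratio(t, y)
    print(f"slope={slope:.4f} se={se:.4f} ratio={ratio:.1f}")
```

A ratio above about 2 in magnitude corresponds to the usual 95% threshold when there are plenty of points; for short periods the exact critical value from Student's t is somewhat larger.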

I have trouble seeing the transparency differences...

2011-12-12T10:20:10.023+11:00

I have trouble seeing the transparency differences associated with different degrees of confidence. Each color in your key could be shown with different confidence intervals.

Your triangle plot shows only trends that are statistically different from zero. If the computation is reasonable, why not give the user the option of seeing periods when trend is statistically different from a given trend (which has no inherent uncertainty) or the trend from a given period (which has inherent uncertainty). I could pick the 1975-2000 trend of strong warming and see what periods are different, such as 2000-2011. (It might be useful to adjust the confidence interval to suit the question. I hate the IPCC's use of "likely", but one might want to ask if it is likely that 1975-2000 and 2000-2011 trends are "likely" different.)
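
One way to compare the trends of two periods is sketched below. It is only an illustration under stated assumptions: the two periods do not overlap, the residuals are independent, and the two slope estimates are treated as independent, so their variances add. The function names are hypothetical.

```python
import math

def ols_trend(t, y):
    """Return (slope, standard error) of an OLS line fit of y on t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    rss = sum((yi - y_mean - slope * (ti - t_mean)) ** 2 for ti, yi in zip(t, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def trend_difference_z(t1, y1, t2, y2):
    """z-like score for (trend of period 1) minus (trend of period 2).

    Assumption: the two estimates are independent, so variances add.
    """
    b1, se1 = ols_trend(t1, y1)
    b2, se2 = ols_trend(t2, y2)
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
```

Comparing against a fixed trend with no uncertainty of its own is the special case where the second standard error is zero, i.e. `(b1 - fixed) / se1`. With autocorrelated data both standard errors would need to be widened first.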

My comments about significance relate to conclusions about the whole population (or "true situation") that can be drawn from particular experiments or sets of observations. The idea that we choose a preset required degree of confidence before doing an experiment and drawing a conclusion is excellent for experiments where we can reduce the confidence interval by repeating the experiment or improving the precision of data collection. We often don't have this luxury in climate science. You say that a "conclusion" should be a "statement of what [one] believes to be true about the real world". For me (personally and heretically), there is nothing magical about any particular p value. For example, how likely is it that the difference between 1975-2000 and 2000-2011 trends could be zero (or negative)? How likely is it that the difference could be substantial, 50% of the 1975-2000 trend (or more)? Conclusions don't suddenly spring into existence when a certain threshold is crossed; conclusions include uncertainty. Different data sets allow us to conclude that boys are taller than girls with different degrees of confidence. If a statistically significant conclusion is defined as p<5%, it comes and goes with the experiment. Heresy? Perhaps. Better interpretation? Perhaps.

Frank

Frank, I agree that the tool would facilitate cher...

2011-12-10T13:22:24.908+11:00

Frank,
I agree that the tool would facilitate cherry-picking, and I am uneasy about that. The counter is that people cherry-pick anyway, and the tool shows what is being done. That's why I thought the analysis of the Jones 1995 issue was useful.

I also think the tool goes some way toward a legitimate use of significance, since it is so hard to shake off post hoc fallacies. You can see, by eye, the actual area of significance and insignificance. That's a sort of measure of how likely a random choice would be to come up with a significant result. It's not a complete answer - obviously duration, for example, is a factor one has to think about. But it may help.

My comment about significance related to a particular fixed proposition. In your case, if you have established that boys are "significantly" taller than girls (in some population), then that conclusion should be robust to further testing. It should be a statement that you believe to be true about the real world.

In climate, if with a model in mind (say underlying trend) you say there is significant warming, then if you later find it isn't significant, that really means you need to change your model. The earlier significance statement was falsified. Maybe the trend changed.

That's where the derivative idea comes in. I think the model people have in mind when regressing and talking about significance has two aspects:
1. There is an underlying function, varying smoothly with time, that determines temperature, subject to added noise
2. That function varies slowly enough in time that a low-order polynomial approx will work.

In that case, regression is the way to estimate the underlying gradient, using different intervals to account for curvature. Of course, the ability to account for higher derivatives is severely limited by noise, and may be very small, in which case the best estimator of derivative would be full-period trend.

I made that nudger that fixes the centre point of the range as a way of testing whether the estimate of derivative at a point shows approx invariance with range.

Nick wrote: "I agree that one should look at...

2011-12-10T08:17:26.339+11:00

Nick wrote: "I agree that one should look at whether trends are significantly different from trends other than zero - there's nothing special about zero. It's what people talk about, though."

Why is the (in)significance of the linear trend what people talk about? You've provided a tool that anyone can use to visually cherry-pick "significant" trends or pauses, but cherry-picking invalidates the statistical analysis. The brief (meaningless) 2008 chill is a perfect example.

Statistical significance certainly can "disappear" and "re-appear". Create a large artificial population of boys' heights and girls' heights whose means differ by the standard deviation of each population: boys 61+/-2 inches; girls 59+/-2 inches. Now try taking several samples of various sizes from these artificial known populations and determine (two-tailed t-test) if there is a statistically insignificant chance (<5%) that the difference in means could be =<0, so you can conclude that boys are taller than girls. Some experiments comparing 16 boys to 16 girls will show a statistically "significant" difference and some won't. Now define taller as at least one inch taller and repeat the process. In some areas of science, we can simply increase the size of the test group until we reach a satisfying conclusion. Unfortunately, the data set for climate science is limited.

"The trend is just an estimate of the derivative in the presence of noise." Who says the derivative must be constant? Only those unwilling to think about other functional relationships. Time doesn't cause temperature change, something else changing with time does.
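
The experiment described above can be sketched as a small simulation. This is a minimal sketch, assuming normally distributed heights and a pooled-variance two-sample t-test; the hard-coded critical value 2.042 is the two-tailed 5% point of Student's t for 30 degrees of freedom, so it applies only to 16 boys versus 16 girls. Function names and the seed are hypothetical choices.

```python
import math
import random
import statistics

# Two-tailed 5% critical value of Student's t with 30 degrees of freedom
# (valid for 16 + 16 - 2 = 30 only; change it if the sample sizes change).
T_CRIT_DF30 = 2.042

def boys_significantly_taller(boys, girls, t_crit=T_CRIT_DF30, margin=0.0):
    """Pooled two-sample t-test: is mean(boys) - mean(girls) - margin > 0 significantly?"""
    n1, n2 = len(boys), len(girls)
    pooled_var = ((n1 - 1) * statistics.variance(boys)
                  + (n2 - 1) * statistics.variance(girls)) / (n1 + n2 - 2)
    diff = statistics.mean(boys) - statistics.mean(girls) - margin
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t > t_crit

def fraction_significant(n_trials=2000, n=16, margin=0.0, seed=0):
    """Repeat the 16-vs-16 experiment and report how often it is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        boys = [rng.gauss(61, 2) for _ in range(n)]
        girls = [rng.gauss(59, 2) for _ in range(n)]
        hits += boys_significantly_taller(boys, girls, margin=margin)
    return hits / n_trials

if __name__ == "__main__":
    print("taller at all:        ", fraction_significant())
    print("at least 1 inch taller:", fraction_significant(margin=1.0))
```

The populations never change between trials, yet only a fraction of the experiments cross the threshold, and a smaller fraction do so once "taller" means at least one inch taller.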

Frank

Frank, I agree that one should look at whether tre...

2011-12-03T20:54:59.430+11:00

Frank,
I agree that one should look at whether trends are significantly different from trends other than zero - there's nothing special about zero. It's what people talk about, though.

On 2, there is a brief period on Hadcrut, and to some extent on GISS and others, associated with the 2008 chill which is a significant negative trend. Of course that would be very significantly below the model trend. That doesn't necessarily mean that warming has paused or whatever; just that something unusual happened.

There's a big danger of post hoc reasoning here. If you look at just one instance at random and it turns out to be significant, that means something. But if you scan the plot looking for "significance" then even if it is quite random, you'll find some. But I think it's very hard to quantify that effect.

There's another oddity here, which has cropped up in discussions at Lucia's blog, where she was keen on testing trend differences for significance. You get a cold spell, and find trends that are "significant". Then it warms up, and they no longer show as significant. But you'd think that significance can't go away.
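
A small simulation can illustrate (though not really quantify) the scanning effect. The sketch below assumes pure white noise with no trend at all, fits an OLS trend to every window of at least 10 points, and flags windows where |trend/se| exceeds 2. All names and parameter values are hypothetical.

```python
import math
import random

def t_ratio(y):
    """OLS trend divided by its standard error, with time steps 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (yi - y_mean) for i, yi in enumerate(y))
    slope = sxy / sxx
    rss = sum((yi - y_mean - slope * (i - t_mean)) ** 2 for i, yi in enumerate(y))
    return slope / math.sqrt(rss / (n - 2) / sxx)

def fraction_flagged(series, min_len=10, threshold=2.0):
    """Fraction of all windows (>= min_len points) with |trend/se| > threshold."""
    n = len(series)
    flagged = total = 0
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            total += 1
            if abs(t_ratio(series[start:end])) > threshold:
                flagged += 1
    return flagged / total

def white_noise(n, seed):
    """Trend-free Gaussian noise; the seed only makes runs repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

if __name__ == "__main__":
    for seed in range(5):
        print(seed, round(fraction_flagged(white_noise(60, seed)), 3))
```

Because overlapping windows share data, the flagged windows tend to come in clusters, and the flagged fraction varies a good deal from one noise series to the next. That clustering is part of why the effect is hard to pin down with a single number.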

That's another effect of the post hoc reasoning.

On linear fit, I think it is less arbitrary than you say. In a way a trend is just an estimate of the derivative in the presence of noise. That's significant if you believe there is an underlying function. You can also see it as the limit of low frequency filtering - fitting a sinusoid of very long period.

I agree that the distinction between true noise and unidentified recurrent processes is unsatisfactorily resolved.

I love the elegant graphics, but I think they tell...

2011-12-03T09:15:52.051+11:00

I love the elegant graphics, but I think they tell us more about the limitations of statistical analysis of trends than they do about the global temperature record. Some examples:

1) The green/brown area representing rapid warming in the late 20th century is statistically different from zero (no trend). The more interesting question is whether this green/brown patch is statistically different from the blue areas that dominated the first 70 years of the 20th century. Saying that there is a statistically significant trend of 1.7 degC/century for one time period and 0.7 degC/century for an earlier time period doesn't tell us that the rate of warming has increased (due to GHG's). I think the proper way to frame this question is to analyze the DIFFERENCE between the trends; what is the confidence interval for the difference in trends and what is the likelihood that the difference could be =<0? The adjacent color changes on your triangle don't represent statistically significant changes in the rate of warming, but they certainly give the illusion of important change.

2) Given that the trend that was observed from 1975 to 2000 agrees well with the predictions of climate models, what is the likelihood that the current "pause"/low trend could be due to chance or natural variation? To what extent does this "pause" suggest that we should change our interpretation about the previous agreement between models and observation?

3) At the 95% confidence level, 5% of the trends that one calls statistically significant are present only by chance. When one looks at a large number of statistical analyses at the same time (like this triangle) and then focuses their attention on a subset, one runs a substantial risk of cherry-picking. There are certainly short periods in the record where natural variability/noise was unusually low (or better-behaved) and we will calculate narrower ci's for those trends simply BY CHANCE. When the trend is high during those chance periods, it will be "statistically significant"; but when the trend is low, it won't be.

4) The whole idea of doing a linear fit to the temperature record is highly suspect. One can fit any function (power series, logarithmic, trig, etc) one desires to any data set with an independent and dependent variable and hunt for a function with the best fit to the data - there is nothing special about a straight line except for simplicity. An n-th degree polynomial will fit n data points perfectly. The parameters one obtains from fitting any arbitrary function are usually meaningless - unless there is a theoretical cause/effect reason why that particular function is appropriate. The trend in global temperature has no important scientific meaning unless there is a good reason for temperature to vary linearly with time. However, theory says that temperature increases in direct proportion to radiative forcing and we call the constant of proportionality the climate sensitivity. For the most part*, time plays a role in temperature change only in the lag between transient and equilibrium climate sensitivity. Otherwise, time itself doesn't cause ANY change in temperature.

* ENSO obviously plays a role in global temperature; but, since the period of the oscillation is irregular, it is usually treated as autocorrelated noise. There are other ways temperature actually does or might change with time: the annual peak in global temperature (which disappears when we calculate anomalies) that occurs two months after the perihelion, the 11-year solar cycle, Milankovitch cycles, and possibly 60-year cycles with obscure origins and unknown regularity (which are too long to be properly detected in the instrumental temperature record). Too often we forget that multiple CAUSES of global warming are changing with time and causation becomes noise in a linear trend.

Frank

Kevin, I think the Quenouille correction is just t...

2011-11-18T05:05:46.890+11:00

Kevin,
I think the Quenouille correction is just the row-sums of the power-law correlation matrix. So your ARMA(1,1) formula would follow in the same way
... r1*r0,r0,1,r0,r0*r1....

I'm not sure how much it is worth using the Quenouille approach for higher order. In FEM terms it's a lumped mass approx, and the limits of that may eventually negate any high order improvements.

I think the computationally efficient approach to ...

2011-11-18T00:13:38.904+11:00

I think the computationally efficient approach to ARMA(1,1) estimation is to estimate r0 = c(t+1) and r1 = c(t+2)/c(t+1)
The correlations for ARMA(1,1) are then
1, r0, r0*r1, r0*r1^2, r0*r1^3...
whereas for AR(1) they are
1, r0, r0^2, r0^3...

You can use the sum of the series and the eqn from the start of Tamino's Hurst post to calculate Neff. For AR1, this gives the Quenouille correction. For ARMA, N/Neff will be 1+2r0/(1-r1)

I can't find a source for that equation, although it is clearly right for the theoretical cases I throw at it.
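
The closed form can at least be checked numerically. The sketch below assumes the large-N relation N/Neff = 1 + 2 × (sum of the correlations at lags 1, 2, 3, ...), which may or may not be exactly the equation in Tamino's post; with the ARMA(1,1) correlations listed above, the geometric series sums to r0/(1-r1). Function names are hypothetical.

```python
def arma11_correlations(r0, r1, n_lags):
    """Correlations at lags 0..n_lags: 1, r0, r0*r1, r0*r1**2, ..."""
    return [1.0] + [r0 * r1 ** (k - 1) for k in range(1, n_lags + 1)]

def inflation_from_series(r0, r1, n_lags=2000):
    """N/Neff by brute-force summation (assumes the large-N approximation)."""
    rho = arma11_correlations(r0, r1, n_lags)
    return 1.0 + 2.0 * sum(rho[1:])

def inflation_closed_form(r0, r1):
    """N/Neff = 1 + 2*r0/(1 - r1); requires |r1| < 1."""
    return 1.0 + 2.0 * r0 / (1.0 - r1)

if __name__ == "__main__":
    # ARMA(1,1)-style case
    print(inflation_from_series(0.6, 0.8), inflation_closed_form(0.6, 0.8))
    # AR(1) is the special case r1 == r0 == r, giving (1 + r)/(1 - r)
    r = 0.5
    print(inflation_closed_form(r, r), (1 + r) / (1 - r))
```

Setting r1 = r0 = r recovers (1 + r)/(1 - r), the Quenouille factor for AR(1).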

Should the autocorrelations be calculated from the data itself, or from a longer (possibly out-of-sample) period? The same question applies to R2. If both come from a reference period, the uncertainty of the gradient depends only on the length of the period, so no calculation is required.

Kevin C

EFS, The range is just a guide to help follow how ...

2011-11-17T07:55:59.540+11:00

EFS,
The range is just a guide to help follow how many years are in the regression. If you follow the faint diagonal white lines, you're looking at regressions over a fixed period - the right axis marks tell you how long.

I'm not sure that more info can be put into this style of pic, but I can do separate color plots of significance levels, and maybe r2.

Really neat! Goddard now knows where to go for al...

2011-11-17T07:38:08.915+11:00

Really neat!

Goddard now knows where to go for all of his all too many cherry picks. Just kidding.

As to the part where you say;

"I'm using the device of transparency - the colors just fade away as significance is lost. There is a small change at 99%, a big drop at 95% and a small further fade at 90%. The small changes are hard to see. The test is whether the trend is significantly different from zero. Colors fade when either the period is short or the estimated trend is in fact close to zero."

Would it also be possible to show the R^2 value? As to the color fading, or the confidence value (e.g. 99%, 95%, 90%)?

Or conversely, what does the range (From X to Y) shown on the right mean in terms of % significance?

I have not read the entire post, so I may have missed something.

As to the method of testing for significance, keep on, keeping on (It's a wee bit above my pay grade).

Thanks for those pointers, Kevin. I'm reassure...

2011-11-17T03:01:55.527+11:00

Thanks for those pointers, Kevin. I'm reassured that T agrees that it's the standard approach, and I think it's mainly what he used for the BEST analysis, though he did try ARMA(1,1) as well.

Kelly's post is interesting, as it looks at some of the same data. I'm not sure what to make of the negative correlation at long lags, but I don't think it bears much on the real problem, which is the loss of d.o.f. Basically, is there a good predictor based on AR(1)? Apparently so, and therefore the data has less information than we thought.

I have to be conscious of computational feasibility. This series of plots involves about 5 million regressions, each of a few hundred points. So the fact that there is an efficient way of doing the standard approach is important.

I'd also say that here I'm trying to show the pattern of significance, and I think that would not change much with further refinement.

So I'm guessing Tamino would say you've un...

2011-11-17T02:05:55.248+11:00

So I'm guessing Tamino would say you've underestimated the autocorrelation by using an AR1 model. See: 1, 2, 3, 4.

However, I don't know how to resolve Tamino's results with Kelly's computation of the acf, which shows significant negative correlation around 25 months: here

I tried two ad-hoc methods: One involved globbing months into groups and calculating the trend on the globs, increasing the glob size until the uncertainty stabilises. It gave roughly the same results, but was a bit unpredictable, possibly because of the anticorrelation issue.
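
The globbing idea might look something like the sketch below. This is a guess at the procedure, not the commenter's actual code: consecutive values are averaged into groups of a given size (leftover values at the end are dropped), a plain OLS trend is fitted to the group means, and the standard error is reported for each group size. All names are hypothetical.

```python
import math
import random

def glob_means(y, size):
    """Average consecutive groups of `size` values; drop any incomplete last group."""
    n_groups = len(y) // size
    return [sum(y[g * size:(g + 1) * size]) / size for g in range(n_groups)]

def trend_se(t, y):
    """Return (slope, standard error) of an OLS line fit of y on t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    rss = sum((yi - y_mean - slope * (ti - t_mean)) ** 2 for ti, yi in zip(t, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def globbed_trend_uncertainty(y, sizes):
    """Map each glob size to (slope per original time step, standard error)."""
    results = {}
    for size in sizes:
        means = glob_means(y, size)
        # Time of each glob = mean index of its members, so the slope keeps
        # the units of the original series (per time step).
        times = [g * size + (size - 1) / 2 for g in range(len(means))]
        results[size] = trend_se(times, means)
    return results

if __name__ == "__main__":
    # Synthetic AR(1) noise with coefficient 0.8 (an arbitrary test case)
    rng = random.Random(1)
    x, y = 0.0, []
    for _ in range(600):
        x = 0.8 * x + rng.gauss(0.0, 1.0)
        y.append(x)
    for size, (slope, se) in globbed_trend_uncertainty(y, [1, 3, 6, 12, 24]).items():
        print(size, round(slope, 5), round(se, 5))
```

For positively autocorrelated data the reported standard error should grow with glob size and then level off once the globs are roughly independent; with few globs left the estimate becomes noisy, which may be one source of the unpredictability mentioned.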

Kevin C