Thursday, March 5, 2015

Klotzbach revisited

Not a perfect title; this is actually my first comment on the 2009 GRL paper by Klotzbach, the Pielkes, Christy et al. It was controversial at the time, but that was pre-Moyhu, or at least in Moyhu's very early days, and I hadn't paid it much attention. But it surfaced again today at Climate Etc, so I thought I should read it.

The paper is very lightweight (as contrarian papers can be). It argues that observed surface trends since 1979 actually exceed troposphere trends, as measured by the UAH and RSS indices, whereas CMIP etc. modelling suggests that the troposphere should warm faster.

Now for global you can simply get those trends, and many more, with CIs, from the Moyhu trend viewer. You might say, well, figuring out what the models said should count as substantial work. But they oversimplified badly, were corrected at Real Climate (by Gavin) and had to publish a corrigendum. There has been more discussion since then, and over the years. Here, for example, is a post at Climate Audit, with Gavin participating. But the audit didn't seem to pick up the CI issue, though other methods were discussed. Then came a Klotzbach revisited WUWT post (which my title echoes) two years ago; more on that from SkS here. And now another update.

But what no-one, AFAICS, has noticed is that the claims of statistical significance are just nuts. And significance is essential, because they have only one observation period. The claim originally, from the abstract, was:
"The differences between trends observed in the surface and lower-tropospheric satellite data sets are statistically significant in most comparisons, with much greater differences over land areas than over ocean areas."
I've noticed that the authors are quieter on this recently, and it may be that someone has noticed. But without statistical significance, the claims are meaningless.

Update: I think that the CI's they are quoting may relate to a different calculation. They computed the trends in Table 1, with CI's, and in Table 2 the differences. They say in the abstract that these are differences of trends, but the heading of Table 2, which is not very clear, could mean that they are computing the trends of the differences (a new regression) and giving CI's for that. That is actually a reasonable thing to do, but they should make it clear. I have got reasonably close to their numbers for comparisons with UAH, but not with RSS; it may be that the RSS data has changed significantly since 2009.
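The distinction between the two calculations matters a lot for the CI. A minimal sketch with synthetic monthly data (an assumption for illustration, not their actual series; plain OLS, no autocorrelation adjustment) shows why: the two methods give the same trend difference, but the trend-of-differences regression can have a much narrower CI when the series share variability.

```python
# Sketch: difference of trends vs trend of the difference series.
# Synthetic data standing in for surface and troposphere anomalies.
import numpy as np

rng = np.random.default_rng(0)
n = 360                              # 30 years of monthly data, 1979-2008
t = np.arange(n) / 120.0             # time in decades
common = rng.normal(0, 0.15, n)      # shared natural variability (e.g. ENSO)
surf = 0.16 * t + common + rng.normal(0, 0.05, n)
trop = 0.13 * t + common + rng.normal(0, 0.05, n)

def ols_trend(y):
    """Return OLS trend and its standard error."""
    X = np.column_stack([np.ones(n), t])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    se = np.sqrt(res[0] / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], se

b_s, se_s = ols_trend(surf)
b_t, se_t = ols_trend(trop)

# Method 1: difference of trends, sigmas added in quadrature
diff1, se1 = b_s - b_t, np.hypot(se_s, se_t)

# Method 2: trend of the difference series (a new regression)
diff2, se2 = ols_trend(surf - trop)

# The trend estimates are identical (OLS is linear in y), but se2 is
# much smaller than se1 here because the shared "common" term cancels
# in the difference series.
print(diff1, se1)
print(diff2, se2)
```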

I'll describe this in more detail below the jump.

Here is their Table 1 of trends in °C/decade with 95% CI's:
Table 1. Global, Land, and Ocean Per Decade Temperature Trends and Ratios Over the Period From 1979 to 2008
Data Set         Global Trend       Land Trend         Ocean Trend
NCDC Surface     0.16 [0.12-0.20]   0.31 [0.23-0.39]   0.11 [0.07-0.15]
Hadley Surface   0.16 [0.12-0.21]   0.22 [0.17-0.28]   0.14 [0.08-0.19]
UAH Lower Trop   0.13 [0.06-0.19]   0.16 [0.08-0.25]   0.11 [0.04-0.17]
RSS Lower Trop   0.17 [0.10-0.23]   0.20 [0.12-0.29]   0.13 [0.08-0.19]

My own calcs (to 2014) gave CI's comparable to these.

I've been commenting at CE here and here, and I extracted σ values here:
Data Set         Global Trend        σ
NCDC Surface     0.16 [0.12-0.20]    0.04
Hadley Surface   0.16 [0.12-0.21]    0.04
UAH Lower Trop   0.13 [0.06-0.19]    0.06
RSS Lower Trop   0.17 [0.10-0.23]    0.06

But you don't need them to see that the results in Table 1 are very unlikely to be significant. Virtually all the trends lie within the CIs of the satellite indices. Consider the comparison of NCDC 0.16 and UAH 0.13 [0.06-0.19]. There is no way NCDC is inconsistent with the range of UAH, even before allowing for any error of its own.

Anyway, what Klotzbach et al did was to show a table of differences with CIs:

Table 2. Global, Land, and Ocean Per Decade Temperature Trends Over the Period From 1979 to 2008 for the NCDC Surface Analysis
Minus UAH Lower Troposphere Analysis and the Hadley Centre Surface Analysis Minus RSS Lower Troposphere Analysis
Data Set               Global Trend (C)    Land Trend (C)     Ocean Trend (C)
NCDC minus UAH           0.04 [0.00- 0.08]  0.15 [0.08- 0.21]    0.00 [-0.04-0.05]
NCDC minus RSS           0.00 [-0.04- 0.04] 0.11 [0.07- 0.15]   -0.02 [-0.07- 0.02]
Hadley Center minus UAH  0.03 [0.00- 0.07]  0.06 [0.02- 0.10]   0.03 [-0.01-0.07]
Hadley Center minus RSS  -0.01 [-0.04- 0.03] 0.02 [-0.02- 0.06] 0.00 [-0.04-0.04]
Trends that are statistically significant at the 95% level are bold; 95% confidence intervals are given in brackets.

So compare, say, Had-UAH global: 0.03 [0.00- 0.07]
But Had was:
Hadley Surface 0.16 [0.12-0.21]
and UAH
UAH Lower Trop 0.13 [0.06-0.19]  0.06

The CI for the difference is about half as wide as the CI for UAH alone. This is reflected throughout the table. But the normal method requires that the σ's be added in quadrature; the range for the difference must be larger than for either of the operands.
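A quick consistency check, assuming independence (which is what adding in quadrature presumes), using the Table 1 global half-widths: the 95% half-width of a difference of two independent trend estimates is the quadrature sum of the component half-widths, so it must exceed either one alone.

```python
# Quadrature check on the Table 1 global 95% CI half-widths.
import math

half = {            # 95% CI half-widths read off Table 1 (global column)
    "NCDC": 0.04,   # [0.12-0.20]
    "Hadley": 0.045,  # [0.12-0.21]
    "UAH": 0.065,   # [0.06-0.19]
    "RSS": 0.065,   # [0.10-0.23]
}

def diff_halfwidth(a, b):
    """Half-width of the difference CI, assuming independent estimates."""
    return math.hypot(half[a], half[b])

# Hadley minus UAH: expected ~0.079, versus 0.035 implied by Table 2's
# [0.00 - 0.07] interval.
print(diff_halfwidth("Hadley", "UAH"))
```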

Now it might be possible to construct an argument that dependence would make a lower CI for the difference. But there isn't much data to resolve dependence as well. And this is basically a test for dependence. You can't start off by assuming it. In any case, the paper doesn't say anything about how the difference CI's were calculated. And I have no idea.

Table 3 shows the differences between amplified (×1.2) surface and troposphere trends. It is little different, and again the CI's are far too narrow. Yet that is where the main claim of significance rests. I won't reproduce it here; it is muddied by the changes made in the corrigendum. But they do nothing to repair the situation.

I do not believe any of these trend differences are significant.
Update - see above. I think that interpreted as the CI's of a regression on the differences, the original significance claims may be justified.

Update. A commenter at Climate Etc challenged me to calculate a corrected Table 2. He also wanted Table 3, but it's a bit late for that. Here is Table 2; I've shown, under each original line, my variance-added numbers. I've worked with rounded data, so there are rounding discrepancies.

Data Set                 Global Trend (C)      Land Trend (C)        Ocean Trend (C)
NCDC minus UAH           0.04 [0.00 - 0.08]    0.15 [0.08 - 0.21]    0.00 [-0.04 - 0.05]
  (variance added)       0.03 [-0.05 - 0.11]   0.15 [0.03 - 0.27]    0.00 [-0.08 - 0.08]
NCDC minus RSS           0.00 [-0.04 - 0.04]   0.11 [0.07 - 0.15]   -0.02 [-0.07 - 0.02]
  (variance added)      -0.01 [-0.09 - 0.07]   0.11 [-0.01 - 0.23]  -0.02 [-0.09 - 0.05]
Hadley Center minus UAH  0.03 [0.00 - 0.07]    0.06 [0.02 - 0.10]    0.03 [-0.01 - 0.07]
  (variance added)       0.03 [-0.05 - 0.11]   0.06 [-0.04 - 0.16]   0.03 [-0.06 - 0.12]
Hadley Center minus RSS -0.01 [-0.04 - 0.03]   0.02 [-0.02 - 0.06]   0.00 [-0.04 - 0.04]
  (variance added)      -0.01 [-0.09 - 0.07]   0.02 [-0.08 - 0.12]   0.01 [-0.07 - 0.09]
As you can see, there is now only one case that is even barely significant: NCDC minus UAH over land. But that is at the 95% level - we could expect such a result 1 time in 20. And there are 12 tests here.
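The multiple-testing point is easy to quantify. Assuming the 12 tests were independent (an approximation; in reality they share data), the chance of at least one spurious "significant" result is close to a coin flip:

```python
# Probability of at least one false positive in 12 independent tests,
# each at the nominal 95% level, when all nulls are true.
p_any = 1 - 0.95 ** 12
print(round(p_any, 2))  # → 0.46
```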


  1. > So compare, say, Had-RSS global: 0.03 [0.00- 0.07]

    That's Had-UAH, no?

The issue of the low CIs of the differences was raised at the time, and in the discussion following.

So if you look at the trend of the difference series (rather than compute the difference of trends), you do get much lower CIs. This may make intuitive sense given the series are not truly independent (certainly they do share a lot of the same inter-annual variability).

    1. Thanks for the link to James comments by the way.

  3. Having said that, the amplification ratios they ended up with are way too high. 1.1 over land and 1.6 over ocean implies an overall amplification ~1.45, not 1.2.

    Also I'm not sure autocorrelation was accounted for in trend calculation. And it would be interesting to apply the same difference test within surface (Had-GISS) and satellite (RSS-UAH) series.

    1. Deep,
      Thanks also for the links - interesting. I see you also made the observation about the CI of the differences being less than the CI of the components.

      They say they did account for the autocorrelation by "Santer's method" which is Quenouille adjustment.
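The Quenouille adjustment mentioned here can be sketched briefly. This is my reading of the method (AR(1) residuals assumed), not the paper's actual code: the lag-1 autocorrelation of the regression residuals shrinks the effective sample size, which inflates the trend standard error.

```python
# Sketch of the Quenouille (Santer et al. style) adjustment for
# lag-1 autocorrelation in trend uncertainty estimates.
import numpy as np

def adjusted_trend_se(y):
    """OLS trend, SE with Quenouille effective-sample-size correction, n_eff."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones(n), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation
    n_eff = n * (1 - r1) / (1 + r1)                 # Quenouille effective n
    sigma2 = resid @ resid / (n_eff - 2)            # inflated residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], se, n_eff
```

For positively autocorrelated residuals, n_eff is well below n, so the CI widens relative to the naive OLS one.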

deepclimate: So if you look at the trend of the difference series (rather than compute the difference of trends), you do get much lower CIs. This may make intuitive sense given the series are not truly independent (certainly they do share a lot of the same inter-annual variability).

This is what I thought too. Nick may have done the very thing he criticized Gavin for by not correcting the CI of the difference of series for the correlation between the two signals.

    1. Carrick,
      I'll just note here what I said at CE and in an update above. I don't think they allowed for correlation; there is no indication that they calculated that. Instead, I think they did a new regression with the differences and used the CI's for that. A reasonable thing to do, but it's not testing the difference between trends, as said in the abstract.

Thanks Nick. For what it's worth, I agree with your interpretation here.

That said, my feeling is the systematic error in the satellite series is much worse than what is needed to explore a relatively small difference in trend such as this.

Heck, UAH and RSS don't even have the same sign of trend from, say, 1998 to 2014. In fact, UAH is among the highest of the trends over this period, while RSS is the smallest and the only negative trend.

    SERIES TREND (°C/decade)
    GISTEMP 0.077
    UAH 0.071
    HadCRUT4 0.059
    NCDC 0.042
    RSS -0.049

The other point, which I think Nick was also trying to make, is you need to look at more than one interval. Otherwise you don't know how robust this result is (which I would guess is "not very").

Nick, do I understand the above post right that you test whether the tropospheric trend is above the surface temperature trend? That is not the right test; as you write in the beginning, the tropospheric trend is expected to be stronger than the surface trend. The famous amplification ratios, which were computed incorrectly in Klotzbach et al. and then in the Corrigendum, still had different values from the ones given to them by Gavin Schmidt. Both the Klotzbach Corrigendum and Schmidt used the GISS climate model, thus the values should have been the same. If you do not get the original values of the climate modeller that gave you the data, you should explain in detail how you computed your values and why your method is more accurate. However, the Corrigendum just gives some numbers and no explanation whatsoever. That was the moment I decided that this "study" was just another blog post of WUWT quality.

    The confidence interval for the trends should also take into account how well non-climatic changes can be removed. Non-climatic changes are paramount for errors in long-term trends. The trend error is a lot larger than just the error of least squares regression.

Thorne et al. have done a careful analysis of the trend errors in radiosonde upper air data and come to the conclusion that the trend differences are too small to be significant.

Does anyone know of trend error estimates for the microwave satellite tropospheric temperatures? I imagine that they are huge, given how much this data has been adjusted already. One of the first versions even had a downward trend; now, after removing some of the non-climatic changes, they have a trend that is similar to the surface trend. I guess it is almost unknowable how many more errors there still are in such a dataset, and the way they remove non-climatic changes means that you have to know the reason to be able to correct it. (Contrary to statistical homogenization used for station data, where you compare with neighboring stations and remove non-climatic changes independent of their reason.)

    1. Victor,
      "Nick, do I understand the above post right that you test whether the tropospheric trend is above the surface temperature trend? That is not the right test, as you write in the beginning, the tropospheric trend is expected to be stronger than the surface trend."

      I'm checking their Table 2, which has those differences, with CIs. They do check the trop against upscaled surface in Table 3. However, the waters there are muddy, because they used a wrong amplification factor, and published a corrigendum (without CI's). And there they did not give a global value.

    2. That makes sense.

      Clearly Klotzbach et al. is a blog post of the worst kind. Pretty amazing when the key number of your "article" is just hearsay of an economist and you never looked at the data itself. Strange that they would put such an "article" in the limelight again. And was it still okay in 2009 to present results from just one climate model without need?

  7. I really liked this comment from Zeke. Obviously he was saying similar things to what I said above, only better, and included the reference to Zou (which I keep forgetting about):

I do agree with you that there is a noteworthy difference between surface and tropospheric trends. However, I see enough uncertainty in satellite reconstructions (e.g. the divergence between UAH and RSS over the last 15 years, work like Zou et al that result in satellite-based records much closer to surface records, and the history of large adjustments to satellite records in general) that I'm reluctant to assume that the satellite record is the correct one. That's not to say that the surface record is necessarily correct; only that the error bars of both are large enough to make it difficult to determine which is correct at this point in time.

  8. To illustrate the systematic error problem for the satellite data, what would a line for table 2 look like with RSS - UAH (and perhaps done for different time intervals also) - there are surely statistically significant or close to significant differences for the global trend between the two satellite datasets at least. How do you explain that? One is wrong, they both are, or (by default) there is a significant underestimation of uncertainty in long-term trends for that data.

  9. John Christy has some comments up on Judith's blog.

I really think the last 15 years should be the focus of accuracy and reliability studies for RSS vs UAH, simply because this is virtually the only period in which they are using different equipment.

    Before that, other than different assumptions about the diurnal cycle, they should be giving the same answer. So it's a test to some degree of sensitivity to decisions about the models used to remove systematic effects, but that's not very interesting when trying to judge the quality of the comparison of satellite to surface stations.

  10. There are two questions: 1) the variability of the amplification ratio: you can at least make an approximation by using multiple realizations of a climate model under warming, and look at how much that amplification ratio varies between realizations. Importantly, it would have to be a climate model that does a good job picking up the high sensitivity of the satellite datasets to ENSO variability. 2) the accuracy of the relative datasets: like several commenters here, I'm more dubious about the satellite dataset providing reliable trends.

    Having said that, I do think it is an interesting comparison, and I wouldn't be completely surprised to find that in fact, there is some evidence that the troposphere is not warming as fast relative to the surface as climate models project. But I think that is very far from convincingly demonstrated.


    1. I had a comment here on the question of the ratio of trends.

I think getting the ratio correct is tougher than accurately computing the difference, though I now see that even if you subtract, you wouldn't necessarily get a cancellation of natural variability: this is precisely because of the differential rate of warming between surface air temperature and TLT.

      So this seems like a very messy way to test predictions of AGW.

  11. Nick,
Based on the original post and your updates, I am now confused. Do you think the paper's claims are correct or not? Do you think there is or is not a significant discrepancy between surface warming and expected tropospheric warming?

    1. Stephen,
The original claim was that the trend of surface was significantly greater than the trend of trop. I think that is not correct, or at least not established. Maybe it could be established using correlation.

      What they apparently did was to test whether the trend of differences was significantly (relative to the spread of those residuals) greater than zero. I think that is a reasonable test. Others might not; that's why it is important to be clear. Their results are sometimes significant, sometimes not. I think that over this period, the result is marginally significant. But I wouldn't conclude anything without seeing other periods.