Tuesday, January 20, 2015

So 2014 may not have been warmest?

That has been the meme from people who don't like the thought. Bob Tisdale, at WUWT, gives a rundown. There is endless misinterpretation of a badly expressed section in the joint press release from NOAA and GISS announcing the record.

The naysayers' drift seems to be that there is uncertainty, so we can't say there is a record. But this is no different from any year/month in the past, warmest or coldest. 2005 was uncertain, 2010 also. Here they are, for example, proving that July 1936 was the hottest month in the US. Same uncertainties apply, but no, it was the hottest.

So what was badly expressed by NOAA/GISS? They quoted uncertainties without giving the basis for them. What do they mean, and how were they calculated? Just quoting the numbers without that explanation is asking for trouble.

The GISS numbers seem to be calculated as described by Hansen, 2010, paras 86, 87, and Table 1. It's based on the vagaries of spatial sampling. Temperature is a continuum - we measure it at points and try to infer the global integral. That is, we're sampling, and different samples will give different results. We're familiar with that; temperature indices do vary. UAH and RSS say no records, GISS says yes, just, and NOAA yes, verily. HADCRUT will be very close; Cowtan and Way say 2010 was top.

I think NOAA are using the same basis. GISS estimates the variability from GCMs, and I think NOAA mainly from subsetting.

Anyway, this lack of specificity about the meaning of CIs is a general problem that I want to write about. People seem to say there should be error bars, but when they see a number, enquire no further. CI's represent the variation of a population of which that number is a member, and you need to know what that population is.

In climate talk, there are at least three quite different types of CI:
  • Measurement uncertainty - variation if we could re-measure same times and places
  • Spatial sampling uncertainty - variation if we could re-measure same times, different places
  • Time sampling uncertainty - variation if we could re-measure at different times (see below), same places
I'll discuss each below the jump. (The plot that was here has been moved to a new post)

Measurement uncertainty

This is least frequently quoted, mainly because it is small. But people very often assume it is what is meant. Measurement can have bias or random error. Bias is inescapable, even hard to define. For example, MMTS often reads lower than thermometers. It doesn't help to argue which is right; only to adjust when there is a change.

I speak of a random component, but the main aspect is that when you average a lot of readings, there will be cancellations. A global year average has over a million daily readings. In an average of N readings, cancellation should reduce noise by a factor of about sqrt(N); in this case by a factor of 1000.
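A quick numerical check of that sqrt(N) cancellation, as a sketch with invented numbers (a per-reading error of 0.5, true value taken as zero), not real station data:

```python
import numpy as np

def mean_error(n, sigma=0.5, seed=0):
    """Absolute error in the mean of n readings, each carrying an
    independent random error of standard deviation sigma (both
    hypothetical values, with the true value taken as zero)."""
    rng = np.random.default_rng(seed)
    return abs(rng.normal(0.0, sigma, size=n).mean())

for n in (100, 10_000, 1_000_000):
    print(f"N={n:>9,}  error in mean = {mean_error(n):.5f}  "
          f"(sigma/sqrt(N) = {0.5 / np.sqrt(n):.5f})")
```

With a million readings the residual noise is of order sigma/1000, the factor-of-1000 reduction just described.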

Spatial sampling uncertainty

That is present in every regional average. As said above, we claim an average over all points in the region, but have only a sample. A different sample might give a different result. This is not necessarily due to randomness in the temperature field; when GISS gives an uncertainty, I presume that reflects some randomness in choice of stations, quite possibly for the same field.

A reasonable analogy here is the stock exchange. We often hear of a low for the year, or a record high, etc. That reflects a Dow calculation on a sample of stocks. A different sample might well lead to a non-record. And indeed, there are many indices based on different samples. That doesn't seem to bother anyone.

What I find very annoying about the GISS/NOAA account is that in giving probabilities of 2014 being a record, they don't say if it is for the same sample. I suspect it includes sample variation. But in fact we have very little sample variation: in 2010 we measured in much the same places as in 2014. It makes a big difference.

Time sampling uncertainty

This is another often quoted, usually misunderstood error. It most frequently arises with trends of a temperature series. They are quoted with an uncertainty which reflects a model of variation within timesteps. I do those calculations on the trend page and have written a lot about what that uncertainty means. The important distinction is that it is not an error in the trend that was. It is an uncertainty in the trend that might have been if the climate could be rerun with a new instance of random variation. That might sound silly, but it does have some relevance to extrapolating trends into the future. Maybe you think that is silly too.
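For concreteness, here is a minimal sketch of that calculation: an OLS trend plus the standard error that models "reruns" of the noise. The series is synthetic, with an invented trend and noise level:

```python
import numpy as np

def ols_trend(y, x=None):
    """Return (slope, stderr) for an OLS fit y ~ a + b*x. The stderr
    treats residuals as iid noise, i.e. it is the spread of slopes
    you would see if the noise could be redrawn, not an error in
    'the trend that was'."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float) if x is None else np.asarray(x, dtype=float)
    n = len(y)
    xm, ym = x.mean(), y.mean()
    sxx = ((x - xm) ** 2).sum()
    b = ((x - xm) * (y - ym)).sum() / sxx
    resid = y - (ym + b * (x - xm))
    s2 = (resid ** 2).sum() / (n - 2)
    return b, np.sqrt(s2 / sxx)

# Hypothetical anomaly series: 0.01 C/yr trend plus 0.1 C noise.
rng = np.random.default_rng(1)
years = np.arange(1990, 2015, dtype=float)
anoms = 0.01 * (years - 1990) + rng.normal(0.0, 0.1, len(years))
slope, se = ols_trend(anoms, years)
print(f"trend = {slope:.4f} +/- {2 * se:.4f} C/yr (2-sigma)")
```

The slope is simply what it is for this data; the +/- describes other noise realisations that didn't happen.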

Briggs has a muddled but interesting article, trenchantly deprecating this use of CI's. RealClimate cited a trend (actually just quoting Cowtan and Way) as 0.116 +/- 0.137. Said Briggs:
"Here’s where it becomes screwy. If that is the working definition of trend, then 0.116 (assuming no miscalculation) is the value. There is no need for that “+/- 0.137” business. Either the trend was 0.116 or it wasn’t. What could the plus or minus bounds mean? They have no physical meaning, just as the blue line has none. The data happened as we saw, so there can not be any uncertainty in what happened to the data. The error bounds are persiflage in this context."

I don't totally disagree. 0.116 is the trend that was. The interesting thing is, you can say the same about the commonly quoted standard error of the mean. Each is just a weighted sum, with the error calculated by adding the weighted variances.

I've used this analogy. If you have averaged the weights of 100 people, the CI you need depends on what you want to use the average for. If it is to estimate the average weight of the population of which they are a fair sample, then you need the se. But if you are loading a boat, and want to know if it can carry them, the se is of no use. You want average instrumental error, if anything.
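In code, with invented weights, the two uncertainties are simply different quantities:

```python
import numpy as np

rng = np.random.default_rng(5)
weights = rng.normal(75.0, 12.0, 100)  # hypothetical: 100 people, kg

# CI for the mean of the population these 100 are a sample from:
se_mean = weights.std(ddof=1) / np.sqrt(len(weights))

# For loading the boat, that se is irrelevant: the load is just the
# sum, uncertain only through instrumental error in each weighing.
total = weights.sum()

print(f"mean = {weights.mean():.1f} kg, se of mean = {se_mean:.2f} kg")
print(f"total load = {total:.0f} kg")
```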

And the thing about trend is, you often are interested in a particular decade, not in its status as a sample. That is why I inveigh against people who want to say there was no warming over period x: well, there was, and maybe a lot, even if it isn't statistically significant. Statistical significance is about whether it might, in some far-fetched circumstances, happen again. Not about whether it actually happened.

Briggs is right on that. Of course I couldn't resist noting that in his recent paper with Monckton, such CI's featured prominently, with all the usual misinterpretations. No response; there never is.

Statistical Tie

OK, this is a pet peeve of mine. CI's are complicated, especially with such different bases, and people who can't cope often throw up their hands and say it is a "statistical tie". But that is complete nonsense. And I was sorry to see it crop up in Hansen's 2014 summary (Appendix), where 2014, 2010 and 2005 were declared to be "statistically tied".

You often see this in political polling, where a journalist has been told to worry about sampling error, and so declares a race where A polls 52%, B 48% with sampling error, as a "statistical tie".

But of course it isn't. Any pol would rather be on 52%. And such a margin close to the election usually presages a win. Any Bayesian could sort that out.
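A back-of-envelope Bayesian version makes the point. The numbers below (a two-candidate poll of 1000 voters, flat prior, normal approximation) are hypothetical:

```python
import math

def prob_a_leads(p_a=0.52, n=1000):
    """Probability that candidate A truly leads, given a two-candidate
    poll of n voters showing share p_a for A, under a flat prior and a
    normal approximation. Numbers are hypothetical."""
    se_margin = 2.0 * math.sqrt(p_a * (1.0 - p_a) / n)  # se of p_a - p_b
    z = (2.0 * p_a - 1.0) / se_margin
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"P(A leads) = {prob_a_leads():.2f}")
```

A 52-48 poll with this sampling error gives roughly a 90% chance that A is actually ahead: hardly a tie.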

2014 was the warmest year. It doesn't matter how you juggle probabilities. There is no year with a better claim.


  1. The denialistas are completely clueless about the real work and understanding that goes into making a measurement and even what measurement actually means

  2. Nick: 2014 was the warmest year. It doesn't matter how you juggle probabilities. There is no year with a better claim.

    Um, it's the best claim by a tiny margin that will likely change as we get more data. So therefore it is “the warmest year” (with no uncertainty statement).

    So … >.<

    Who doesn’t understand measurement science again, Flakmeister?

    1. Carrick,
      NOAA has a safe margin. GISS too, since most stations have reported: 4211 GHCN/ERSST, according to the latest count. HADCRUT is in doubt.

      I'll do a new post shortly with plots of the record progress for NOAA etc, which should put it into perspective.

    2. The effects and uncertainties discussed by our dear Dr. Stokes in the above post are beyond the comprehension of the Wutters, and they have no desire to learn.

      As evidence of this, just look at how they lap up the crap from Goddard and his ilk...

  3. NOAA calculated a 48% chance that 2014 was the hottest. They define 48% as "more unlikely than likely." It's unscientific to report that as "the hottest year." NOAA is a government funded organization and, as such, should stick to the highest scientific standards. Their statement amounts to propaganda and should not be tolerated by taxpayers. - Thomas

    1. I think the "more unlikely than likely" is for a binary choice. Here 2014 would have a plurality.

      There is a lot of goalpost moving here. We've been talking about hottest years for a long time. 2010 probably would not have cleared 50% either. The goalposts can be moved to infinity, so in theory you can't talk of a hottest year at all. But I bet people will.
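A Monte Carlo sketch of the plurality point. The anomalies and the common sigma below are illustrative, not the actual NOAA values: each candidate year can come in under 50% while one is still the clear front-runner:

```python
import numpy as np

# Illustrative anomalies (deg C) with a common 1-sigma uncertainty.
years = {"2014": 0.68, "2010": 0.66, "2005": 0.65}
sigma = 0.05

rng = np.random.default_rng(42)
draws = np.vstack([rng.normal(m, sigma, 100_000) for m in years.values()])
winners = np.argmax(draws, axis=0)  # index of the warmest year in each draw
for i, y in enumerate(years):
    print(f"P({y} warmest) = {np.mean(winners == i):.2f}")
```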

  4. Nick,

    Based on the available surface temperature data, 2014 is more likely the warmest year than any other that was close to the same temperature. My guess is that a much more "statistically significant" claim could be made if the influence of ENSO were accounted for (eg. subtract ~0.09 times the 3-4 month lagged Nino 3.4 index from the monthly global average starting in ~1990). The reduced noise will narrow the uncertainty range, and make it easier to declare 'the real winner'.
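The suggested adjustment is easy to sketch. The lag and the 0.09 coefficient below are just the ballpark values quoted in the comment, not fitted values, and the index is whatever monthly Nino 3.4 series you supply:

```python
import numpy as np

def remove_enso(monthly_anom, nino34, lag=4, coef=0.09):
    """Subtract a lagged, scaled Nino 3.4 index from monthly global
    anomalies. coef and lag are the rough values suggested above;
    the first `lag` months are left unadjusted."""
    monthly_anom = np.asarray(monthly_anom, dtype=float)
    nino34 = np.asarray(nino34, dtype=float)
    lagged = np.concatenate([np.zeros(lag), nino34[:-lag]])
    return monthly_anom - coef * lagged
```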

    1. Steve,
      In this case it probably would give a clearer result, but I'm not sure what it achieves. Some people look to record heat as a measure of El Nino. And there would be all sorts of special pleading, not just ENSO.

      Allowing for ENSO doesn't reduce the spatial sampling uncertainty. It will intrude again.

    2. 2013 and 2014 were dominated by ENSO-neutral conditions, with the majority of months having negative ONI numbers. It looks to me like if ENSO were removed, both years would get even warmer.

    3. Nick,
      Well, if the ultimate goal is to evaluate a secular trend, then removing pretty clearly known/quantified contributors to short term variation ought to show that trend more clearly, though some will object (like Willis, for example). If there had been a major volcanic eruption, people would be trying (through special pleadings?) to account for that, and with a lot less certainty than the influence of ENSO. There is nothing to be done about sampling uncertainty, of course.

      I do find it interesting that neither RSS nor UAH satellite data for the lower troposphere are even close to record levels; while it ought to always be the other way around (the 'tropospheric hot spot' and all that). The discrepancy between lower tropospheric temperatures and surface temperatures may be telling us something interesting about boundary layer warming versus tropospheric warming. Could the discrepancy be confirmed by examining the temperature trends for higher altitude versus lower altitude surface stations?

    4. Very straightforward to remove the natural temperature fluctuations. Look up the CSALT approach as an example:


      The residual correlates well to the log of atmospheric CO2 concentration.

    5. Stephen Fitzpatrick: Well, if the ultimate goal is to evaluate a secular trend, then removing pretty clearly known/quantified contributors to short term variation ought to show that trend more clearly, though some will object (like Willis, for example).

      This goes under the rubric "modeling the noise". As long as you can remove the short-period variation without introducing systematic bias, it's a better method.

      This is much easier to do in experimental sciences, because you can develop the methodology with one set of data, and then validate it with replications.

      I think it's doable in observational science, though here you don't have the opportunity to reset the Earth and replay it again with the same exact physical system.

  5. There's a lot that could be said here, but I wanted to comment first on what Briggs said here:

    Here’s where it becomes screwy. If that is the working definition of trend, then 0.116 (assuming no miscalculation) is the value. There is no need for that “+/- 0.137” business. Either the trend was 0.116 or it wasn’t. What could the plus or minus bounds mean? They have no physical meaning, just as the blue line has none. The data happened as we saw, so there can not be any uncertainty in what happened to the data. The error bounds are persiflage in this context.

    The problem here is the actual value of the trend you get depends on the length of the window, the optimization function, how you treat episodic noise and so forth. In other words, 0.116 is rarely robustly 0.116.

    Those of us who do metrology for a living know that the central value is simply not a robust feature of the data, unless the uncertainty (assumed to be computed correctly) is also correspondingly small. Thus one uses that uncertainty figure to give an idea of how robust the quoted central value is.

    But there are many other uses for uncertainty. The suggestion by Nick that we only want a ± value so we can appropriately adorn our central value (with something like a pretty ring I guess :-P) is I’m afraid risible.

    A great deal of effort gets spent on analyzing the uncertainty and studying the properties of the noise in a given measurement. This is equally true at NOAA and GISS as in any other field. Here I think Nick is being a little unfair to criticize them for not including all the details of that measurement study in a short blurb like the report he linked.

    In a complete flip to the argument that Briggs gives, it’s actually the case for some applications that when you forecast, you only forecast the future variability. In the financial market, this goes under the rubric forecasting market volatility [1]. For stock investments, you would use this to determine when you have the lowest risk in making future investments.

    Similar applications are being developed in the physical sciences, and go under the rubric “noise forecasting”: in other words, using a model to tell us when it is good to measure.

    If I don’t get to come back and comment further on this thread, I do think Nick is partly right about the uncertainty calculation. In practice, it is rarely the case that stated uncertainty exceeds actual measurement error. And there are non-stationary systematic bias issues in the data here that are complex and difficult to handle properly.

    And as I mentioned on another thread, how likely is it that the algorithms used now will remain unchanged in the future? If they change, it could change the “on record” temperature.

    Probably Kerry Emanuel said it best [2]:

    I think it is a mistake to focus on single years, whether they be cold or hot. Other than that, I have no particular opinion.

    It’s a tough problem to fully crack, and there’s little fruit to enjoy inside once you’ve done so.

    I used footnotes because for some reason, it was randomly refusing to accept perfectly valid attributed hyperlinks.

    [1] http://faculty.chicagobooth.edu/bryan.kelly/research/pdf/volfor.pdf

    [2] http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2015/01/16/scientists-react-to-warmest-year-2014-underscores-undeniable-fact-of-human-caused-climate-change/

    1. "The suggestion by Nick that we only want a ± value so we can appropriately adorn our central value"
      Carrick, that isn't my suggestion. I'm saying that people demand uncertainty numbers, and then don't inquire what they mean. I think uncertainty should be stated, but it should be the right kind (I've listed) and stated accordingly.

  6. Nick, at Sou's suggestion, I'm reposting a question I asked at HotWhopper. I'm wondering if you (or your commenters) can provide some clarity. The question really goes to your point about spatial sampling uncertainty.
    - - - - - - - - - - - - - - - -
    Are the error bars for any given year completely independent of all other years, or do the issues that lead to error margins point in the same direction all the time (but we don't know which)? Or is it a mix of both?

    To give a (rather lengthy) example of what I'm talking about suppose the anomaly for Year X is +0.7 +/- 0.05 and the anomaly for Year Y is +0.61 +/- 0.05:
    If the errors are independent year-on-year, Year X could be as low as +0.65 and Year Y could be as high as +0.66, so there is some small probability that Year Y was hotter than Year X.

    If not independent, Year X could be as low as +0.65, but if so then all other results are likely to be at the bottom end of their error range (ie the factors that created the error are more or less constant across the whole data set), so Year Y is very unlikely (in this case) to be hotter than +0.56. Alternatively, if we consider that Year Y might be at the upper end of its error range (+0.66), then it is likely X is also at the top end of its range (+0.75), and Year X is definitively, 100% guaranteed to be the hotter year, even if we don't know *exactly* how hot it was.

    The discussion I see on this sort of thing suggests it's the former (or always assumes it at least), but in my work (business analysis, not very sciency I'm afraid) I see a lot of cases where the second example is a better representation of what is going on - while the ranges of uncertainty overlap, the same factors are at work throughout the dataset, and if they push one data point high or low, they are likely to be doing it for all.
    - - - - - - - - - - - - -

    1. Frank D,
      First, apologies about the initial detour to the spam bin. Mystery.

      I think your point is very important, and relates to the hottest year issue. Spatial sampling uncertainty estimates what would happen if you had resampled at a different set of locations, and probably includes the variability if those had actually been the same locations - ie it includes measurement error. I say probably, because I think it would be difficult to make a separation.

      Now when GISS gives a number for 2010 and then 2014, they are mostly estimating in the same locations (it's all a bit murky with SST). So estimating on the basis that different locations could have been chosen overestimates the uncertainty. I don't know by how much, and it might be very hard to estimate.
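Frank D's two cases can be simulated directly, using his hypothetical numbers (Year X = 0.70, Year Y = 0.61, sigma = 0.05 for both):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x_mean, y_mean, sigma = 0.70, 0.61, 0.05

# Case 1: independent errors, a separate draw for each year.
px_ind = np.mean(y_mean + rng.normal(0, sigma, n) >
                 x_mean + rng.normal(0, sigma, n))

# Case 2: fully shared (common-mode) error, one draw shifting both
# years together, so the 0.09 gap can never be overturned.
common = rng.normal(0, sigma, n)
px_shared = np.mean(y_mean + common > x_mean + common)

print(f"P(Y warmer | independent errors) = {px_ind:.3f}")
print(f"P(Y warmer | shared error)       = {px_shared:.3f}")
```

With independent errors, Y overtakes X about 10% of the time; with a fully common-mode error, never. Real indices presumably sit somewhere between the two.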

    2. Nick,

      Indeed, if we wish to know how certainly year A is warmer than year B, we should perhaps do a pairwise comparison of these two years, not two separate comparisons with an inaccurately known base. If the base had no uncertainty of its own, the two approaches would be equivalent, but that's not exactly the case.

      Data on other years may, however, affect also the pairwise comparison, if the calculation of the temperature index uses in some way information about spatial correlations in the transformation from the samples of each year to the index value.

      A comparison of recent years could probably be based solely on a common set of measurements dropping all stations that miss a significant number of values for either year. Some infilling is possible at least for stations that can be compared with nearby stations.

      Four warmest years could be compared either pairwise or directly dropping again all the other years.

      In comparison of 2014 with the other warm years, one problem is in the spatial distribution of the differences, as 2014 had the warmest Pacific, but not the warmest continents. In any approach there's a risk of error in forming the fields of difference from the limited set of data points. The Pacific may be a source of error in that, and an attempt to include very high latitudes, as Cowtan and Way have done, surely adds to the uncertainty. Thus it may be possible to make stronger statements when high latitudes are excluded than when they are included.

  7. Nick and others: Monthly GMST anomalies can change several tenths of a degC from one month to another. Does that change how we interpret data showing that the annual GMST for 2014 was several hundredths of a degC warmer than for two earlier years? I think it should, and therefore (in my ignorance) think the information should be reported with confidence intervals. However, the answer depends on whether one views these monthly and daily variations as NOISE in the annual GMST, or whether the annual GMST is a concept that contains no noise except measurement uncertainty. All measurement systems, of course, contain some uncertainty/noise, but the monthly variations in GMST are much bigger than measurement uncertainty.


    1. Frank,
      I think there should be uncertainty shown for annual values, and also for year rankings, with care about independence. That uncertainty should be based on measurement error, spatial sampling etc.

      What you are referring to is, I think, what I have called time sampling error. Given the fluctuations, it can indeed be seen as rather a matter of luck whether there is a record in 2014. The dip in November, for example, made GISS a near thing. But it doesn't affect whether there was a record. It affects what you may think a record signifies.

    2. Nick: Suppose we made a histogram of the daily GMST anomalies for 2014, 2005 and 2010. Or both the highs and lows. Or we can make a pdf covering roughly a million temperature anomaly measurements made on the planet in each of these years. Does the concept of an annual MGST anomaly encompass only the average of all of these readings or does it include some information about the distribution of the readings that were averaged?

      Area-weighting the data certainly makes sense in either case.

      I could make a case that no one has experienced significant global warming because the scatter in his daily or seasonal local weather experience is so much greater than the amount of warming anyone has personally experienced.


    3. Frank,
      All respectable temperature indices are area-weighted (part via gridding). Yes, the MGST is an estimate of the mean only.

      Farms in Greenland, English wines. Plenty of people experience global warming. I have a jacaranda thriving in my back yard - very marginal in Melbourne 50 years ago.

  8. One way to look at the effect of the spatial sampling error is by comparing the difference between Cowtan and Way corrected series and the original series. I looked at the difference between the average of the three infill methods (kriging, UAH and MERRA) from Cowtan and Way and the original published series for both HadCRUT4 and GHCN. The C&W corrected files were rebaselined to match the original series from 1980-1989 inclusive.

    Here is the result

    If you compare the 2005 to 2010 annual temperature, there does seem to be a big correlation (so what Nick is suggesting may be true for those years). This is not surprising because those were both strong El Niño years, and there's a well known coupling between El Niño and Arctic weather.

    2014 on the other hand was not a strong El Niño year, and correspondingly there doesn't seem to have been a big effect from the C&W corrections on annual temperature.

    I'd still say that trends are what matters. Sure the steadily increase in record temperature with time is evidence there is a trend, but I'm not sure it says very much beyond that. And trend is, IMO, a much simpler problem to do error analysis for than maximum temperature records.

    (When you are looking at a maximum, you are definitionally on the tail, and all sorts of stuff starts to matter. It's just a messy way to do physics, especially when there are much easier, and more germane, metrics to examine.)

    1. Carrick,
      "for both HadCRUT4 and GHCN"
      I presume GHCN means NOAA Land/Ocean?

      I think the way GISS does it (Hansen2010) is simple and direct. They take GCM output, which they can integrate accurately, and then integrate it with the Gistemp sampling. The variation of the difference is their spatial sampling error. I think HADCRUT do something similar with NCEP/NCAR, tho they use other methods too.
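The Hansen (2010) idea can be mimicked on a toy field: integrate a field you know exactly, then see how much a station-like sample misses by. Everything below (the synthetic field, the 300-station sample) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# A smooth synthetic "temperature field" on a 1-degree grid stands in
# for the GCM output, which can be integrated exactly.
lats = np.deg2rad(np.arange(-89.5, 90.0, 1.0))
lons = np.deg2rad(np.arange(0.5, 360.0, 1.0))
field = np.sin(lats)[:, None] ** 2 + 0.3 * np.cos(3 * lons)[None, :]

w = np.cos(lats)[:, None] * np.ones_like(field)
true_mean = (field * w).sum() / w.sum()  # exact area-weighted mean

# "Station" sampling: average the field at 300 random cells, repeated
# many times; the spread of sample means is the spatial sampling error.
errs = []
for _ in range(500):
    i = rng.integers(0, field.shape[0], 300)
    j = rng.integers(0, field.shape[1], 300)
    errs.append(np.average(field[i, j], weights=np.cos(lats)[i]) - true_mean)
print(f"spatial sampling 1-sigma = {np.std(errs):.4f}")
```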

      I thought because it invokes the particular station system, or at least gridding, 2010 and 2014 would not be independent, but at RealClimate Victor V persuaded me otherwise.

      Your result does emphasise how C&W mainly deviates recently, with a peak at 2010 and then a decline. I guess that's why they have 2010 as hottest.

    2. I doubt this plays into this at all, but Jan 2015 looks like it's going to be pretty hot. 2014's record may last all of one month. To an even higher 12-month anomaly. Then there's February.

    3. JCH, since you like rankings so much, I made this figure just for you. ;-)

      Rankings versus temperature

      The rankings are 12-month running-average-smoothed NCDC (aka GHCN aka NOAA) data plotted as negative of ranking versus year.

      We actually use rankings when analyzing data at times. Spearman's correlation coefficient is an example of this. It's referred to as "non-parametric statistics".

      Anyway, I thought the similarities were pretty striking. As you'd expect, what you are really saying when you are continuously getting new records (even if just by a hair) is "temperature trend is still positive".

      Most sensible people won't dispute that the Earth is still warming. Some of us won't even dispute that the trend is smaller since 2002. Some of those (myself included) won't assign any particular meaning to this.

      I think as this research suggests we don't have a good enough handle on the sources of short term variability to try and place too much meaning on short term variations in trend.

    4. Oops wrong link. Here's the right one:


  9. That's interesting. We'll see where it is when 2015 fries eggs!


    Salad days are either here or right around the corner.

  10. January is in the bag. Will it top .68C on GISS?

  11. UAH for January out at +0.35. The heatwave of 2014 might continue for months