moyhu: So 2014 may not have been warmest?

Tuesday, January 20, 2015


That has been the meme from people who don't like the thought. Bob Tisdale, at WUWT, gives a rundown. There is endless misinterpretation of a badly expressed section in the joint press release from NOAA and GISS announcing the record.

The naysayers' drift seems to be that there is uncertainty, so we can't say there is a record. But this is no different from any year or month in the past, warmest or coldest. 2005 was uncertain, and so was 2010. Here they are, for example, proving that July 1936 was the hottest month in the US. The same uncertainties apply, but no, it was the hottest.

So what was badly expressed by NOAA/GISS? They quoted uncertainties without giving the basis for them. What do they mean, and how were they calculated? Just quoting the numbers without that explanation is asking for trouble.

The GISS numbers seem to be calculated as described by Hansen, 2010, paras 86-87 and Table 1. They are based on the vagaries of spatial sampling. Temperature is a continuum: we measure it at points and try to infer the global integral. That is, we're sampling, and different samples will give different results. We're familiar with that; temperature indices do vary. UAH and RSS say no record, GISS says yes, just, and NOAA yes, verily. HADCRUT will be very close; Cowtan and Way say 2010 was top.

I think NOAA are using the same basis. GISS estimates the variability from GCMs, and I think NOAA mainly from subsetting.

Anyway, this lack of specificity about the meaning of CIs is a general problem that I want to write about. People seem to say there should be error bars, but when they see a number, they enquire no further. CIs represent the variation of a population of which that number is a member, and you need to know what that population is.

In climate talk, there are at least three quite different types of CI:

Measurement uncertainty - variation if we could re-measure same times and places
Spatial sampling uncertainty - variation if we could re-measure same times, different places
Time sampling uncertainty - variation if we could re-measure at different times (see below), same places

I'll discuss each below the jump. (The plot that was here has been moved to a new post.)

Measurement uncertainty

This is the least frequently quoted, mainly because it is small. But people very often assume it is what is meant. Measurement can have bias or random error. Bias is inescapable, even hard to define. For example, MMTS often reads lower than thermometers. It doesn't help to argue which is right; what helps is to adjust when there is a change.

I speak of a random component, but the main point is that when you average a lot of readings, there will be cancellations. A global year average has over a million daily readings. In an average of N readings, cancellation should reduce the noise by a factor of about sqrt(N); in this case, a factor of 1000.
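A quick simulation makes the sqrt(N) arithmetic concrete. It is a sketch under the assumption that every reading carries an independent random error (sigma = 0.5 °C is a made-up value); bias, as noted above, does not cancel this way. The function name is hypothetical.

```python
import math
import random
import statistics


def spread_of_averages(n_readings: int, sigma: float = 0.5,
                       trials: int = 200, seed: int = 42) -> float:
    """Standard deviation, across repeated trials, of the mean of n_readings
    readings that each carry an independent random error of size sigma."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_readings))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


small = spread_of_averages(25)     # expect roughly 0.5 / 5 = 0.1
large = spread_of_averages(2500)   # expect roughly 0.5 / 50 = 0.01
print(f"N=25: {small:.4f}  N=2500: {large:.4f}  ratio: {small / large:.1f}")

# The same rule applied to about a million daily readings in a global year:
print("noise reduction factor for N = 1,000,000:", math.sqrt(1_000_000))
```

A hundredfold increase in N gives roughly a tenfold reduction in the noise of the average, which is the sqrt(N) rule.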

Spatial sampling uncertainty

This is present in every regional average. As said above, we claim an average over all points in the region, but have only a sample. A different sample might give a different result. This is not necessarily due to randomness in the temperature field; when GISS gives an uncertainty, I presume that reflects some randomness in the choice of stations, quite possibly for the same field.

A reasonable analogy here is the stock exchange. We often hear of a low for the year, or a record high, etc. That reflects a Dow calculation on a sample of stocks. A different sample might well lead to a non-record. And indeed, there are many indices based on different samples. That doesn't seem to bother anyone.

What I find very annoying about the GISS/NOAA account is that in giving probabilities of 2014 being a record, they don't say whether it is for the same sample. I suspect it includes sample variation. But in fact we have very little sample variation. In 2010 we measured in much the same places as 2014. It makes a big difference.
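A toy model shows why using the same stations in both years matters. It assumes a fixed spatial pattern (some places are always warmer or cooler than average) plus fresh local weather each year; all the parameters are invented, the function names are hypothetical, and this is not how GISS or NOAA compute their figures. When both years are averaged over the same stations, the fixed pattern cancels in the difference; when each year gets its own random sample, it does not.

```python
import random
import statistics


def sample_mean(field: list[float], stations: list[int]) -> float:
    """Average of the field at the chosen station locations."""
    return statistics.fmean(field[i] for i in stations)


def difference_spread(n_locations: int = 500, n_stations: int = 100,
                      trials: int = 500, seed: int = 0) -> tuple[float, float]:
    """Spread of the estimated year-to-year difference when both years use
    the same stations, versus when each year uses its own random sample."""
    rng = random.Random(seed)
    # Persistent spatial pattern: some places are always warmer or cooler.
    pattern = [rng.gauss(0.0, 1.0) for _ in range(n_locations)]
    true_warming = 0.02  # hypothetical true difference between the two years

    def year_field(offset: float) -> list[float]:
        # Global offset + fixed pattern + local weather that changes each year.
        return [offset + p + rng.gauss(0.0, 0.3) for p in pattern]

    same, different = [], []
    for _ in range(trials):
        year_a = year_field(0.0)
        year_b = year_field(true_warming)
        stations = rng.sample(range(n_locations), n_stations)
        others = rng.sample(range(n_locations), n_stations)
        same.append(sample_mean(year_b, stations) - sample_mean(year_a, stations))
        different.append(sample_mean(year_b, others) - sample_mean(year_a, stations))
    return statistics.stdev(same), statistics.stdev(different)


same_sd, different_sd = difference_spread()
print(f"same stations: {same_sd:.3f}  different stations: {different_sd:.3f}")
```

With these settings the spread of the difference comes out roughly three times larger when the samples differ, which is the gap between "probability of a record for the same sample" and "probability including sample variation".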

Time sampling uncertainty

This is another often quoted, usually misunderstood error. It most frequently arises with trends of a temperature series. They are quoted with an uncertainty which reflects a model of variation within timesteps. I do those calculations on the trend page and have written a lot about what that uncertainty means. The important distinction is that it is not an error in the trend that was. It is an uncertainty in the trend that might have been if the climate could be rerun with a new instance of random variation. That might sound silly, but it does have some relevance to extrapolating trends into the future. Maybe you think that is silly too.
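The "rerun the climate" reading of a trend uncertainty can be sketched directly: keep the same underlying trend, draw fresh random noise many times, and look at the spread of the fitted trends. This assumes independent (white) noise, which real monthly temperatures are not; autocorrelated noise widens the spread. The trend, noise level, and function names are illustrative assumptions.

```python
import math
import random
import statistics


def ols_slope_and_se(t: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and its textbook standard error,
    assuming independent (white-noise) residuals."""
    t_bar, y_bar = statistics.fmean(t), statistics.fmean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    intercept = y_bar - slope * t_bar
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    s2 = sum(r * r for r in residuals) / (len(t) - 2)
    return slope, math.sqrt(s2 / sxx)


def rerun_climate(true_slope: float = 0.01, sigma: float = 0.1, years: int = 15,
                  reruns: int = 2000, seed: int = 3) -> list[float]:
    """Trends from many 'reruns': same underlying trend, fresh random weather."""
    rng = random.Random(seed)
    t = [float(year) for year in range(years)]
    return [
        ols_slope_and_se(t, [true_slope * ti + rng.gauss(0.0, sigma) for ti in t])[0]
        for _ in range(reruns)
    ]


slopes = rerun_climate()
print(f"spread of rerun trends: {statistics.stdev(slopes):.4f}")
print(f"formula sigma/sqrt(Sxx): {0.1 / math.sqrt(280):.4f}")  # Sxx = 280 for t = 0..14
```

The spread across reruns should come out close to the textbook standard error, about 0.006 here. That number describes trends that might have been; it is not an error in the trend that was fitted to any one series.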

Briggs has a muddled but interesting article, trenchantly deprecating this use of CIs. RealClimate cited a trend (actually just quoting Cowtan and Way) as 0.116 +/- 0.137. Said Briggs:
"Here’s where it becomes screwy. If that is the working definition of trend, then 0.116 (assuming no miscalculation) is the value. There is no need for that “+/- 0.137” business. Either the trend was 0.116 or it wasn’t. What could the plus or minus bounds mean? They have no physical meaning, just as the blue line has none. The data happened as we saw, so there can not be any uncertainty in what happened to the data. The error bounds are persiflage in this context."

I don't totally disagree. 0.116 is the trend that was. The interesting thing is, you can say the same about the commonly quoted standard error of the mean. Each is just a weighted sum, with the error calculated by adding the weighted variances.
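To spell out the weighted-sum point: for N values y_i at times t_i, each with an independent error of variance σ², the mean and the least-squares trend are both of the form Σ w_i y_i and differ only in the weights (standard results, stated here under that independence assumption; the trend weights work because Σ(t_i - t̄) = 0):

```latex
\bar{y} = \sum_{i=1}^{N} w_i y_i, \qquad w_i = \frac{1}{N}

\hat{b} = \sum_{i=1}^{N} w_i y_i, \qquad
w_i = \frac{t_i - \bar{t}}{\sum_{j=1}^{N} (t_j - \bar{t})^2}

\operatorname{Var}\Big(\sum_{i=1}^{N} w_i y_i\Big) = \sigma^2 \sum_{i=1}^{N} w_i^2
\quad\Longrightarrow\quad
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{N}, \qquad
\operatorname{Var}(\hat{b}) = \frac{\sigma^2}{\sum_{j=1}^{N} (t_j - \bar{t})^2}
```

In both cases the ± describes how the weighted sum would vary with a fresh set of errors, not any doubt about the number computed from the data we have.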

I've used this analogy. If you have averaged the weights of 100 people, the CI you need depends on what you want to use the average for. If it is to estimate the average weight of the population of which they are a fair sample, then you need the se. But if you are loading a boat, and want to know if it can carry them, the se is of no use. You want average instrumental error, if anything.
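The two uses can be put side by side. The weights and the 0.5 kg scale error below are made up, and treating the scale errors as independent is an assumption.

```python
import math
import statistics

# Hypothetical weights (kg) for 100 people; made-up numbers for illustration.
weights = [60.0 + (i * 7) % 41 for i in range(100)]
scale_sigma = 0.5  # hypothetical random error of one weighing, kg

n = len(weights)
mean_weight = statistics.fmean(weights)

# For inferring the average of the wider population these people were drawn from:
standard_error = statistics.stdev(weights) / math.sqrt(n)

# For loading the boat, the people are not a sample; only the scale error matters.
instrument_error_of_mean = scale_sigma / math.sqrt(n)
total_load = n * mean_weight
instrument_error_of_total = scale_sigma * math.sqrt(n)

print(f"mean {mean_weight:.1f} kg, se {standard_error:.2f} kg, "
      f"instrument error of mean {instrument_error_of_mean:.2f} kg")
print(f"boat load {total_load:.0f} kg +/- {instrument_error_of_total:.1f} kg (instrument only)")
```

The se is dominated by person-to-person variation, which is irrelevant to whether the boat floats.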

And the thing about trend is, you are often interested in a particular decade, not in its status as a sample. That is why I inveigh against people who want to say there was no warming over period x because, well, there was, and maybe a lot, but it isn't statistically significant. Statistical significance is about whether it might, in some far-fetched circumstances, happen again, not about whether it actually happened.

Briggs is right on that. Of course I couldn't resist noting that in his recent paper with Monckton, such CIs featured prominently, with all the usual misinterpretations. No response; there never is, there.

Statistical Tie

OK, this is a pet peeve of mine. CIs are complicated, especially with such different bases, and people who can't cope often throw up their hands and say it is a "statistical tie". But that is complete nonsense. And I was sorry to see it crop up in Hansen's 2014 summary (Appendix), where 2014, 2010 and 2005 were declared to be "statistically tied".

You often see this in political polling, where a journalist has been told to worry about sampling error, and so declares a race in which A polls 52% and B 48%, within sampling error, to be a "statistical tie".

But of course it isn't. Any pol would rather be on 52%. And such a margin close to the election usually presages a win. Any Bayesian could sort that out.
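For the 52/48 poll, the Bayesian sorting-out is short. Assuming a two-way race, a hypothetical sample of 1000, a flat prior, and the usual normal approximation, the roughly ±3-point margin of error a journalist would quote and the probability that A is really ahead both come from the same standard error. The function name is hypothetical.

```python
import math


def prob_candidate_ahead(share: float, n: int) -> float:
    """Approximate probability that a candidate polling `share` in a two-way
    poll of size n is truly ahead (true share > 0.5), using a normal
    approximation with a flat prior."""
    se = math.sqrt(share * (1.0 - share) / n)
    z = (share - 0.5) / se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical poll of 1000 people: A on 52%, B on 48%.
margin_of_error = 1.96 * math.sqrt(0.52 * 0.48 / 1000)
print(f"95% margin of error: +/-{margin_of_error:.3f}")
print(f"P(A truly ahead): {prob_candidate_ahead(0.52, 1000):.2f}")
```

So a "tie" within the margin of error still leaves A about a 9-in-10 favourite under these assumptions.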

2014 was the warmest year. It doesn't matter how you juggle probabilities. There is no year with a better claim.
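The same logic applies to the temperature record. A Monte Carlo sketch, with illustrative anomalies (not official values) and an assumed independent error of 0.05 °C per year, estimates how often each year comes out on top. The function name is hypothetical.

```python
import random


def prob_each_year_warmest(anomalies: dict[str, float], sigma: float,
                           trials: int = 50_000, seed: int = 1) -> dict[str, float]:
    """Monte Carlo estimate of the probability that each year is the warmest,
    treating each anomaly as uncertain with an independent normal error."""
    rng = random.Random(seed)
    wins = dict.fromkeys(anomalies, 0)
    for _ in range(trials):
        draws = {year: a + rng.gauss(0.0, sigma) for year, a in anomalies.items()}
        wins[max(draws, key=draws.get)] += 1
    return {year: count / trials for year, count in wins.items()}


# Illustrative anomalies (deg C) and 1-sigma error; not official values.
probs = prob_each_year_warmest({"2014": 0.68, "2010": 0.66, "2005": 0.65}, sigma=0.05)
for year, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(year, round(p, 2))
```

2014 gets the largest share, though none is close to certain; a probability well short of 100% is not a reason to name a different year. If part of each year's error is shared across years, that part cannot change the ranking, so only the independent part belongs in sigma.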

29 comments:

Flakmeister, January 20, 2015 at 12:19 PM
The denialistas are completely clueless about the real work and understanding that go into making a measurement, and even what measurement actually means.
Carrick, January 21, 2015 at 8:25 AM
Nick: 2014 was the warmest year. It doesn't matter how you juggle probabilities. There is no year with a better claim.

Um. It's the best claim by a tiny margin that will likely change as we get more data. So therefore it is “the warmest year” (with no uncertainty statement).

So … >.<

Who doesn’t understand measurement science again, Flakmeister?
Anonymous, January 21, 2015 at 11:23 AM
NOAA calculated a 48% chance that 2014 was the hottest. They define 48% as "more unlikely than likely." It's unscientific to report that as "the hottest year." NOAA is a government funded organization and, as such, should stick to the highest scientific standards. Their statement amounts to propaganda and should not be tolerated by taxpayers. - Thomas
Unknown, January 21, 2015 at 11:38 AM
Nick,

Based on the available surface temperature data, 2014 is more likely the warmest year than any other that was close to the same temperature. My guess is that a much more "statistically significant" claim could be made if the influence of ENSO were accounted for (e.g. subtract ~0.09 times the 3-4 month lagged Nino 3.4 index from the monthly global average starting in ~1990). The reduced noise will narrow the uncertainty range, and make it easier to declare 'the real winner'.
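A minimal sketch of the adjustment suggested in this comment, assuming aligned monthly lists of global anomalies and Nino 3.4 index values. The coefficient 0.09 comes from the comment, the 3-month lag is one choice from the suggested 3-4, the caller would slice the inputs to start around 1990, and the function name and numbers in the usage lines are hypothetical.

```python
def enso_adjusted(gmst: list[float], nino34: list[float],
                  coeff: float = 0.09, lag: int = 3) -> list[float]:
    """Subtract coeff * (Nino 3.4 index `lag` months earlier) from each monthly
    global anomaly. The first `lag` months have no lagged index value, so the
    returned list is `lag` entries shorter than the input."""
    if len(gmst) != len(nino34):
        raise ValueError("series must be the same length")
    return [gmst[m] - coeff * nino34[m - lag] for m in range(lag, len(gmst))]


# Made-up monthly values, purely to show the bookkeeping.
gmst = [0.60, 0.62, 0.58, 0.65, 0.70, 0.68]
nino34 = [1.0, 0.8, 0.5, 0.2, -0.1, -0.3]
print([round(x, 3) for x in enso_adjusted(gmst, nino34)])
```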
Carrick, January 22, 2015 at 6:38 AM
There's a lot that could be said here, but I wanted to comment first on what Briggs said here:

Here’s where it becomes screwy. If that is the working definition of trend, then 0.116 (assuming no miscalculation) is the value. There is no need for that “+/- 0.137” business. Either the trend was 0.116 or it wasn’t. What could the plus or minus bounds mean? They have no physical meaning, just as the blue line has none. The data happened as we saw, so there can not be any uncertainty in what happened to the data. The error bounds are persiflage in this context.

The problem here is the actual value of the trend you get depends on the length of the window, the optimization function, how you treat episodic noise and so forth. In other words, 0.116 is rarely robustly 0.116.

Those of us who do metrology for a living know that the central value is simply not a robust feature of the data unless the uncertainty (assumed to be computed correctly) is also correspondingly small. Thus one uses that uncertainty figure to give an idea of how robust the quoted central value is.

But there are many other uses for uncertainty. The suggestion by Nick that we only want a ± value so we can appropriately adorn our central value (with something like a pretty ring I guess :-P) is I’m afraid risible.

A great deal of effort gets spent on analyzing the uncertainty and studying the properties of the noise in a given measurement. This is as true at NOAA and GISS as it is in any other field. Here I think Nick is being a little unfair to criticize them for not including all of the details of that measurement study in a short blurb like the report he linked.

In a complete flip of the argument that Briggs gives, it’s actually the case for some applications that when you forecast, you only forecast the future variability. In financial markets, this goes under the rubric of forecasting market volatility [1]. For stock investments, you would use this to determine when you have the lowest risk in making future investments.

Similar applications are being developed in the physical sciences, and go under the rubric “noise forecasting”: in other words, we use a model to tell us when it is good to measure.

In case I don’t get to come back and comment further on this thread: I do think Nick is partly right about the uncertainty calculation. In practice, it is rarely the case that stated uncertainty exceeds actual measurement error. And there are non-stationary systematic bias issues in the data here that are complex and difficult to handle properly.

And as I mentioned on another thread, how likely is it that the algorithms used now will remain unchanged in the future? If they change, it could change the “on record temperature”.

Probably Kerry Emanuel said it best [2]:

I think it is a mistake to focus on single years, whether they be cold or hot. Other than that, I have no particular opinion.

It’s a tough problem to fully crack, and there’s little fruit to enjoy inside once you’ve done so.

I used footnotes because for some reason, it was randomly refusing to accept perfectly valid attributed hyperlinks.

[1] http://faculty.chicagobooth.edu/bryan.kelly/research/pdf/volfor.pdf

[2] http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2015/01/16/scientists-react-to-warmest-year-2014-underscores-undeniable-fact-of-human-caused-climate-change/

Frank D, January 22, 2015 at 11:02 AM
Nick, at Sou's suggestion, I'm reposting a question I asked at HotWhopper. I'm wondering if you (or your commenters) can provide some clarity. The question really goes to your point about spatial sampling uncertainty.
- - - - - - - - - - - - - - - -
Are the error bars for any given year completely independent of all other years, or do the issues that lead to error margins point in the same direction all the time (but we don't know which)? Or is it a mix of both?

To give a (rather lengthy) example of what I'm talking about suppose the anomaly for Year X is +0.7 +/- 0.05 and the anomaly for Year Y is +0.61 +/- 0.05:
If the errors are independent year-on-year, Year X could be as low as +0.65 and Year Y could be as high as +0.66, so there is some small probability that Year Y was hotter than Year X.

If not independent, Year X could be as low as +0.65, but if so then all other results are likely to be at the bottom end of their error range (ie the factors that created the error are more or less constant across the whole data set), so Year Y is very unlikely (in this case) to be hotter than +0.56. Alternatively, if we consider that Year Y might be at the upper end of its error range (+0.66), then it is likely X is also at the top end of its range (+0.75), and Year X is definitively, 100% guaranteed to be the hotter year, even if we don't know *exactly* how hot it was.

The discussion I see on this sort of thing suggests it's the former (or always assumes it, at least), but in my work (business analysis, not very sciency I'm afraid) I see a lot of cases where the second example is a better representation of what is going on - while the ranges of uncertainty overlap, the same factors are at work throughout the dataset, and if they push one data point high or low, they are likely to be doing it for all.
- - - - - - - - - - - - -
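Frank D's two cases differ only in how correlated the two years' errors are. A short sketch, treating his ±0.05 as one standard deviation (an assumption) and using a flat prior with normal errors, shows how the chance that Year Y was really warmer falls as the correlation rises. The function name is hypothetical.

```python
import math


def prob_y_warmer(x: float, y: float, sigma: float, rho: float) -> float:
    """Probability that true Y exceeds true X, given estimates x and y that
    each carry a normal error of standard deviation sigma, with correlation
    rho between the two errors (0 = independent, 1 = fully shared)."""
    # Var(e_y - e_x) = sigma^2 + sigma^2 - 2 * rho * sigma^2
    diff_sd = sigma * math.sqrt(2.0 * (1.0 - rho))
    if diff_sd == 0.0:
        # Fully shared error: the ranking of the estimates is the ranking of the truth.
        return 1.0 if y > x else (0.5 if y == x else 0.0)
    z = (y - x) / diff_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Frank D's example: Year X at +0.70, Year Y at +0.61.
for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, f"{prob_y_warmer(0.70, 0.61, 0.05, rho):.5f}")
```

With independent errors the chance is about 10%; with fully shared errors it is zero, as he says.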
Anonymous, January 25, 2015 at 7:33 AM
Nick and others: Monthly GMST anomalies can change several tenths of a degC from one month to another. Does that change how we interpret data showing that the annual GMST for 2014 was several hundredths of a degC warmer than for two earlier years? I think it should and therefore (in my ignorance) think the information should be reported with confidence intervals. However, the answer depends on whether one views these monthly and daily variations as NOISE in the annual GMST or whether the annual GMST is a concept that contains no noise except measurement uncertainty. All measurement systems, of course, contain some uncertainty/noise, but the monthly variations in GMST are much bigger than measurement uncertainty.

Frank

Carrick, January 26, 2015 at 7:16 AM
One way to look at the effect of the spatial sampling error is by comparing the difference between Cowtan and Way corrected series and the original series. I looked at the difference between the average of the three infill methods (kriging, UAH and MERRA) from Cowtan and Way and the original published series for both HadCRUT4 and GHCN. The C&W corrected files were rebaselined to match the original series from 1980-1989 inclusive.

Here is the result

If you compare the 2005 to 2010 annual temperature, there does seem to be a big correlation (so what Nick is suggesting may be true for those years). This is not surprising because those were both strong El Niño years, and there's a well known coupling between El Niño and Arctic weather.

2014 on the other hand was not a strong El Niño year, and correspondingly there doesn't seem to have been a big effect from the C&W corrections on annual temperature.

I'd still say that trends are what matter. Sure, the steady increase in record temperatures with time is evidence there is a trend, but I'm not sure it says very much beyond that. And trend is, IMO, a much simpler problem to do error analysis for than maximum temperature records.

(When you are looking at a maximum, you are definitionally on the tail, and all sorts of stuff starts to matter. It's just a messy way to do physics, especially when there are much easier, and more germane, metrics to examine.)
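The procedure Carrick describes (average the three infill versions, shift the result so its 1980-1989 mean matches the original series, then take the difference) can be sketched as follows. The series are made up, since the Cowtan and Way files are not reproduced here, and the function names are hypothetical.

```python
import statistics


def average_series(*series: dict[int, float]) -> dict[int, float]:
    """Year-by-year mean of several annual series (years common to all)."""
    common = set.intersection(*(set(s) for s in series))
    return {y: statistics.fmean(s[y] for s in series) for y in sorted(common)}


def rebaseline(series: dict[int, float], reference: dict[int, float],
               start: int = 1980, end: int = 1989) -> dict[int, float]:
    """Shift `series` so its mean over start..end (inclusive) equals the
    mean of `reference` over the same years."""
    years = range(start, end + 1)
    offset = (statistics.fmean(reference[y] for y in years)
              - statistics.fmean(series[y] for y in years))
    return {y: v + offset for y, v in series.items()}


def correction_effect(corrected: dict[int, float],
                      original: dict[int, float]) -> dict[int, float]:
    """Corrected-minus-original difference after rebaselining."""
    aligned = rebaseline(corrected, original)
    return {y: aligned[y] - original[y] for y in sorted(original) if y in aligned}


# Made-up annual anomalies, only to exercise the functions.
original = {y: 0.01 * (y - 1980) for y in range(1980, 2015)}
infills = [
    {y: v + shift + (0.03 if y in (2005, 2010) else 0.0) for y, v in original.items()}
    for shift in (0.08, 0.10, 0.12)
]
effect = correction_effect(average_series(*infills), original)
print({y: round(d, 3) for y, d in effect.items() if y >= 2004})
```

Rebaselining removes the constant offset between versions, so only year-specific differences (here the invented 0.03 bumps) survive in the comparison.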
JCH, January 28, 2015 at 8:09 AM
That's interesting. We'll see where it is when 2015 fries eggs!

http://1.bp.blogspot.com/-ehofFfmWF28/UtqBwQVuqJI/AAAAAAAAE0k/sFGoSUYa2vQ/s1600/PDOGistemp.png

Salad days are either here or right around the corner.
JCH, February 1, 2015 at 8:54 AM
January is in the bag. Will it top .68C on GISS?
JCH, February 4, 2015 at 2:36 AM
UAH for January out at +0.35. The heatwave of 2014 might continue for months.