## Friday, January 27, 2017

### Global anomaly spatial sampling error - and why use anomalies?

In this post I want to bring together two things that I seem to be talking a lot about, especially in the wake of our run of record high temperatures. They are
• What is the main component of the error that is quoted on the global anomaly average for some period (month, year)? and
• Why is it important to use anomalies in calculating that average?
I'll use the USHCN V2.5 dataset as a worked example, since I'm planning to write a bit more about some recent misuse of that. In particular I'll use the adjusted USHCN for 2011.

#### Using anomalies

I have been finding it necessary to go over some essentials of using anomalies. The basic arithmetic is
• Compute some "normal" (usually a 30-year period time average for each month) for each station in the network,
• Form local anomalies by subtracting the relevant normal from each reading
• Average the anomalies (usually area-weighted)
People tend to think that you get the anomaly average just by averaging the temperatures, then subtracting an offset. That is quite wrong; the anomalies must be formed before averaging. Afterward you can shift to a different anomaly base by offsetting the mean.
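The three steps can be sketched in code. This is a minimal illustration with invented station data; the station count, base period and noise levels are assumptions for the sketch, not real USHCN values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: annual means for 10 stations over 40 years.
# Each station has its own climatology (lat/altitude effect) plus noise.
n_stations, n_years = 10, 40
station_normal = rng.uniform(5.0, 25.0, n_stations)          # local "normal", deg C
temps = station_normal[:, None] + rng.normal(0, 1.0, (n_stations, n_years))

# Step 1: compute each station's normal over a 30-year base period
# (the last 30 columns here, standing in for 1981-2010).
normals = temps[:, -30:].mean(axis=1)

# Step 2: form local anomalies by subtracting each station's own normal.
anomalies = temps - normals[:, None]

# Step 3: average the anomalies across stations (unweighted here).
anomaly_mean = anomalies.mean(axis=0)

# With a FIXED station set, averaging temperatures and then subtracting an
# offset gives the same answer; the two diverge as soon as the set of
# contributing stations changes, which is the point of the post.
raw_mean = temps.mean(axis=0) - normals.mean()
assert np.allclose(anomaly_mean, raw_mean)
```

The assertion at the end only holds because no stations drop in or out; the rest of the post is about what happens when they do.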

#### Coverage error - spatial sampling error for the mean

Indices like GISS and HADCRUT usually quote a monthly or annual mean with an uncertainty of up to 0.1°C. In recent years contrarians have seized on this to say that maybe it isn't a record at all - a "statistical tie" is a pet phrase, for those whose head hurts thinking about statistics. But what very few people understand is what that uncertainty means. I'll quote here from something I wrote at WUWT:

The way to think about stated uncertainties is that they represent the range of results that could have been obtained if things had been done differently. And so the question is, which "things". This concept is made explicit in the HADCRUT ensemble approach, where they do 100 repeated runs, looking at each stage in which an estimated number is used, and choosing other estimates from a distribution. Then the actual spread of results gives the uncertainty. Brohan et al 2006 lists some of the things that are varied.

The underlying concept is sampling error. Suppose you conduct a poll, asking 1000 people if they will vote for A or B. You find 52% for A. The uncertainty comes from, what if you had asked different people? For temperature, I'll list three sources of error important in various ways:

1. Measurement error. This is what many people think the uncertainties refer to, but it usually isn't. Measurement errors become insignificant because of the huge number of data that are averaged. Measurement error estimates what could happen if you had used different observers or instruments to make the same observation, same time, same place.

2. Location uncertainty. This is dominant for global annual and monthly averages. You measured in sampled locations - what if the sample changed? What if you had measured in different places around the earth? Same time, different places.

3. Trend uncertainty, what we are talking about above. You get trend from a statistical model, in which the residuals are assumed to come from a random distribution, representing unpredictable aspects (weather). The trend uncertainty is calculated on the basis of, what if you sampled differently from that distribution? Had different weather? This is important for deciding if your trend is something that might happen again in the future. If it is a rare event, maybe. But it is not a test of whether it really happened. We know how the weather turned out.

So here I'm talking about location uncertainty: what if you had sampled in different places? And in this exercise I'll do just that. I'll choose subsets of 500 of the 1218 USHCN stations and see what answers we get. That is why USHCN is chosen - there is surplus information in its dense coverage.

#### Why use anomaly?

We'll see. What I want to show is that it dramatically reduces location sampling error. The reason is that the anomaly set is much more homogeneous, since the expected value everywhere is more or less zero. So there is less variation in switching stations in and out. So I'll measure the error with and without anomaly formation.

#### USHCN example

So I'll look at the data for the 1218 stations in 2011, with anomalies relative to the 1981-2010 average. In Monte Carlo style, I make 1000 choices of 500 random stations, and find the average for 2011, first by just averaging station temperatures, and then the anomalies. The results (in °C) are:

| Base 1981-2010, unweighted | Mean of means | s.d. of means |
|----------------------------|---------------|---------------|
| Temperatures               | 11.863        | 0.201         |
| Anomalies                  | 0.191         | 0.025         |
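A sketch of this Monte Carlo procedure on synthetic stations. The spreads of the invented normals and anomalies are assumptions, so the numbers reproduce only the qualitative effect, not the table's values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the 1218 USHCN stations: each has a climatological normal
# drawn over a wide range, plus a 2011 anomaly from a much tighter
# distribution (that spread ratio is the key assumption here).
n_stations = 1218
normals = rng.uniform(4.0, 24.0, n_stations)       # station climatology, deg C
anoms_2011 = rng.normal(0.2, 0.8, n_stations)      # 2011 anomaly, deg C
temps_2011 = normals + anoms_2011

# 1000 Monte Carlo draws of 500-station subsets, as in the post.
n_draws, subset = 1000, 500
t_means, a_means = np.empty(n_draws), np.empty(n_draws)
for i in range(n_draws):
    idx = rng.choice(n_stations, subset, replace=False)
    t_means[i] = temps_2011[idx].mean()   # average raw temperatures
    a_means[i] = anoms_2011[idx].mean()   # average anomalies

# The s.d. across draws estimates the spatial sampling error; the anomaly
# version is far smaller because the anomaly field is more homogeneous.
print(f"temps:     mean {t_means.mean():.3f}, s.d. {t_means.std():.3f}")
print(f"anomalies: mean {a_means.mean():.3f}, s.d. {a_means.std():.3f}")
```

Swapping stations in and out moves the raw-temperature mean a lot (different climatologies enter the average) but barely moves the anomaly mean.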

So the spatial error is reduced by a factor of 8, to an acceptable value. The error of temperature alone, at 0.201, was quite unacceptable. But anomalies perform even better with area-weighting, which should always be used. Here I calculate state averages and then area-weight the states (as USHCN used to do):

Update: I had implemented the area-weighting incorrectly when I posted about an hour ago. Now I think it is right, and the s.d.'s are further reduced, although the absolute temperatures now improve by slightly more than the anomalies do.

| Base 1981-2010, area-weighted | Mean of means | s.d. of means |
|-------------------------------|---------------|---------------|
| Temperatures                  | 12.102        | 0.137         |
| Anomalies                     | 0.101         | 0.016         |
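The state-averaging and area-weighting step might be sketched like this, with state counts, areas and anomalies all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch of state-based area weighting: average the stations
# within each state, then weight the state averages by state area.
n_states = 48
stations_per_state = rng.integers(5, 40, n_states)  # invented station counts
state_area = rng.uniform(0.1, 1.0, n_states)        # invented areas, arbitrary units

state_means = []
for k in range(n_states):
    # stand-in station anomalies for state k
    a = rng.normal(0.2, 0.8, stations_per_state[k])
    state_means.append(a.mean())
state_means = np.array(state_means)

# Area-weighted national average: sum(w_k * mean_k), with weights summing to 1.
weights = state_area / state_area.sum()
national = (weights * state_means).sum()
print(f"area-weighted anomaly: {national:.3f}")
```

The weighting stops densely-sampled states from dominating the national mean, which is why the s.d. of the subset means drops further.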

For both absolute T and anomalies, the mean has gone up, but the s.d. has reduced. In fact T improves by a slightly greater factor, but its s.d. is still rather too high. The anomaly s.d. is now very good.

Does the anomaly base matter? A little, which is why WMO recommends the latest 3-decade period. I'll repeat the last table with the 1951-80 base:

| Base 1951-80, area-weighted | Mean of means | s.d. of means |
|-----------------------------|---------------|---------------|
| Temperatures                | 12.103        | 0.138         |
| Anomalies                   | 0.620         | 0.021         |

The T average is little changed, as expected. The small change reflects the fact that sampling 1000 subsets makes the results almost independent of the particular random choice. But the anomaly mean is higher, reflecting warming. And the s.d. is a little higher, showing that subtracting a slightly worse estimate of the 2011 value (the older base) makes a less homogeneous set.

#### So what to make of spatial sampling error?

It is significant (with 500 station subsets) for anomaly, and the reason why large datasets are sought. In terms of record hot years, I think there is a case for omitting it. It is the error if between 2015 and 2016 the set of stations had been changed, and that happened only to a very small extent. I don't think the theoretical possibility of juggling the station set between years is an appropriate consideration for such a record.

#### Conclusion

Spatial sampling error, or coverage error, for anomalies is significant for ConUS. Reducing this error is why a lot of stations are used. It would be an order of magnitude greater without the use of anomalies, because of the much greater inhomogeneity, which is why one should never average raw temperatures spatially.

1. Nick, an interesting exercise. It definitely reaffirms the appropriateness of using temperature anomalies to construct regional or global surface temperature trends. I agree that spatial sampling is one of the larger error (uncertainty) sources for estimating these trends. However, as I'm sure you know, there are a lot of additional problems that increase the uncertainties, especially over longer time periods of 50 to 100 years or more. I touched on a few of them for land stations here, and I'm sure I missed some. I'm not sure that the typical uncertainty estimates offered with various global temperature trend estimates include all the important factors. I recognize that many sources of random error tend to cancel out over time with large numbers of samples. However, there are some types of error that can introduce false trends, such as station moves, and changes in the microscale environment at a station over time. And of course, we don't have many fixed stations in the oceans that cover ~70% of the globe. I'd love to see a global CRN based on the USCRN as a model, but also including strategically placed fixed ocean platforms. Some stations outside the US may already qualify, but we need a lot more for improved future assessments.

2. "So here I'm talking about location uncertainty. What if you had sampled in different places."

Your entire analysis rests on the locations we have measurements for. However as per Cowtan and Way, much of the warming is coming from the locations we have no measurements and so your analysis of uncertainty doesn't apply because we never had measurements to include or exclude.

So bearing that in mind, why doesn't the global temperature anomaly uncertainty take into account those areas where we make up data? Surely that must have uncertainty beyond what you describe above.

1. You can observe the variations in anomaly in the thousands of points that are measured, and you can look at that variability over various scales. It seems to follow a pattern. It is not impossible that the unmeasured points will follow a very different pattern, but it is not at all plausible.

Climate is not alone in having to rely on sampled data. In science we are constantly dealing with continua where we want to know something about bulk quantities. The only way to gain knowledge is by sampling. And the only way to check it is by more sampling.

3. "You can observe the variations in anomaly in the thousands of points that are measured, and you can look at that variability over various scales. It seems to follow a pattern."

I think you're missing the point. Its these unmeasured areas that DONT follow the pattern. They're the areas where there is supposed to be extra warming although we're not actually measuring it.

1. "Its these unmeasured areas that DONT follow the pattern. They're the areas where there is supposed to be extra warming although we're not actually measuring it."
No, you are missing the point. Virtually everything we know about the physical world is by induction, based on sampling. Our measurement capability is finite. The speed of light has been found to be constant, at many times and places. Maybe where it wasn't measured, it was different?

The warmth that GISS etc find in the Arctic is not invented. It is interpolated from places measured to be warm. The principle is no different to anywhere else. The data is sparser, and that is reflected in a higher uncertainty.

2. "The data is sparser, and that is reflected in a higher uncertainty."

This is the point. Where and how is it reflected by higher uncertainty?

3. Nick Stokes says, "The speed of light has been found to be constant, at many times and places. Maybe where it wasn't measured, it was different?"

That is a poor analogy. We know that the temperature varies geographically, as does the anomaly. It is fair to posit that the speed of light is different geographically, then test for it, and likely this has been done and empirically the hypothesis (c is not constant) has been rejected. If we know that T and dT (anomaly) we should treat the data differently, and our hypothesis should be that the T,dT does not vary geographically, and test for that. And I suspect that has been done; that is why, for instance, we see different T,dT geographically.

4. correction: I said, "...If we know that T and dT (anomaly) we should treat the data differently,..."

s/b "If we know that T and dT (anomaly) VARY GEOGRAPHICALLY, we should treat the data differently,

5. "It is fair to posit that the speed of light is different geographically, then test for it"
But you can't test everywhere. At some stage you have to make an inductive step. And that is true with any continuum measurement. At some stage of resolution, you find that interpolation works. You can test by seeing if stations could have been interpolated from other stations - this is pretty much what they do. In fact, of course, it doesn't work every time. But you are calculating an average. You need sufficient correlation. That is what they test for, going back to Hansen in about 1988.

6. The problem is that around the poles there is no testing and the poles are very different to the rest of the earth. There are sea ice boundaries, the polar vortex, periods of darkness for months on end, atmospheric temperature inversions... none of that gives any confidence that you can interpolate/krige from where you have tested.

7. The thing to understand is that there is no neutral choice with regards to how unmeasured Arctic (and other) regions contribute to the global average. Interpolation assumes that nearby measured regions provide the best guide, and this approach seems to be supported by satellite data and physical models regarding the consistency of temperature anomalies over distance. It's also supported by out-of-sample surface observations where available. The alternatives are to effectively assume the Arctic is either not warming at all, or warming at the global average rate, approaches which are supported by nothing other than an irrational feeling that they're more neutral. Which seems better to you?

Also note, if you're talking about the area around the poles meaning something like 85N-90N, that's a tiny fraction of the Earth's surface. Even with a very strong assumed warming the contribution to global annual average anomaly is negligible - around 1%, and therefore the contribution to global average uncertainty is also negligible.

8. PaulS wonders "or warming at the global average rate, approaches which are supported by nothing other than an irrational feeling that they're more neutral."

Is it irrational to assume the unmeasured region would be warming at the same rate as the measured region? Interpolation isn't "nearby" otherwise there wouldn't be a problem. Interpolation is over vast distances.

"Even with a very strong assumed warming the contribution to global annual average anomaly is negligible - around 1%"

If that were the case then any additional warming in the region would have negligible impact on the global average but that's not the case according to Cowtan and Way for example...

9. Is it irrational to assume the unmeasured region would be warming at the same rate as the measured region?

The point about irrationality concerns the assumption that leaving a cell blank is a neutral choice.

What you're describing by using measured areas to inform unmeasured areas is essentially interpolation, which is a reasonable approach. Given that we're deciding to interpolate, what makes more sense? 1) Interpolating in the Arctic using the global average, which is dominated by the Tropics and Subtropics, or 2) Interpolating in the Arctic using nearest measurements in the Arctic?

If that were the case then any additional warming in the region would have negligible impact on the global average but that's not the case according to Cowtan and Way for example...

I was very clear that I was talking about the impact of interpolating just the 85N-90N region "around the Poles". HadCRUT4 coverage misses most of the area North of 70N, so the impact of infilling that is considerably larger: more like a 5% influence. Note that these percentages refer to influence in terms of long-term warming amount (e.g. over the past century). What Cowtan and Way highlighted was that the Arctic seemed to be warming very strongly (also backed up by strong sea ice decline) over a period when most of the rest of the planet exhibited near-zero trends (1997-2013). Hence over that short period the Arctic was unusually important for determining the true global average trend.

10. "I was very clear that I was talking about the impact of interpolating just the 85N-90N region "around the Poles"."

But you were responding to "...sea ice boundaries, the polar vortex, periods of darkness for months on end, atmospheric temperature inversions..." none of which are restricted to the area you chose to talk about. Your argument was therefore a strawman argument.

In response to your first point, making up data is very hard. Also extraordinary results require extraordinary evidence. When you make up data and it has a significant impact on the overall measurement then you're in dangerous territory so what you say is "better", I say is much more uncertain and needs to be recognised as such not just acknowledged and then swept under the carpet as we "move on" with our expectations met.

11. That section begins 'Also note, if you're talking about... 85N-90N'. The parameters of how I was interpreting your argument were made completely clear - it's up to you to clarify, which you didn't.

In response to your first point, making up data is very hard.

You effectively make up data any way you do it, including leaving the grid cells empty. What matters is the best approach to make up data in this context. The evidence in support of interpolating over the Arctic being the best approach is strong. Leaving cells blank is, at first order, equally uncertain.

12. "I say is much more uncertain"
It is not more uncertain. As PaulS says, you should give your best estimate. Not doing that is not neutral; it means you settle for a worse estimate. And that means more uncertainty.

We are trying to estimate the whole globe temperature, and can never have more than a finite number of measurement points. Everywhere else must be inferred, and that creates uncertainty. Where data is sparser, the uncertainty is necessarily greater, and that feeds into the overall uncertainty (but doesn't swamp it). That doesn't mean that anything less than the best estimate should be chosen.

13. "It is not more uncertain."

It is more uncertain because you can't test it for the reasons I stated earlier. Interpolating over vast distances is likely to produce a result like Steig's Antarctic warming result, which O'Donnell showed was very probably wrong.

So does doing that give the "best" estimate? Only if you need it to confirm the warming you're expecting.

14. If there was a problem with Steig et al, it wasn't from stretched interpolation. What they did was to supplement the station data with AVHRR data, which is of uncertain accuracy, but has excellent resolution. O'Donnell did the same. I don't think the outcome was obviously better. My take on that is here.

You actually can test in various ways; Cowtan and Way used Arctic buoy readings. But the thing is that while interpolation over long distances is necessarily uncertain, it is better than anything else.

It is more uncertain because you can't test it for the reasons I stated earlier.

But it has been tested, as has been pointed out already. Out of sample data proving the efficacy of interpolation.

Interpolating over vast distances is likely to produce a result like Steig's Antarctic warming result which O'Donnell showed was very probably wrong.

A strange argument since O'Donnell's paper also involved interpolation over vast distances. They used basically the same approach as Steig. Their continent-wide trends agreed within uncertainty.

Only if you need it to confirm the warming you're expecting.

Doesn't make sense. If there were no measured warming there would be nothing to interpolate.

16. "But it has been tested, as has been pointed out already. Out of sample data proving the efficacy of interpolation."

We're going to have to agree to disagree on this. There is nothing in or out of sample to test when you have no readings in the region. If you disagree with the suggestion that temperature fields around the poles may behave differently than those elsewhere on the earth, then fine. But yours is not how science works.

"A strange argument since O'Donnell's paper also involved interpolation over vast distances."

And found the warming was likely more restricted to the areas where the warming was being measured. Which is currently used in the global temperature reconstructions, as is the warming that is measured at the Arctic. In the case of C&W they didn't so much "smear" the warming like Steig did, they just expanded it and thus added to the warming trend for the region.

So Steig and O'Donnell's approaches didn't change things so their uncertainties would be about the same but C&W's approach does so its uncertainty would be greater.

17. There is nothing in or out of sample to test when you have no readings in the region.

Read the paper. They tested against Arctic buoys, which are out-of-sample.

If you disagree with the suggestion that temperature fields around the poles may behave differently than those elsewhere on the earth

Nowhere have I disagreed with that. Indeed, suggesting that the nearest measurements to that region would provide the best possible guide is clear acknowledgement of that difference. On the other hand, your solution of allowing the Arctic to be represented by the global average - dominated by the lower latitudes - certainly does not agree with the idea of the Poles behaving differently. Make your case for why the Poles should be represented by the global average.

But yours is not how science works.

It's exactly how science works.

So Steig and O'Donnell's approaches didn't change things so their uncertainties would be about the same but C&W's approach does so its uncertainty would be greater.

What creates the uncertainty is the lack of measurements in the region. What would be your a priori uncertainty range for average Upper Arctic anomalies. -10K - 10K? -20K - 20K? The purpose of C&W, along with other methods, is both to provide a better constraint on uncertainty and a better best estimate. At the very least C&W cannot have increased uncertainty (If you believe it did, that simply means you underestimated the true uncertainty). As Nick says, the evidence suggests they have actually reduced uncertainty.

18. I'll try to simplify things. There are three options available, we have to choose one:

1) Infill missing Arctic cells with a zero anomaly (i.e no warming).

2) Infill missing Arctic cells with the global average anomaly - mostly influenced by tropics and subtropics.

3) Infill missing Arctic cells using the geographically nearest available observations and measured statistical correlation patterns, supported by spatial satellite data and tested against out-of-sample Arctic buoy temperatures.

Which one, and why?
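To make the comparison concrete, here is a toy sketch of the three options. All numbers are invented; the assumed Arctic warming of 1.5 and the cell counts are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up anomaly field: 10 Arctic cells warming strongly, 90 others mildly.
n_cells = 100
arctic = rng.normal(1.5, 0.3, 10)      # assumed strong Arctic warming
rest = rng.normal(0.3, 0.3, 90)        # rest of the globe
true_global = np.concatenate([arctic, rest]).mean()

# Suppose only 2 of the 10 Arctic cells are actually measured.
measured_arctic = arctic[:2]
missing = 8

# Option 1: infill missing cells with a zero anomaly (no warming).
opt1 = (measured_arctic.sum() + rest.sum() + 0.0 * missing) / n_cells
# Option 2: infill with the global average of MEASURED cells.
measured_mean = np.concatenate([measured_arctic, rest]).mean()
opt2 = (measured_arctic.sum() + rest.sum() + measured_mean * missing) / n_cells
# Option 3: infill from the nearest (Arctic) measurements.
opt3 = (measured_arctic.sum() + rest.sum() + measured_arctic.mean() * missing) / n_cells

for name, v in [("zero", opt1), ("global avg", opt2), ("nearest", opt3)]:
    print(f"{name:10s}: error {abs(v - true_global):.3f}")
```

Under this (assumed) setup, infilling from the nearest Arctic measurements lands closest to the true average; options 1 and 2 both bias it low.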

19. You're way off track. You're trying to justify a method of estimating temperatures without understanding what doing that means.

You say "At the very least C&W cannot have increased uncertainty (If you believe it did, that simply means you underestimated the true uncertainty). As Nick says, the evidence suggests they have actually reduced uncertainty."

Ugh. Uncertainty is determined from the data you have, not from the data you make up.

Do you really think a few buoys in the region means we understand the temperature fields and their effects? It's pretty clear we don't.

http://iabp.apl.washington.edu/overview_history.html

"For example, during the summers of 2002 and 2003, colder than normal air temperatures were observed over the Alaskan coast (e.g. Serreze et al. 2003), and yet record minima in sea ice extent were observed. In order to explain this paradox, Rigor and Wallace (2004) hypothesized that these recent minima may be due to changes in the thickness of sea ice blown towards the Alaskan coast by the surface winds."

We might have given high confidence that a larger ice extent would result that year with the little knowledge we had.

20. Cowtan & Way's approach uses nearby data to infer temperature at an unmeasured location. Prior to this, HadCRUT4 assumed that the unmeasured location had temperature equal to the global average.

Plenty of statistical analyses along with the tests given in the C&W paper show that C&W reduces uncertainty in the global average temperature.

21. " Uncertainty is determined from the data you have, not from the data you make up."
No. Uncertainty comes from the inferences that you made in calculating the average. You might have a small amount of impeccable data, but you are more uncertain than if you had a larger amount.

There's nothing unusual about these averaging considerations. People handle it in their daily lives. Suppose you were managing a TV show, and you compare the monthly ratings, which are the average of the daily. Suppose you had a day missing. You don't despair and say we know nothing. Most people would just average the remainder. A more quantitative person might say, what is the lowest of our daily ratings. Assume that and average. Then assume the highest. That gives a range, and measures uncertainty of the average. And because it is only one day, it isn't very much uncertainty.

But you could do better. Suppose the day was a Saturday, a good day. So instead of infilling with an average range, put in the range for an average Saturday. That will be tighter; the lower bound is higher. You have a better average with less uncertainty, just by better infilling.
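A numerical version of that ratings example, with figures invented for illustration:

```python
# A month of daily ratings with one Saturday missing (numbers invented).
ratings = {  # day-of-week -> observed ratings this month
    "Mon": [2.1, 2.0, 2.2, 2.1], "Tue": [2.0, 2.1, 2.0, 2.2],
    "Wed": [2.2, 2.1, 2.3, 2.0], "Thu": [2.1, 2.2, 2.0, 2.1],
    "Fri": [2.5, 2.6, 2.4, 2.5], "Sat": [3.4, 3.5, 3.3],  # one Saturday lost
    "Sun": [3.0, 3.1, 2.9, 3.0],
}
observed = [r for day in ratings.values() for r in day]

# Crude bounds: infill the missing day with the lowest / highest of ALL days.
lo = (sum(observed) + min(observed)) / (len(observed) + 1)
hi = (sum(observed) + max(observed)) / (len(observed) + 1)

# Better bounds: infill with the lowest / highest observed SATURDAY.
lo_sat = (sum(observed) + min(ratings["Sat"])) / (len(observed) + 1)
hi_sat = (sum(observed) + max(ratings["Sat"])) / (len(observed) + 1)

print(f"all-days infill range: [{lo:.3f}, {hi:.3f}]")
print(f"Saturday infill range: [{lo_sat:.3f}, {hi_sat:.3f}]")
```

The Saturday-based range sits inside the all-days range: a better infill gives a better average with less uncertainty, which is the point of the paragraph above.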

If you really want to see how infilling and sampling works on a larger scale, think about where those ratings come from.

22. "No. Uncertainty comes from the inferences that you made in calculating the average. You might have a small amount of impeccable data, but you are more uncertain than if you had a larger amount."

Spoken like a true mathematician. Uncertainty doesn't only arise from the sparsity of data! One cannot use statistics to describe a temperature field if one doesn't know how that temperature field looks. Except within very broad parameters.

23. We are removing the uncertainty by deriving models of ENSO which can be used to compensate for the "noise" of temperature variation. Force with the earth wobble periods and the lunisolar cycles and there you go:

http://imageshack.com/a/img923/2548/uUibRE.png

4. Thank you for your essay, Nick.
It raises more questions. Here is but one. You write -
"•Compute some "normal" (usually a 30-year period time average for each month) for each station in the network,
•Form local anomalies by subtracting the relevant normal from each reading
•Average the anomalies (usually area-weighted)."
But does not (or can not, and usually does) step 3 produce a new 'normal'? What error is associated with its production? What error is involved in the processes of calculating and subtracting your first 'normal'? Entropy type thoughts say no pain, no gain. You can get a different normal each time you use daily, monthly, seasonal, annual data for your 30 year period; then in GISS style, with frequent later adjustments to data, you really have to calculate a new anomaly with each new look at a data set, also if it has been adjusted from the starting set. This is easily missed and it can lead to more errors.
I'm not used to assignment of separate error sources except for their final combination into an overall figure. If it is done, it seems best to calculate the largest plausible overall error, then try to identify the largest isolated contributor, then see if that can be reduced. In a well managed data set, the main errors are in the unworked originals, and in a badly managed one, the main errors might arise from adjustment.
There is still no argument to convince me that the present error bounds on land station T data are at all realistic. I write this partly because I feel that the final error bounds should cover at least both the 'raw' and adjusted sets if both exist, since both at times can be used as valid for particular applications. Or conversely, time might not be wasted on some new applications if the suspected wide spread of error bounds is real.
Geoff.

1. Geoff,
"But does not (or can not, and usually does)step 3 produce a new 'normal'?"
It is intended to make the normal zero. And within the anomaly time frame, it pretty well does. But yes, outside that frame regions can drift in a way that the expected values are not zero. The deviation is far less than the original temperatures with all their seasonal, latitude, altitude variability. But there is some. There are two noted cases where that has caused trouble:
1. HADCRUT and the Arctic, as addressed by Cowtan and Way. The Arctic has warmed to an extent that makes its anomaly systematically different, and HADCRUT was undersampling it. If the anomalies were truly relative to expected value, this wouldn't matter, but they aren't. C&W restored the proper sampling.
2. Marcott set anomalies to a period about 5000 BP. But over the next 5000 years there was drift, so again near 0 BP proxies no longer had expected value zero. So it mattered when they dropped out of the sample. This led to the spike.

TempLS deals with this by not nominating a fixed period, but correcting for global drift. It may still be affected by regional variation.

As far as entropy is concerned, that comes back to the criterion here, which is the reduction of location uncertainty (what if you tried other places?). Any anomaly base reduces that drastically, because it removed lat/alt/season. But as I showed above, choosing a close base does a little better. In a sense, it is entropy maximisation. You want to remove all predictability to get iid residuals.

"I'm not used to assignment of separate error sources"
It is a big feature of Brohan (2006) UKMO style analysis, which they carry on to realise with ensembles. You have a rational basis for estimating the parts; starting with the whole is a guess.

"There is still no argument to convince me that the present error bounds on land station T data are at all realistic."
It's always worth thinking about what they really mean. One thing rarely considered wrt a time series is that they are not independent. This comes into play with thinking about the anomaly average itself contributing error. It dos (small), but it is of the form of a constant added to all times. So it doesn't affect trends, shape, or anything you really care about. Does it matter for anything if Wagga was 0.2°C cooler than you thought (for all time)? No, that could happen if they had just put the station on a hill. But error in normal has that effect and is counted in the overall error.