Wednesday, May 14, 2014

USHCN, adjustments, averages, getting it right.

There has been quite a kerfuffle about USHCN adjustments. There was a WUWT post on reasons for the spike, a recalc at Steven Goddard's site, and more here and here. But the elementary issue is little understood. You need to be very careful doing arithmetic with averages of data from disparate situations. I'll show why.

I introduce three annual averages. F1 and R1 are averages of Final and Raw USHCN data over just the set where both numbers are known. Then F2 is the average of Final, in each year, over data where the raw is not known.

My earlier calculation was F1-R1, or equivalently, the average difference between final and raw when both are known. This is clearly a measure of adjustment. Steven Goddard's variants all include F2 in some way. I'll show that this is never helpful, and leads to silly results.

First, a simple rule for combining averages. If you have a set S of N numbers, made up of subsets S1 (N1) and S2 (N2), and the respective averages are A, A1 and A2, then
A=p*A1+q*A2, where p=N1/N, q=N2/N and p+q=1
(Proof: N*A = N1*A1 + N2*A2)
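
The rule is easy to check numerically. Here is a minimal sketch on made-up numbers (not USHCN data); the lists s1 and s2 are purely illustrative.

```python
# Check the rule A = p*A1 + q*A2 on made-up numbers (not USHCN data).

def mean(values):
    return sum(values) / len(values)

s1 = [10.0, 12.0, 14.0]   # subset S1, N1 = 3, average A1 = 12
s2 = [20.0]               # subset S2, N2 = 1, average A2 = 20
s = s1 + s2               # full set S, N = 4

p = len(s1) / len(s)      # N1/N
q = len(s2) / len(s)      # N2/N

overall = mean(s)                        # A, computed directly
combined = p * mean(s1) + q * mean(s2)   # A, from the subset averages

print(overall, combined)
```

The point to note is that the weights p and q depend only on how many values fall in each subset, so a change in the counts moves A even when A1 and A2 stay fixed.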

In averaging USHCN data with adjustments, we have in each year a set S1 of N1 station/months that have both raw and final data, and a set S2 of N2 that have final alone. Call F1 and R1 the averages of final and raw on S1, and F2 the average of final on S2.

Caveat: I have started from 1900, from when it is almost true that all stations have final data. There are just a few (max 3) missing in some years of the first decade. I believe that makes no difference to the analysis.

Steven Goddard (SG) subtracted the average of all raw from the average of all final:
A_S = p*F1 + q*F2 - R1 = (1-q)*F1 + q*F2 - R1 = (F1 - R1) + q*(F2-F1)
I took the averages of differences where they exist, which is exactly:
A_N = F1 - R1
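
To make the algebra concrete, here is a small sketch with invented station/month values for a single year. The numbers are assumptions chosen for illustration only; the S2 records are deliberately warmer, as they would be if they came from a warmer month.

```python
# Toy station/month records for one year: (final, raw).
# raw is None where no raw value exists. Values are invented, not USHCN data.
records = [
    (5.2, 5.0), (6.1, 6.0), (4.9, 5.0),   # S1: final and raw both known
    (15.0, None), (16.0, None),           # S2: final only
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

f1 = mean(f for f, r in records if r is not None)   # F1: final on S1
r1 = mean(r for f, r in records if r is not None)   # R1: raw on S1
f2 = mean(f for f, r in records if r is None)       # F2: final on S2
q = sum(1 for f, r in records if r is None) / len(records)

a_n = f1 - r1                                  # average of known differences
a_s = mean(f for f, r in records) - r1         # all final minus all raw

print(a_n, a_s, a_n + q * (f2 - f1))
```

In this toy case the actual adjustment on S1 is under 0.1, but A_S comes out above 4, and the whole excess is q*(F2-F1).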

So A_N is indeed an average of known adjustment differences. SG has combined it with something else. What does that achieve?

SG insists that it adds, or retains, information about the interpolation (F2). But we don't have a difference between interpolated and raw. Instead, he has combined it with F2-F1, which is the difference between two sets of final values.

This really makes no sense. But it led to the "spike". The reason is that the interpolated points are more frequent in the latest month, April, which is the warmest month of 2014 so far. R1, by contrast, is averaged over the complementary set, which is weighted toward the cooler months. The small adjustment makes little difference here compared to the seasonal effect. In fact, for recent years, raw and adjusted are virtually identical.

I'll illustrate. I'll start with a plot of p, the fraction of stations reporting. For a long time it was almost one. In the early part there is more loss, and since 1990 the number of active stations has reduced. There is a sharp dip in 2014, because some data comes in late; this mostly affects the most recent month. (Actually, the dip is exaggerated in the plot because it counts May-Dec 2014 as missing)

Now here are the three data average sequences, R1, F1, F2; everything else is a linear combination of them:

The downspike in 2014 is due to the data being mostly winter so far; a caution of troubles to come with absolute temperatures. The next plot is smoothed (7 year MA) to compare with the SG plot here, where "Raw" is R1 and "Final" is the average of all final (p*F1 + q*F2), which I've called SG. The "new method" there is averaging by months first, which makes a big difference for 2014 but otherwise not. More later on that. Here I've removed 2014.

SG claims that the adjustment turns the downtrend raw R1 (blue) into the uptrend SG (purple). I think the right comparison is with F1 (red). It actually doesn't make a big difference in the midrange, where there is little missing data. But at the ends it does, and SG exaggerates the uptrend produced by adjustment.

Those are absolute plots. Next I'll subtract R1 from F1 and SG, and also show the difference curve. We agree that R1 is the right measure of "raw".

The green SG-R1 was the original spike plot shown at WUWT. The blue F1-R1 was my corrected version. Again you can see that they track mid-range, but diverge near the ends, giving SG-R1 a much greater range. The red curve is the extra bit that he has included. You can see that the difference generates the spike. The reason is that in F2-F1, April has a lot more missing raw values than other months. So F1 is a much more wintry set of data than F2.

I've contended that introducing F2-F1, via SG, makes no sense. Steven Goddard obstinately disagrees, claiming that he's preserving something about the final adjustments. The next plot puts this in perspective. I compare that inclusion with what you get if you do the same calc with data that is just the climatology, i.e. the average raw temperature for each month, held constant over all years. I've called that Z, and like F it breaks into Z1 and Z2 on S1 and S2. This has no information about the annual weather, or the adjustments.

It shows that almost all the introduced term F2-F1 is accounted for by the variation in the climatology, i.e. different kinds of stations reporting for each month. Between months matters more than between stations, because seasonality is a systematic shift for all stations. The green curve is what you get after multiplying by q, and would be the same for q*(F2-F1). But that is exactly the difference between SG's curve and mine. And it shows clearly the 2014 spike, which is entirely predictable from knowing average (not 2014) temperatures only.
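
The climatology-only effect can be sketched with a toy year that has only January to April, as 2014 did at the time. The monthly climatology values and station counts below are invented for illustration; there is no weather and no adjustment in this data at all.

```python
# Climatology-only sketch: every station/month takes its month's long-term mean.
# All numbers are hypothetical, chosen only to mimic "April has many missing raw".
climatology = {"Jan": 0.0, "Feb": 2.0, "Mar": 6.0, "Apr": 12.0}   # deg C
with_raw    = {"Jan": 95,  "Feb": 95,  "Mar": 95,  "Apr": 60}     # counts in S1
without_raw = {"Jan": 5,   "Feb": 5,   "Mar": 5,   "Apr": 40}     # counts in S2

def weighted_mean(counts):
    total = sum(counts.values())
    return sum(climatology[m] * n for m, n in counts.items()) / total

n1 = sum(with_raw.values())
n2 = sum(without_raw.values())
q = n2 / (n1 + n2)

z1 = weighted_mean(with_raw)      # Z1: climatology averaged over S1
z2 = weighted_mean(without_raw)   # Z2: climatology averaged over S2
z_all = (n1 * z1 + n2 * z2) / (n1 + n2)

spike = q * (z2 - z1)             # the term added by averaging all of Z
print(z1, z2, spike, z_all - z1)
```

With these made-up counts the added term is about 0.7 degrees, produced purely by which months have missing raw values.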

Averaging months first

As mentioned above, SG has an updated method. He gets the annual average by averaging the data for each month first, and then by year. You might think that shouldn't matter, and the fact that it does is a bad sign. It makes little difference to F1-R1, the correct measure. But it does matter to the added F2-F1 term.

We might as well think about this as generating a monthly plot, which can then be binned to annual. It removes the spike because F2-F1 is now evaluated on the same month. Each month is weighted equally.  But there is still the difference due to the stations reporting having on average warmer or colder climates than those that don't. Maybe by luck they will be the same. But there's no reason to rely on it. There's nothing to gain, and accuracy to lose. Here is the comparison of the difference and the climatology only version, worked out that way:

Again the green curve is what is added, and has smaller scatter, so when added to F1-R1 it makes less difference (still some spike, though). But again, it's almost exactly what you get calculated by climatology. It adds no information about temperatures, nor adjustments. It's just error.
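
Here is a minimal sketch of why averaging months first shrinks the added term but does not remove it. The data is invented: two months, a cool and a warm station in each, values equal to climatology (no weather, no adjustment), and one missing raw value at the warm station in April.

```python
# Each record: (month, value, has_raw). Values are hypothetical climatology only.
records = [
    ("Jan", -2.0, True),  ("Jan", 2.0, True),
    ("Apr", 10.0, True),  ("Apr", 14.0, False),   # warm station lacks raw in April
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def added_term(recs):
    # Average over all records minus average over records with raw known.
    # This equals q*(Z2 - Z1) for that set of records.
    return mean(v for _, v, _ in recs) - mean(v for _, v, has_raw in recs if has_raw)

# Pooled: treat the whole year as one set.
pooled = added_term(records)

# Months first: evaluate within each month, then weight months equally.
months = sorted({m for m, _, _ in records})
month_first = mean(added_term([r for r in records if r[0] == m]) for m in months)

print(pooled, month_first)
```

In this toy case the pooled term is about 2.7 and the months-first term is 1.0. The seasonal part has gone, but the part due to warmer stations being the ones without raw data remains.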

Where next?

I've spent a lot of time on this case because I think it helps get thinking on averages straightened out. It helped me. I also think it's worth getting the calculation of the effect of adjustments right. There are effects, and they do have in total an uptrend effect. That's the way it is. But there is no need to bewail the way it isn't.

I'll write more on averaging and temperature indices. It was the basic issue that Cowtan and Way found with HadCRUT4. The issues are non-trivial.


  1. I linked to it over in the WUWT post, but Lucia's Spherical Cow model is worth revisiting as a simple explanation of why it's generally better to work with anomalies rather than absolutes unless the composition of the network has remained constant:

    Ironically, much of the discussion recently has been a replay of the debate we were having back when the whole "march of the thermometers" meme was being debunked. If you use absolute temperatures and your station network composition is changing over time, you will tend to get climatology-related artifacts that are comparable in magnitude to any trend effects you are looking for.

    1. I've been curious about why USHCN produces an absolute temperature. I presume there is history there. I believe they treat pre-1931 differently because, I gather, when they took over from state dominant arrangements in 1931, some things were hard to change. But still, 1931?

      It does seem to cause a lot of trouble, and I wonder how many people really want to know that it was 54°F in the USA.

    2. I can think of reasons, but my guess is there are paying customers who ask for it (e.g., other agencies in the federal government). Agricultural science uses absolute temperature for one. Charts that show growing zones would be very difficult to generate with an anomaly map. ;-)

    3. Carrick,
      Yes, but that's climatology. We had maps like that in our school atlas. But who needs to know the US average temperature?

      I think you're right that there may be pressure from parts of fed etc, which probably predates paying days. But I'm still puzzled.

  2. That's a great explanation of the spike. Thanks!

  3. Nick,
    From time to time reference is made to trend or lack of trend in numbers of record high (or low) temperatures. This reference is a part of the Line of Evidence by which a group of meteorologists sought to counter US government evidence used in a Supreme Court Case involving our EPA.

    I would think it possible to have increasing numbers of record highs in a climate with descending mean temperatures and if that is possible, then little significance should attach to increased record numbers. Am I missing something here?

    1. FWIW that would require substantial regional changes in climate (climatology), which might be possible in a continental nation such as the US, but is not very likely.

    2. JF,
      The focus on records seems to be a particularly US thing. I think it's an understandable measure in some ways, being not affected by TOBS etc. But it is dependent on the reliability of old measuring; errors tend to become records. And it is dependent on the history of stations. What if the hottest place in your state is discontinued (or a new station there)? Also I think they accept records from flakier sources.

      But if all that turns out OK, then I think there should be increasing new records. I did a study of hot times in my home town here.

    3. Eli, could you expand a bit on your point? I don't understand what you are driving at.

      I may have been confusing tortoise and hare (sorry) with climate characteristics and have no dog in the race, but the thing I was trying to get at, is whether there is intellectual authority behind the notion that increasing numbers of high temp records indicates general warming. I concede that it is likely, but are we certain?

      Nick, I like the OBS observation where the record is trapped by the sliders in the thermometers and it matters not which day it happened on. I do think that we in the US are now highly "extreme" driven. I wouldn't be at all surprised if a word frequency count of all of the evening news programs produced a much higher number now than ten years ago - and maybe the occasional "record" too.

    4. "Extreme" would be the word counted in convoluted comment above.

  4. Nick, do you have your colors mixed up in the figure "smoothed averages"? If the deviation between F1 and F2 is so large something is funny in the adjustments.

    1. Eli,
      F2 is the average of the set of final values where there is no corresponding raw. From about 1920 to 1990 that was a very small set, so the average wanders. It is multiplied by a small q, so doesn't matter very much.

      I think at the ends it reflects the biases I am trying to illustrate. Spectacularly in 2014, because of April having many missing and being warmer than the rest of 2014 so far. But in other years, I think it comes from other biases. It probably mostly means missing values are more common in winter. There may also be a bias in where stations dropped out after 1990.