There has been quite a kerfuffle about USHCN adjustments. There was a WUWT post on reasons for the spike, a recalc at Steven Goddard's site, and more here and here. But the basic issue is little understood. You need to be very careful doing arithmetic with averages of data from disparate situations. I'll show why.
I introduce three annual averages. F1 and R1 are averages of Final and Raw USHCN data over just the set where both numbers are known. Then F2 is the average of Final, in each year, over data where the raw is not known.
My earlier calculation was F1-R1, or equivalently, the average difference between final and raw when both are known. This is clearly a measure of adjustment. Steven Goddard's variants all include F2 in some way. I'll show that this is never helpful, and leads to silly results.
First, a simple rule for combining averages. If you have a set S of N numbers, made up of subsets S1 (N1) and S2 (N2), and the respective averages are A, A1 and A2, then
A=p*A1+q*A2, where p=N1/N, q=N2/N and p+q=1
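The rule can be checked numerically. This is a minimal sketch with made-up numbers; `combined_average` is a hypothetical helper name, not something from the USHCN processing.

```python
from statistics import fmean

def combined_average(s1, s2):
    """Average of the union of two disjoint subsets, built from the subset averages."""
    n1, n2 = len(s1), len(s2)
    n = n1 + n2
    p, q = n1 / n, n2 / n          # weights, with p + q = 1
    return p * fmean(s1) + q * fmean(s2)

# Made-up values, chosen only to illustrate the rule.
s1 = [1.0, 2.0, 3.0]   # N1 = 3, A1 = 2.0
s2 = [10.0]            # N2 = 1, A2 = 10.0

direct = fmean(s1 + s2)               # average over the whole set S
via_rule = combined_average(s1, s2)   # p*A1 + q*A2
print(direct, via_rule)
```

The two numbers agree because each subset average is weighted by the share of the data it represents.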
In averaging USHCN data with adjustments, we have in each year a set S1 of N1 station/months that have both raw and final data, and a set S2 of N2 that have final alone. Call F1 and R1 the averages of final and raw on S1, and F2 the average of final on S2.
Caveat: I have started from 1900 from where it is almost true that all stations have final data. There are just a few (max 3) missing in some years of the first decade. I believe that makes no difference to the analysis.
Steven Goddard (SG) subtracted the average of all raw from the average of all final:
A_S = p*F1 + q*F2 - R1 = (1-q)*F1 + q*F2 - R1 = (F1 - R1) + q*(F2-F1)
I took the averages of differences where they exist, which is exactly:
A_N = F1 - R1
So A_N is indeed an average of known adjustment differences. SG has combined it with something else. What does that achieve?
SG insists that it adds, or retains, information about the interpolation (F2). But we don't have a difference between interpolated and raw. Instead, he has combined it with F2-F1, which is the difference between two sets of final values.
This really makes no sense. But it led to the "spike". The reason is that the interpolated points are more frequent in the latest month, April, which is also the warmest month, while R1 is averaged over the complementary set, which is weighted toward the cooler period. The small adjustment makes little difference here compared to the seasonal effect. In fact, for recent years, raw and adjusted are virtually identical.
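A toy version of the effect, as a sketch only. The temperatures, the station count and the pattern of missing raw data are all invented for illustration and are not USHCN values. Final equals raw wherever raw exists, so the true adjustment is zero by construction.

```python
from statistics import fmean

# Hypothetical monthly temperatures (deg C) for Jan..Apr of a partial year.
CLIM = {1: 0.0, 2: 2.0, 3: 7.0, 4: 13.0}
N_STATIONS = 10

# Each record is (month, raw_or_None, final).
records = []
for month, temp in CLIM.items():
    # Assumption: raw is missing for 6 of 10 stations in April, 1 of 10 otherwise.
    n_missing = 6 if month == 4 else 1
    for station in range(N_STATIONS):
        raw = None if station < n_missing else temp
        records.append((month, raw, temp))

s1 = [r for r in records if r[1] is not None]   # raw and final both known
s2 = [r for r in records if r[1] is None]       # final only

F1 = fmean(r[2] for r in s1)
R1 = fmean(r[1] for r in s1)
F2 = fmean(r[2] for r in s2)
q = len(s2) / len(records)

A_N = F1 - R1                                # average of known differences
A_S = fmean(r[2] for r in records) - R1      # all final minus all raw

print("A_N =", A_N)
print("A_S =", A_S)
print("q*(F2-F1) =", q * (F2 - F1))
```

In this setup A_N is zero, as it should be with no adjustment, while A_S is well above zero. The whole of A_S comes from q*(F2-F1), that is, from S2 being mostly April.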
I'll illustrate. I'll start with a plot of p, the fraction of stations reporting. For a long time it was almost one. In the early part there is more loss, and since 1990 the number of active stations has fallen. There is a sharp dip in 2014, because some data comes in late; this mostly affects the most recent month. (Actually, the dip is exaggerated in the plot because it counts May-Dec 2014 as missing.)
Now here are the three data average sequences, R1, F1, F2; everything else is a linear combination of them:
The downspike in 2014 is due to being mostly winter so far; a caution of troubles to come with absolute temperatures. The next plot is smoothed (7 year MA) to compare with the SG plot here, where "Raw" is R1 and "Final" is the average of all final data, p*F1 + q*F2, which I've called SG. The "new method" there is averaging by months first, which makes a big difference for 2014 but otherwise not. More later on that. Here I've removed 2014.
SG claims that the adjustment turns the downtrend raw R1 (blue) into the uptrend SG (purple). I think the right comparison is with F1 (red). It actually doesn't make a big difference in the midrange, where there is little missing data. But at the ends it does, and SG exaggerates the uptrend produced by adjustment.
Those are absolute plots. Next I'll subtract R1 from F1 and SG, and also show the difference curve. We agree that R1 is the right measure of "raw".
The green SG-R1 was the original spike plot shown at WUWT. The blue F1-R1 was my corrected version. Again you can see that they track mid-range, but diverge near the ends, giving SG-R1 a much greater range. The red curve is the extra bit that he has included. You can see that the difference generates the spike. The reason is that in F2-F1, April has a lot more missing raw values than other months. So F1 is a much more wintry set of data than F2.
I've contended that introducing F2-F1, via SG, makes no sense. Steven Goddard obstinately disagrees, claiming that he's preserving something about the final adjustments. The next plot puts this in perspective. I compare that inclusion with what you get if you do the same calc with data that is just the climatology, i.e. the average raw temperature for each month, constant over all years. I've called that Z, and like F it breaks into Z1 and Z2 on S1 and S2. This has no information about the annual weather, or the adjustments.
It shows that almost all the introduced term F2-F1 is accounted for by the variation in the climatology, i.e. different kinds of stations reporting for each month. Variation between months matters more than variation between stations, because seasonality is a systematic shift for all stations. The green curve is what you get after multiplying by q, and would be the same for q*(F2-F1). But that is exactly the difference between SG's curve and mine. And it shows clearly the 2014 spike, which is entirely predictable from knowing average (not 2014) temperatures only.
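The climatology comparison can be sketched on toy data. All numbers here are invented: a hypothetical four-month climatology, a small deterministic "weather" anomaly, and a constant adjustment of 0.05. The point is only that q*(Z2-Z1), computed from climatology alone, reproduces nearly all of q*(F2-F1).

```python
from statistics import fmean

# Hypothetical climatology (deg C) for Jan..Apr; Z uses only these values.
CLIM = {1: 0.0, 2: 2.0, 3: 7.0, 4: 13.0}
N_STATIONS = 10
ADJUSTMENT = 0.05   # assumed constant adjustment, final = raw + 0.05

def anomaly(month, station):
    """Made-up 'weather' term between -0.2 and +0.2 deg C."""
    return 0.1 * ((station * 7 + month * 3) % 5 - 2)

# Each record is (raw_known, final, climatology).
records = []
for month, clim in CLIM.items():
    n_missing = 6 if month == 4 else 1   # assumed missing-raw pattern
    for station in range(N_STATIONS):
        final = clim + anomaly(month, station) + ADJUSTMENT
        records.append((station >= n_missing, final, clim))

s1 = [r for r in records if r[0]]        # raw known
s2 = [r for r in records if not r[0]]    # final only
q = len(s2) / len(records)

F1, F2 = fmean(r[1] for r in s1), fmean(r[1] for r in s2)
Z1, Z2 = fmean(r[2] for r in s1), fmean(r[2] for r in s2)

term_F = q * (F2 - F1)   # what SG's method adds to F1 - R1
term_Z = q * (Z2 - Z1)   # the same calc on climatology only

print("q*(F2-F1) =", term_F)
print("q*(Z2-Z1) =", term_Z)
```

The two terms differ only by the difference in mean anomaly between S2 and S1, which is small next to the seasonal range. The constant adjustment cancels in F2-F1, so the added term says nothing about it.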
Averaging months first
As mentioned above, SG has an updated method. He gets the annual average by averaging the data for each month first, and then by year. You might think that shouldn't matter, and the fact that it does is a bad sign. It makes little difference to F1-R1, the correct measure. But it does to the added F2-F1.
We might as well think about this as generating a monthly plot, which can then be binned to annual. It removes the spike because F2-F1 is now evaluated on the same month. Each month is weighted equally. But there is still the difference due to the stations reporting having on average warmer or colder climates than those that don't. Maybe by luck they will be the same. But there's no reason to rely on it. There's nothing to gain, and accuracy to lose. Here is the comparison of the difference and the climatology only version, worked out that way:
Again the green curve is what is added, and has smaller scatter, so when added to F1-R1 it makes less difference (still some spike, though). But again, it's almost exactly what you get calculated by climatology. It adds no information about temperatures, nor adjustments. It's just error.
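Here is a sketch of the two orders of averaging on toy data. It is illustrative only: the station offsets and the missing-raw pattern are assumptions, with the stations lacking raw data in April chosen to be the warmer ones. Again final equals raw where raw exists, so the true adjustment is zero.

```python
from statistics import fmean

# Hypothetical climatology (deg C) and per-station climate offsets.
CLIM = {1: 0.0, 2: 2.0, 3: 7.0, 4: 13.0}
OFFSETS = [0.5 * s for s in range(10)]   # station 0 coldest, station 9 warmest

def raw_missing(month, station):
    """Assumed pattern: warm stations 4..9 lack raw in April, station 9 otherwise."""
    return station >= 4 if month == 4 else station == 9

# Each record is (month, raw_or_None, final).
records = []
for month, temp in CLIM.items():
    for station, offset in enumerate(OFFSETS):
        value = temp + offset
        raw = None if raw_missing(month, station) else value
        records.append((month, raw, value))

def mean_final(recs):
    return fmean(r[2] for r in recs)

def mean_raw(recs):
    return fmean(r[1] for r in recs if r[1] is not None)

# Pooled: all final minus all raw over the whole year at once.
pooled = mean_final(records) - mean_raw(records)

# Months first: the same difference within each month, then equal monthly weights.
monthly = []
for month in CLIM:
    recs = [r for r in records if r[0] == month]
    monthly.append(mean_final(recs) - mean_raw(recs))
months_first = fmean(monthly)

# Average of differences where both values exist (F1 - R1).
paired = [r for r in records if r[1] is not None]
a_n = fmean(r[2] - r[1] for r in paired)

print("pooled:", pooled, "months first:", months_first, "F1-R1:", a_n)
```

Averaging by month first removes the seasonal part, so the result is smaller than the pooled figure. It is still not zero, because the stations reporting raw data are on average colder than those that don't. F1-R1 gives zero in both cases.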
Where next?
I've spent a lot of time on this case because I think it helps get thinking on averages straightened out. It helped me. I also think it's worth getting the calculation of the effect of adjustments right. There are effects, and they do have in total an uptrend effect. That's the way it is. But there is no need to bewail the way it isn't.
I'll write more on averaging and temperature indices. It was the basic issue that Cowtan and Way found with HadCRUT4. The issues are non-trivial.