moyhu: How bad is naive temperature averaging?

Friday, May 22, 2015

In my last post, I described as "naive averaging" the idea of calculating an average temperature anomaly G by simply subtracting from each of a number of local records its own (varying) lifetime average, and averaging the resulting differences. There, and in an earlier post, I gave simple examples of why it doesn't work. And in that last post, I showed how the naive average could be made right by iteration.

The underlying principle is that in making an anomaly you should subtract your best estimate of what the value would be. That leaves the question of how good "best" has to be: it has to be good enough to resolve the thing you are trying to deduce. If that is the change of global temperature, your estimate has to be accurate to within the size of that change.

If you add a global G to a station mean, then the mean of the result isn't right unless the mean of G is zero. There is freedom to set the average of G to zero over one interval, but only one, not over every station's interval. So, as in the standard method, you can set G to have mean zero over a period like 1961-90, and use station means over that period as the offsets. That works provided there are observations in that period, which is the rub. But much work has been done on methods for this, which itself is evidence that the faults of naive averaging are well known.
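The argument can be written out in a few lines. This is a sketch under the usual assumption that a station reading is a station offset L_s plus the global term G_t, with noise ignored. The symbol I_s (the set of times at which station s reports) is my notation, not something defined above.

```latex
T_{st} = L_s + G_t
\qquad\Longrightarrow\qquad
\bar{T}_s = \frac{1}{|I_s|}\sum_{t \in I_s} T_{st} = L_s + \bar{G}_{I_s}

T_{st} - \bar{T}_s = G_t - \bar{G}_{I_s}
```

So the naive anomaly is G_t less the mean of G over that station's own reporting interval. That subtracted term differs from station to station unless all stations share the same interval, and it cannot be made zero for all of them at once.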

Anyway, here I want to check just how much difference the real variation of intervals in temperature datasets makes, and whether the iterations (and TempLS) do correct it properly. I take my usual collection of GHCN V3 and ERSST data, at 9875 locations total, and the associated area-based weights, which are zero for all missing values. But then I change the data to a uniformly rising value - in fact, equal to the time in century units since the start in 1899. That implies a uniform trend of 1 C/century. Do the various methods recover that?

Update - I made a small error in trends below. I used unweighted regressions, which allowed the final months of 2015 to be entered as zeroes. I have fixed this, and put the old tables in faint gray beside. The plots were unaffected. I have not corrected the small error I noted in the comments, but now it is even smaller - a converged trend of 0.9998 instead of 1.
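The effect described in the update can be reproduced on a toy series. The sketch below is not the code used for the tables; it assumes a monthly ramp of 1 C/century running from 1900 to 2015, normalised to 2014, with the last 8 months of 2015 sitting in the array as zeroes (the count of 8 is my assumption). The helper `weighted_slope` is a hypothetical name.

```python
def weighted_slope(x, y, w):
    """Weighted least-squares slope of y against x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

n_months = 116 * 12                                # Jan 1900 .. Dec 2015 (assumed span)
x = [(m + 0.5) / 1200 for m in range(n_months)]    # time in centuries
base = sum(x[114 * 12:115 * 12]) / 12              # mean over the 12 months of 2014
y = [xi - base for xi in x]                        # exact 1 C/century ramp

# Assumption: the final 8 months of 2015 are not yet reported, so they
# appear as zeroes and carry zero weight.
w = [1.0] * n_months
for m in range(n_months - 8, n_months):
    y[m] = 0.0
    w[m] = 0.0

unweighted = weighted_slope(x, y, [1.0] * n_months)  # zeroes treated as data
weighted = weighted_slope(x, y, w)                   # zeroes ignored
print(f"unweighted: {unweighted:.6f}  weighted: {weighted:.6f}")
```

With unit weights the trailing zeroes are treated as real readings and pull the slope slightly below 1. With the zero weights applied, the fit sees only the ramp.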

So I'll start with land only (GHCN) data - 7280 stations - because most of the naive averaging is done with land stations. Here is the iterative sequence:

Iter	Trend	RMS	Old trend	Old RMS
1	0.547165	4.079662	0.535192	4.07986
2	0.781293	1.23042	0.777397	1.230401
3	0.89165	0.529091	0.889767	0.529098
4	0.945758	0.251682	0.944707	0.251689
5	0.972713	0.12399	0.972055	0.123994
6	0.986226	0.061898	0.985764	0.0619
10	0.999021	0.003962	0.998745	0.003962
20	0.999899	4e-06	0.999636	4e-06

As in the last post, the first iteration is the naive calculation, where the offsets are just the means over the record length. RMS is a normalised root sum of squares of the residuals. As you see, the trend for that first step was about half the final value. And it does converge to 1 C/century, as it should. This is the value TempLS would return.
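For readers who want to see the mechanism on something small, here is a minimal sketch of one alternating scheme of this kind. It is not the TempLS code and makes several assumptions of its own: annual data, equal weights in place of area weights, and a hypothetical network of seven stations that each report for 40 years with start years staggered by a decade. Every station reads the same ramp of 1 C/century.

```python
def trend(years, g):
    """Ordinary least-squares slope of g against year, per century."""
    n = len(years)
    xm = sum(years) / n
    gm = sum(g) / n
    sxy = sum((x - xm) * (v - gm) for x, v in zip(years, g))
    sxx = sum((x - xm) ** 2 for x in years)
    return 100.0 * sxy / sxx

years = list(range(1900, 2000))
# Hypothetical network: station k reports from 1900+10k for 40 years.
spans = [range(1900 + 10 * k, 1940 + 10 * k) for k in range(7)]
# Synthetic reading: time in centuries since 1900, the same at every station.
temp = {y: (y - 1900) / 100.0 for y in years}

def iterate(n_iter):
    g = {y: 0.0 for y in years}      # start from G = 0, so step 1 is naive
    trends = []
    for _ in range(n_iter):
        # Offsets: mean of (reading - G) over each station's own years.
        offsets = [sum(temp[y] - g[y] for y in span) / len(span)
                   for span in spans]
        # Global value: mean of (reading - offset) over stations reporting.
        for y in years:
            present = [off for off, span in zip(offsets, spans) if y in span]
            g[y] = sum(temp[y] - off for off in present) / len(present)
        # Normalise G to the final year at each step.
        ref = g[years[-1]]
        for y in years:
            g[y] -= ref
        trends.append(trend(years, [g[y] for y in years]))
    return trends

trends = iterate(200)
for i in (1, 2, 5, 20, 200):
    print(i, round(trends[i - 1], 6))
```

The first printed trend is the naive one and falls well short of 1, while the last is close to 1. The staggering here is more extreme than in the real station inventory, so the numbers should not be compared with the table.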

I should mention that I normalised the global result at each step relative to the year 2014. Normally this can be left to the end, and would be done for a range of years. Because of the uniformity of the data, setting to a one-year anomaly base is adequate. For annual data, base setting is just the addition of a constant. For monthly data, there are twelve different constants, so the anomaly base setting has the real effect of adjusting months relative to each other. It makes a small difference within each year, but if it is not set until the end, the convergence of the trend to 1 is not so evident, though the final result is the same.
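The twelve-constant point is easy to see in code. The sketch below assumes G is held as a flat list of monthly values starting in January. The function name `set_anomaly_base` and the per-month shifts are hypothetical, chosen only for illustration.

```python
def set_anomaly_base(g, first_year, base_years):
    """Subtract from each calendar month its mean over the base years.

    g is a flat list of monthly values starting in January of first_year.
    """
    out = list(g)
    for month in range(12):
        idx = [(y - first_year) * 12 + month for y in base_years]
        const = sum(g[i] for i in idx) / len(idx)   # one constant per month
        for i in range(month, len(g), 12):
            out[i] = g[i] - const
    return out

# Hypothetical input: a 1 C/century ramp plus an arbitrary shift for each
# calendar month, standing in for a G whose months are not yet aligned.
first_year, n_years = 2010, 6
shift = [0.3, -0.2, 0.1, 0.0, 0.25, -0.1, 0.05, 0.2, -0.3, 0.15, -0.05, 0.1]
g = [i / 1200.0 + shift[i % 12] for i in range(12 * n_years)]

aligned = set_anomaly_base(g, first_year, base_years=[2014])
print([round(v, 4) for v in aligned[:12]])
```

The arbitrary per-month shifts drop out, because each calendar month has its own constant removed. With a single base year, every month is measured against the same month of 2014, which is how the base setting changes values within a year.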

So normalised to 2014, here is the plot of the global temperatures at each iteration:

The first naive iteration deviates quite substantially, with reduced trend. The deviation is due solely to the pattern of missing values.

Now I'll do the same calculation including sea surface temperature - the normal TempLS range. The effects are subdued, because SST grid values don't generally have start and end years like met stations, even though they may have missing values.

Iter	Trend	RMS	Old trend	Old RMS
1	0.828193	5.189208	0.814107	5.189285
2	0.968292	0.625476	0.966456	0.625463
3	0.993962	0.100924	0.993393	0.100923
4	0.998822	0.018172	0.99844	0.018172
5	0.999756	0.003425	0.999406	0.003425
6	0.999936	0.000658	0.999592	0.000658
10	0.999979	1e-06	0.999637	1e-06
20	0.999979	0	0.999637	0

The corresponding plot of iterative G curves is:

The deviation of the first step is reduced, but still considerable. Convergence is faster.

8 comments:

Clive Best, May 24, 2015 at 9:40 PM
Very nice!

" I should mention that I normalised the global result at each step relative to the year 2014. Normally this can be left to the end, and would be done for a range of years"

In the real world, how do you normalise TempLS? Do you calculate 12 monthly normals for 1961-1990? If not, how do you avoid doing so? NCDC and Berkeley use an earlier normalisation period, 1959-1980, which then causes about a 0.15C offset compared to CRU and GISS.

Clive Best, May 25, 2015 at 7:03 PM
I had assumed that the station offsets as described by Tamino had the advantage that all station data could be included. However, when I tried to normalise these station offsets to a fixed period 1961-1990, it didn't work.

Anom(mys) = T(mys) - (norm(m,lat,lon) - offset(ms)

Now I realise why. The problem with using any fixed period for the 12 normals is that it only applies to stations with measurements in that period. The normalisation itself is affected by these 'missing stations'. The only way round this is to exclude these stations completely or to interpolate their values into the normalisation period. However, I strongly suspect that interpolation itself exaggerates any warming trends. I think Berkeley Earth suffers from this because they have to interpolate over long times and distances to include all stations. Too much smoothing reinforces underlying trends.

You either use all stations and normalise to the full period or use a fixed period and discard those stations with insufficient data.