moyhu: Averaging temperature data improves accuracy.

Friday, April 22, 2016

Averaging temperature data improves accuracy.

I've been arguing again at WUWT. It's a bizarre but quite interesting thread. There is a contrarian meme that asks how global averages can be quoted to maybe two-decimal accuracy, when many of the thermometers might have resolved to just one degree. I tried to deal with that here, showing that adding noise of 1°C amplitude to monthly averages made very little difference to the global average.

But the meme persists, and metrology handbooks get quoted - here a JCGM guide. The theory quoted, however, is for a single measurement, where repeated measurements can't overcome lack of resolution. That isn't what is happening in climate. Instead a whole lot of different measurements are averaged.

Of course, averaging does improve accuracy. That's why people incur cost to obtain large samples. In this post, I'll follow my comment at WUWT by taking 13 months of recent daily max in Melbourne, given by BoM to 1 decimal place, and show that if you round off that decimal, emulating a thermometer reading to nearest degree, the difference to the monthly average is only of order 0.05°C; far less than the reduction in resolution. But first, I'll outline some of the theory.

Law of Large Numbers

This goes back to Bernoulli. There was much confusion at WUWT with the central limit theorem, which is not at all the same. The Law of Large Numbers (LoLN) deals with convergence of a sample mean to a population mean with larger samples (lots of formulations) whereas the CLT makes the more interesting claim that the sample mean, as a random variable itself, tends toward a normal distribution, even though the individual samples may not have been normally distributed. There are of course caveats.

The LoLN is what is needed here, and at WUWT a somewhat informal Wiki statement was mentioned: "The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed." The author (whose comment oddly disappeared) was reproved by Willis, dissing wiki and preferring: "The law of large numbers formulated in modern mathematical language reads as follows: assume that X1, X2, . . . is a sequence of uncorrelated and identically distributed random variables having finite mean μ …" and emphasising uncorrelated, iid etc.

The general idea of LoLN seems simple nowadays. If you add two independent random variables, the variance of the sum is the sum of the variances (subject to conditions like that they actually do have variances, but not requiring normality or identical distributions). If you have a set of independent random variables ε_i, consider a weighted average
A = Σ w_iε_i, with Σ w_i = 1
Scaling can be absorbed in the weights, so they might as well be unit variables. Then the variance of A is Σ w²_i. If it is a simple mean of N variables, w=1/N and the sum is 1/N. But if not, or if the variables have different variance, the convergence of the mean is still just a property of that diminishing sum.
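
A minimal numerical sketch of that statement, assuming independent unit-variance variables drawn from a uniform distribution (the distribution, the seed, the trial count and the "ramp" weights are illustrative choices, not from the post). It compares the simulated variance of A with Σ w²_i for equal and for unequal weights.

```python
import random
import statistics

def weighted_average_variance(weights):
    """Theoretical variance of A = sum(w_i * eps_i) for independent unit-variance eps_i."""
    return sum(w * w for w in weights)

def simulate_variance(weights, trials=10000, seed=0):
    """Monte Carlo estimate of var(A); eps_i is uniform on [-sqrt(3), sqrt(3)], which has variance 1."""
    rng = random.Random(seed)
    half_width = 3 ** 0.5
    samples = [
        sum(w * rng.uniform(-half_width, half_width) for w in weights)
        for _ in range(trials)
    ]
    return statistics.pvariance(samples)

N = 50
uniform_weights = [1.0 / N] * N
# Unequal weights that still sum to 1 (proportional to 1, 2, ..., N).
total = N * (N + 1) / 2
ramp_weights = [(i + 1) / total for i in range(N)]

theory_uniform = weighted_average_variance(uniform_weights)  # equals 1/N
theory_ramp = weighted_average_variance(ramp_weights)        # larger than 1/N, still small
sim_uniform = simulate_variance(uniform_weights)
sim_ramp = simulate_variance(ramp_weights)

print(theory_uniform, sim_uniform)
print(theory_ramp, sim_ramp)
```

The unequal weights give a somewhat larger variance than 1/N, but it is still governed by the same diminishing sum.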

What about correlation? If the unit variables have a correlation matrix K, then the combined variance is Σ w_iK_ijw_j. Does that converge? Well, it depends on K. If its coefficients do not tend to zero away from the diagonal, it may not. Again if w are uniform 1/N, the sum will be over all N² coefficients. But usually correlation does diminish as the variables become more separated in time or space.
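
To illustrate how the answer depends on K, here is a small sketch with two assumed correlation structures (neither is from the post): one where correlation decays as ρ^|i-j|, and one where every pair has the same correlation ρ. The value ρ = 0.5 is arbitrary.

```python
def correlated_mean_variance(n, rho):
    """Variance of the simple mean of n unit-variance variables with K_ij = rho**|i-j|."""
    w = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += w * (rho ** abs(i - j)) * w
    return total

rho = 0.5  # assumed correlation between neighbours, for illustration only

# Decaying correlation: the variance still shrinks roughly like 1/n.
variances = {n: correlated_mean_variance(n, rho) for n in (10, 100, 1000)}

# For large n it approaches (1 + rho) / (1 - rho) / n.
limit_ratio = variances[1000] * 1000 / ((1 + rho) / (1 - rho))

# Constant correlation (no decay away from the diagonal): the variance is
# rho + (1 - rho) / n, which levels off at rho instead of tending to zero.
constant_corr_variance = {n: rho + (1 - rho) / n for n in (10, 100, 1000)}

print(variances)
print(constant_corr_variance)
```

In the decaying case the correlation only inflates the variance by a fixed factor; in the constant case more samples stop helping.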

I've included this to show where LoLN comes from, and that lack of iid is not a show stopper.

Resolution

To be specific, suppose we have a thermometer read to a resolution of 1°C, and a succession of temperatures T are coming in with a spread much larger than 1°C. Suppose we actually know the T values, but they are then read to that resolution, i.e. rounded.

This is equivalent to displacing each reading T_i by an amount ε_i, of magnitude up to 0.5°C, to reach the nearest integer. That JCGM guide puts it thus (via Pat Frank at WUWT):

"If the resolution of the indicating device is δx, the value of the stimulus that produces a given indication X can lie with equal probability anywhere in the interval X − δx/2 to X + δx/2. The stimulus is thus described by a rectangular probability distribution of width δx with variance u^2 = (δx)^2/12, implying a standard uncertainty of u = 0.29δx for any indication."
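
For reference, the variance in that quote follows directly from integrating over the rectangular distribution. Writing ε for the displacement from the indicated value (a symbol chosen here to match the notation used in this post):

```latex
u^2 = \int_{-\delta x/2}^{\delta x/2} \frac{\varepsilon^2}{\delta x}\, d\varepsilon
    = \frac{1}{\delta x}\left[\frac{\varepsilon^3}{3}\right]_{-\delta x/2}^{\delta x/2}
    = \frac{(\delta x)^2}{12},
\qquad
u = \frac{\delta x}{\sqrt{12}} \approx 0.29\,\delta x .
```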

So the cost to the accuracy of the mean is the mean of those displacements ε_i. It is very reasonable to assume them independent. Although temperatures themselves may be correlated, the fractional parts will be much less so, provided the resolution is much finer than the total temperature range. The distributions are uniform with variance 1/12 (taking δx = 1°C), so the standard error of the mean of N of them is sqrt(1/12/N), which tends to zero with large N. That is, the mean discrepancy between rounded and exact, Σ ε_i/N, behaves like sqrt(1/12/N).
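
A quick simulation of this claim, as a sketch under stated assumptions: the temperatures are synthetic (Gaussian, mean 20°C, sd 5°C, so the spread is much larger than 1°C), and N = 31, the seed and the trial count are arbitrary. It measures the spread of the difference between the mean of rounded readings and the mean of exact readings.

```python
import math
import random
import statistics

def rounding_error_of_mean(n, rng):
    """Mean of rounded readings minus mean of exact readings, for one sample of n temperatures."""
    # Assumed synthetic temperatures: Gaussian, mean 20 C, sd 5 C.
    temps = [rng.gauss(20.0, 5.0) for _ in range(n)]
    rounded = [round(t) for t in temps]
    return statistics.fmean(rounded) - statistics.fmean(temps)

rng = random.Random(42)
n = 31         # readings per sample, e.g. days in a month
trials = 5000
errors = [rounding_error_of_mean(n, rng) for _ in range(trials)]

observed_sd = statistics.pstdev(errors)
predicted_sd = math.sqrt(1 / 12 / n)  # about 0.052 for n = 31

print(observed_sd, predicted_sd)
```

Changing n shows the 1/sqrt(N) behaviour: quadrupling the number of readings should roughly halve the spread.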

You may say, what if the rounding isn't perfect? What if, say, .4 is sometimes rounded up instead of down? That just changes the uniform distribution to something similar with a slightly different variance.

Example - Melbourne maxima.

On pages like this, BoM shows the daily max for each recent month in Melbourne, to one decimal place. I have placed here a zipfile which contains a RData file (to load in R) called melb12.sav, which has a list of dataframes with full data for those months. There is also a file called melb13.csv, which has just the maximum temperatures that were used in this test. Here is last month (Mar):

33.7 34.7 23.9 33.0 23.7 25.2 24.9 38.9 28.5 22.1 26.1 22.3 23.2 21.3 26.8 31.4 32.5 19.5 18.8 23.3 23.5 24.3 28.8 21.2 20.4 20.2 19.9 19.2 17.9 18.7 22.7

Suppose we had a thermometer reading to only 1°C - so all these were rounded, as in the JCGM description. For the last 13 months, here are the means for the BoM (1 dp) and for that thermometer:

Month Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec   Jan   Feb   Mar
1 dp: 22.72 19.24 17.13 14.43 13.29 13.85 17.26 24.33 22.73 27.45 25.98 25.10 24.86
0 dp: 22.77 19.27 17.13 14.37 13.29 13.84 17.33 24.35 22.67 27.48 26.00 25.17 24.84
diff:  0.05  0.03  0.00 -0.06  0.00 -0.01  0.08  0.03 -0.06  0.03  0.02  0.08 -0.02

The "0 dp" row, measured by day to 1°C, has a far more accurate mean than that resolution. As a check, the sd of the difference (bottom row) is expected from above to be sqrt(1/12/31) (slight approx for days in month), which is 0.052. The sd of the diffs shown is 0.045. The monthly average at 1°C resolution is accurate to about 0.05°C.
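
The last column of the table and the sd check can be reproduced from the numbers printed in this post. This is a sketch in Python rather than the R used for the original calculation. It relies on Python's round() sending exact halves (28.5, 32.5, 19.5, 23.5) to the even integer, an assumption about the rounding rule that happens to reproduce the tabulated 24.84.

```python
import math
import statistics

# March daily maxima (deg C) as listed in the post, to one decimal place.
march = [33.7, 34.7, 23.9, 33.0, 23.7, 25.2, 24.9, 38.9, 28.5, 22.1, 26.1,
         22.3, 23.2, 21.3, 26.8, 31.4, 32.5, 19.5, 18.8, 23.3, 23.5, 24.3,
         28.8, 21.2, 20.4, 20.2, 19.9, 19.2, 17.9, 18.7, 22.7]

rounded = [round(t) for t in march]       # emulates a thermometer read to 1 deg C
mean_1dp = statistics.fmean(march)        # mean of the 1 dp readings
mean_0dp = statistics.fmean(rounded)      # mean of the rounded readings

# Differences (0 dp minus 1 dp) for the 13 months, copied from the table.
diffs = [0.05, 0.03, 0.00, -0.06, 0.00, -0.01, 0.08,
         0.03, -0.06, 0.03, 0.02, 0.08, -0.02]
sd_diffs = statistics.stdev(diffs)        # sample sd of the tabulated differences
expected_sd = math.sqrt(1 / 12 / 31)      # prediction for a 31-day month

print(round(mean_1dp, 2), round(mean_0dp, 2))
print(round(sd_diffs, 3), round(expected_sd, 3))
```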

26 comments:

Windchasers, April 22, 2016 at 10:03 AM
I think there's another point that some of the WUWTers may be stumbling over. The thing we're trying to determine is the global mean temperature; the "random variables" being discussed here should be the errors on any given temperature measurement. It's these errors that are uncorrelated; these are the random independent variables under discussion.

Temperatures are obviously correlated in both time and space.

The spatial and temporal heterogeneity adds an extra layer of complexity to calculating a global mean temperature that is sure to confuse some.

EliRabett, April 22, 2016 at 12:22 PM
A good example of this is the technique of oversampling and decimation used to increase the precision of an analog-to-digital converter:

http://www.atmel.com/images/doc8003.pdf

Might mention it over there; Eli is banned.

EliRabett, April 22, 2016 at 8:49 PM
Hi Nick, the need to add noise is well illustrated by the example over there that somebunny used of what do you get when you have a ruler with only inch marks and a board to measure. If you don't add noise you get the same number all the time and no improvement in precision. If you add enough unbiased noise to jiggle the measurement between marks on the ruler, the precision will improve.

Anyhow, Eli tried to post this and they did. Only banned mostly now Eli guesses.
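
The ruler example can be sketched numerically. This is an illustration only: the board length of 7.3 inches, the uniform noise spanning one ruler division, the seed and the number of readings are all assumed values.

```python
import random
import statistics

true_length = 7.3  # assumed board length in inches, for illustration
rng = random.Random(1)
n = 10000

# No noise: every reading rounds to the same mark, so averaging cannot help.
plain = [round(true_length) for _ in range(n)]

# Unbiased noise spanning one ruler division, added before the reading is rounded.
dithered = [round(true_length + rng.uniform(-0.5, 0.5)) for _ in range(n)]

plain_mean = statistics.fmean(plain)        # stays at 7
dithered_mean = statistics.fmean(dithered)  # close to 7.3

print(plain_mean, dithered_mean)
```

With temperature data the day-to-day variation of the readings plays the role of the added noise.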

Magma, April 23, 2016 at 3:34 AM
That was a tough slog over at WUWT. The level of aggressive numbskullery there is that site's version of a skunk's odor warning others to keep away. One of the clowns commenting even claimed that "we may even have had cooling over the last 150 years and would not know!"

But it's informative to see the sheer range of misunderstandings and basic errors that so many WUWT contributors and commenters hold just on basic facts on instrumentation and simple statistics. How such individuals can go from that base straight to questioning much more complex analyses is a good example of the Dunning-Kruger effect in action.

Tadaaa, April 23, 2016 at 3:45 AM
The problem is that not understanding the science is, unfortunately but inevitably, a barrier to understanding the science.

As someone once said - "a good man has got to know his limitations"

Kevin O'Neill, April 24, 2016 at 12:06 AM
As a metrologist, I'm surprised anyone would use metrology as an argument against averaging. One must take a series of readings and average if only to know the short-term repeatability to calculate uncertainties. And of course anyone with half a brain, metrologist or not, quickly understands that averaging adds precision.

Perhaps the mental stumbling block is that averaging readings from one device adds precision - not accuracy, but averaging multiple devices adds both precision and accuracy.

I once performed a simple experiment where I showed co-workers that I could get more accurate results from twenty-five 6 1/2 digit voltmeters than from one 8 1/2 digit voltmeter - even though the 8 1/2 digit voltmeter has a presumed accuracy 50 times better than the 6 1/2 digit voltmeters. I did 'cheat' a little by using statistical bootstrapping to increase the effective sample size from 25 to 1000. I would have to go back and find the final results, but the reduction in error was approximately from 85 ppm for a single 6 1/2 digit voltmeter to low single digit ppm error after bootstrapping.

Victor Venema, April 24, 2016 at 8:05 AM
It is so embarrassing that these people pretend to know science better than scientists.

There is one case, however, where I do worry a little about reading a thermometer to one degree. In the 19th century, the thermometer used to measure the sea surface temperature was typically stored in the warm cabin. It then had to be stirred in a bucket of sea water until the thermometer had reached the temperature of the water. I wonder whether the voluntary observers / sailors waited long enough for the warm bias to fall within 0.1°C when they read the thermometer to 1°C.

(The abbreviation LOLN is not explained and written as LLON a few times.)

Nick Stokes, April 24, 2016 at 8:22 AM
Victor,
Thanks for the LLON warning - all fixed now, I hope. Yes, I said the thread was interesting, but it ended up in sheer nuttiness from Pat Frank.

I tried to stick to just the actual thermometer reading, without getting into whether the reading was of the correct thing, or even whether the thermometer had stabilized. Yes, there are certainly ways in which bias could be introduced, even into reading - for example, if people tend to round down when it should be up.

...and Then There's Physics, April 25, 2016 at 4:27 PM
As Victor says, it is rather amazing that these people really do seem to think that they understand this better than professionals who've worked on this for a very long time. I wonder if it isn't simply different environments. I've been sitting through scientific seminars for a very long time. Something I certainly learned quite early on is that if you think the speaker has made some kind of silly mistake that it is more likely that I was wrong, or misunderstood what was being done, than the speaker having made the kind of mistake that seemed obvious to someone who had only just encountered their work.

@whut, April 25, 2016 at 11:23 PM
...and Then There's Physics said
"Something I certainly learned quite early on is that if you think the speaker has made some kind of silly mistake that it is more likely that I was wrong"

Good that you are speaking for yourself ... when I listen to a speaker make up "just so" stories to explain climate science, I realize that there is so much more left to understand.

Windchasers, April 26, 2016 at 4:13 AM
“Anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that 'my ignorance is just as good as your knowledge.'” --Asimov

The problem is: when you have one person who's ignorant of a field and yet certain they're right about it, how do you communicate across that gulf of understanding?

Competence is something that must be earned and then demonstrated. But hell if I can get the 'skeptics' to grok that.

Anonymous, July 10, 2025 at 9:56 AM
Hi Nick,
I hope you're doing well. This is Janet from WUWT.

I'm currently engaged in a debate with Pat Frank regarding uncertainty in global temperature records. My main argument is that if the uncertainty were as large as Pat claims, we would expect to see significant divergence among independent datasets—particularly between satellite and surface records. But in practice, we don’t. Most discrepancies seem to stem from differences in methodology, not from massive underlying uncertainty.

Pat, however, tends to dismiss independent corroboration, asserting that his paper is correct regardless. When I brought up the strong agreement between the USCRN and the adjusted U.S. surface temperature record, he dismissed it as coincidental. He also argues that temperature adjustments have done more harm than good, particularly outside the U.S.

As support for his view, he linked to an older article by Willis from December 2009:
https://wattsupwiththat.com/2009/12/08/the-smoking-gun-at-darwin-zero/

I'm not entirely sure how to interpret this article. Given that this is a WUWT article, I am skeptical. I'm not as familiar with the process of homogenization as you are, which is why I wanted to reach out to you for your perspective on this article.

Our conversation is ongoing and can be found here:
https://wattsupwiththat.com/2025/07/08/climate-oscillations-7-the-pacific-mean-sst/#comment-4091447