Friday, April 22, 2016

Averaging temperature data improves accuracy.

I've been arguing again at WUWT. It's a bizarre but quite interesting thread. There is a contrarian meme that asks how global averages can be quoted to maybe two-decimal accuracy, when many of the thermometers might have resolved to just one degree. I tried to deal with that here, showing that adding noise of 1°C amplitude to monthly averages made very little difference to the global average.

But the meme persists, and metrology handbooks get quoted - here a JCGM guide. But the theory quoted is for a single measurement, where repeated measurements can't overcome lack of resolution. That isn't what happens in climate: a whole lot of different measurements are averaged.

Of course, averaging does improve accuracy. That's why people incur cost to obtain large samples. In this post, I'll follow my comment at WUWT by taking 13 months of recent daily max in Melbourne, given by BoM to 1 decimal place, and show that if you round off that decimal, emulating a thermometer reading to nearest degree, the difference to the monthly average is only of order 0.05°C; far less than the reduction in resolution. But first, I'll outline some of the theory.

Law of Large Numbers

This goes back to Bernoulli. There was much confusion at WUWT with the central limit theorem, which is not at all the same thing. The Law of Large Numbers (LoLN) deals with the convergence of a sample mean to the population mean as samples grow (there are many formulations), whereas the CLT makes the more interesting claim that the sample mean, as a random variable itself, tends toward a normal distribution, even though the individual samples may not have been normally distributed. There are of course caveats.

The LoLN is what is needed here, and at WUWT a somewhat informal Wiki statement was mentioned: "The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed." The author (whose comment oddly disappeared) was reproved by Willis, dissing Wiki and preferring: "The law of large numbers formulated in modern mathematical language reads as follows: assume that X1, X2, . . . is a sequence of uncorrelated and identically distributed random variables having finite mean μ …" and emphasising uncorrelated, iid, etc.

The general idea of the LoLN seems simple nowadays. If you add two independent random variables, the variance of the sum is the sum of the variances (subject to conditions, like that they actually do have variances, but not requiring normality or identical distributions). If you have a set of independent random variables εᵢ, consider a weighted average
A = Σ wᵢεᵢ, with Σ wᵢ = 1
Scaling can be absorbed in the weights, so the εᵢ might as well have unit variance. Then the variance of A is Σ wᵢ². If A is a simple mean of N variables, wᵢ = 1/N and the sum is 1/N. But if not, or if the variables have different variances, the convergence of the mean is still just a property of that diminishing sum.
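A quick numerical sketch of this (Python, with a seed and sample sizes of my own choosing): averaging N independent unit-variance variables gives variance Σ wᵢ², which is 1/N for a simple mean, and stays small for unequal weights so long as no weight dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000        # variables per average
trials = 20000  # repeat to estimate the variance of the average

# Independent unit-variance variables (uniform, not normal - normality isn't needed)
eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(trials, N))

# Simple mean: weights w_i = 1/N, so Var(A) = sum(w_i^2) = 1/N
A = eps.mean(axis=1)
print(A.var())   # close to 1/N = 0.001

# Unequal weights: Var(A) = sum(w_i^2), still small
w = rng.uniform(0.5, 1.5, N)
w /= w.sum()
A2 = eps @ w
print(A2.var(), (w**2).sum())   # the two agree closely
```

Nothing here depends on the εᵢ being normal; only independence and finite variance are used.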

What about correlation? If the unit-variance variables have a correlation matrix K, then the combined variance is Σ wᵢKᵢⱼwⱼ. Does that converge? Well, it depends on K. If its coefficients do not tend to zero away from the diagonal, it may not. Again, if the w are a uniform 1/N, the sum runs over all N² coefficients. But usually correlation does diminish as the variables become more separated in time or space.
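To see how decaying correlation plays out, here is a small sketch, assuming (purely for illustration) an AR(1)-style correlation Kᵢⱼ = ρ^|i−j|. The combined variance Σ wᵢKᵢⱼwⱼ is larger than the uncorrelated value 1/N, but still shrinks with N:

```python
import numpy as np

N = 500
rho = 0.6   # assumed lag-1 correlation, decaying with separation

# AR(1)-style correlation matrix: K_ij = rho^|i-j|
idx = np.arange(N)
K = rho ** np.abs(np.subtract.outer(idx, idx))

w = np.full(N, 1.0 / N)   # weights for a simple mean
var_mean = w @ K @ w      # the double sum over w_i K_ij w_j

print(var_mean)   # about 4x the uncorrelated value, but still small
print(1.0 / N)    # the uncorrelated value 1/N = 0.002
```

For this K the variance of the mean is inflated by roughly (1+ρ)/(1−ρ) = 4, but it still goes to zero as N grows, because the correlation dies off away from the diagonal.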

I've included this to show where the LoLN comes from, and that lack of i.i.d. conditions is not a show-stopper.


To be specific, suppose we have a thermometer read to a resolution of 1°C, and a succession of temperatures T is coming in with a spread much larger than 1°C. Suppose we actually know the T values, but they are then read at that resolution - i.e., rounded.

This is equivalent to displacing each reading Tᵢ by an amount εᵢ of up to 0.5°C, to the nearest integer. The JCGM guide puts it thus (via Pat Frank at WUWT):

"If the resolution of the indicating device is δx, the value of the stimulus that produces a given indication X can lie with equal probability anywhere in the interval X − δx/2 to X + δx/2. The stimulus is thus described by a rectangular probability distribution of width δx with variance u^2 = (δx)^2/12, implying a standard uncertainty of u = 0.29δx for any indication."

So the cost to the accuracy of the mean is the mean of those variables. It is very reasonable to assume they are independent. Although the temperatures themselves may be correlated, the fractional parts will be much less so, provided the resolution is well finer than the total temperature range. The distributions are uniform, so the standard error of the mean of N such variables is sqrt(1/(12N)), which tends to zero with large N. That is, the mean discrepancy between rounded and exact, Σ εᵢ/N, behaves like sqrt(1/(12N)).
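This prediction is easy to test numerically. A sketch (Python, with made-up temperatures of sd 4°C, well above the 1°C resolution):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 31          # readings per "month"
months = 50000  # many repeats to estimate the spread

# Daily temperatures with spread much larger than the 1 degree resolution
T = rng.normal(22.0, 4.0, size=(months, N))

# Discrepancy between the mean of rounded readings and the exact mean
diff = np.round(T).mean(axis=1) - T.mean(axis=1)

print(diff.std())                 # close to the prediction
print(np.sqrt(1.0 / (12 * N)))    # sqrt(1/(12N)) = 0.0519
```

The two printed numbers agree to within sampling noise: the rounding errors behave like independent uniform variables of variance 1/12, just as the JCGM description says.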

You may say, what if the rounding isn't perfect? What if, say, .4 is sometimes rounded up instead of down? That just changes the uniform distribution to something similar with a slightly different variance.

Example - Melbourne maxima.

On pages like this, BoM shows the daily max for each recent month in Melbourne, to one decimal place. I have placed here a zipfile which contains a RData file (to load in R) called melb12.sav, which has a list of dataframes with full data for those months. There is also a file called melb13.csv, which has just the maximum temperatures that were used in this test. Here is last month (Mar):

33.7 34.7 23.9 33.0 23.7 25.2 24.9 38.9 28.5 22.1 26.1 22.3 23.2 21.3 26.8 31.4 32.5 19.5 18.8 23.3 23.5 24.3 28.8 21.2 20.4 20.2 19.9 19.2 17.9 18.7 22.7

Suppose we had a thermometer reading to only 1°C - so all these were rounded, as in the JCGM description. For the last 13 months, here are the means for the BoM (1 dp) and for that thermometer:

      Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec   Jan   Feb   Mar
1 dp: 22.72 19.24 17.13 14.43 13.29 13.85 17.26 24.33 22.73 27.45 25.98 25.10 24.86
0 dp: 22.77 19.27 17.13 14.37 13.29 13.84 17.33 24.35 22.67 27.48 26.00 25.17 24.84
diff:  0.05  0.03  0.00 -0.06  0.00 -0.01  0.08  0.03 -0.06  0.03  0.02  0.08 -0.02

The middle row (0 dp), based on daily readings to 1°C, has a far more accurate mean than that resolution would suggest. As a check, the sd of the difference (bottom row) is expected from above to be sqrt(1/(12×31)) (a slight approximation, since months vary in length), which is 0.052. The sd of the diffs shown is 0.045. So the monthly average at 1°C resolution is accurate to about 0.05°C.
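The last column of the table can be reproduced from the March maxima listed above. (Note that np.round, like R's round, rounds halves to the nearest even integer, which is what the table reflects.)

```python
import numpy as np

# Melbourne daily maxima for March (BoM, 1 decimal place), as listed above
mar = np.array([
    33.7, 34.7, 23.9, 33.0, 23.7, 25.2, 24.9, 38.9, 28.5, 22.1, 26.1,
    22.3, 23.2, 21.3, 26.8, 31.4, 32.5, 19.5, 18.8, 23.3, 23.5, 24.3,
    28.8, 21.2, 20.4, 20.2, 19.9, 19.2, 17.9, 18.7, 22.7,
])

mean_1dp = mar.mean()              # mean of the 1 dp readings
mean_0dp = np.round(mar).mean()    # mean after rounding to whole degrees

print(round(mean_1dp, 2))             # 24.86
print(round(mean_0dp, 2))             # 24.84
print(round(mean_0dp - mean_1dp, 2))  # -0.02
```

Losing a whole decimal place of resolution in every daily reading has shifted the monthly mean by only 0.02°C.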


  1. I think there's another point that some of the WUWTers may be stumbling over. The thing we're trying to determine is the global mean temperature; the "random variables" being discussed here should be the errors on any given temperature measurement. It's these errors that are uncorrelated; these are the random independent variables under discussion.

    Temperatures are obviously correlated in both time and space.

    The spatial and temporal heterogeneity adds an extra layer of complexity to calculating a global mean temperature that is sure to confuse some.

  2. A good example of this is the technique of oversampling and decimation used to increase the precision of an analog-to-digital converter.

    Might mention it over there. Eli is banned.

    1. Thanks, Eli. I think the thread at WUWT has now expired, but I'll certainly bring it up if there is a recurrence. It's a very interesting technique, although the extra complexities of adding noise etc could meet resistance there.

  3. Hi Nick, the need to add noise is well illustrated by the example over there that somebunny used of what do you get when you have a ruler with only inch marks and a board to measure. If you don't add noise you get the same number all the time and no improvement in precision. If you add enough unbiased noise to jiggle the measurement between marks on the ruler, the precision will improve.

    Anyhow, Eli tried to post this and they did. Only banned mostly now Eli guesses.
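Eli's ruler example is easy to simulate (a sketch with a made-up board length; the ruler resolves whole inches only):

```python
import numpy as np

rng = np.random.default_rng(2)
true_length = 10.3   # made-up board length in inches
N = 10000            # number of repeat measurements

# Without dither: every reading lands on the same mark - averaging gains nothing
no_dither = np.round(np.full(N, true_length))
print(no_dither.mean())   # stuck at 10.0, 0.3 away from the truth

# With dither: unbiased noise spanning at least one mark spacing, then average
dithered = np.round(true_length + rng.uniform(-0.5, 0.5, N))
print(dithered.mean())    # close to 10.3
```

Without noise the quantization error is a fixed bias that no amount of averaging can remove; with enough unbiased noise, the error becomes random and averages away, which is exactly the oversampling trick.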

  4. That was a tough slog over at WUWT. The level of aggressive numbskullery there is that site's version of a skunk's odor warning others to keep away. One of the clowns commenting even claimed that "we may even have had cooling over the last 150 years and would not know!"

    But it's informative to see the sheer range of misunderstandings and basic errors that so many WUWT contributors and commenters hold just on basic facts on instrumentation and simple statistics. How such individuals can go from that base straight to questioning much more complex analyses is a good example of the Dunning-Kruger effect in action.

  5. The problem is that not understanding the science is, unfortunately but inevitably, a barrier to understanding the science.

    As someone once said - "a good man has got to know his limitations"

  6. As a metrologist, I'm surprised anyone would use metrology as an argument against averaging. One must take a series of readings and average if only to know the short-term repeatability to calculate uncertainties. And of course anyone with half a brain, metrologist or not, quickly understands that averaging adds precision.

    Perhaps the mental stumbling block is that averaging readings from one device adds precision - not accuracy, but averaging multiple devices adds both precision and accuracy.

    I once performed a simple experiment where I showed co-workers that I could get more accurate results from twenty-five 6 1/2 digit voltmeters than from one 8 1/2 digit voltmeter - even though the 8 1/2 digit voltmeter has a presumed accuracy 50 times better than the 6 1/2 digit voltmeters. I did 'cheat' a little by using statistical bootstrapping to increase the effective sample size from 25 to 1000. I would have to go back and find the final results, but the reduction in error was approximately from 85 ppm for a single 6 1/2 digit voltmeter to low single digit ppm error after bootstrapping.

    1. Help me with this Kevin. I can see the improvement in accuracy averaging thermometer readings. I cannot see it with the voltmeters, unless you assume that their errors are evenly distributed about a correct indication. Maybe I don't understand what accuracy means in this context.

    2. j ferguson - you have it in one; we expect any group of independent readings to have a distribution around the 'true' value. The main caution here for me was that I had to find 25 voltmeters that were not correlated. I.e., if they had all been calibrated by the same laboratory, then we might suspect a bias in one direction or another. Similarly, if they had all come off a production line the day before, then we would also suspect a systematic bias.

      Having access to large amounts of data on instrument readings I am constantly reminded that "math works" :)

    3. Thanks Kevin. It was easy for me to imagine that all the meters had been sent to the same lab for calibration and were all off by the error of the 'standard'.

  7. It is so embarrassing that these people pretend to know science better than scientists.

    There is one case, however, where I do worry a little about reading a thermometer to one degree. In the 19th century, the thermometer used to measure the sea surface temperature was typically stored in the warm cabin. It then had to be stirred in a bucket of sea water until it reached the temperature of the water. I wonder whether the voluntary observers / sailors waited long enough for the warm bias to fall within 0.1°C when they read the thermometer to 1°C.

    (The abbreviation LOLN is not explained and written as LLON a few times.)

  8. Victor,
    Thanks for the LLON warning - all fixed now, I hope. Yes, I said the thread was interesting, but it ended up in sheer nuttiness from Pat Frank.

    I tried to stick to just the actual thermometer reading, without getting into whether the reading was of the correct thing, or even whether the thermometer had stabilized. Yes, there are certainly ways in which bias could be introduced, even into reading - for example, if people tend to round down when they should round up.

  9. As Victor says, it is rather amazing that these people really do seem to think that they understand this better than professionals who've worked on this for a very long time. I wonder if it isn't simply different environments. I've been sitting through scientific seminars for a very long time. Something I certainly learned quite early on is that if you think the speaker has made some kind of silly mistake that it is more likely that I was wrong, or misunderstood what was being done, than the speaker having made the kind of mistake that seemed obvious to someone who had only just encountered their work.

    1. I fully agree. Until the moment you are an expert yourself, it is a good idea to practise humility, train your ability to ask questions and to listen. The expert likely knows something you do not (yet).

    2. It's a good idea to go on practising humility even beyond the moment when you believe yourself to be an expert.

    3. :-) Yes. I was thinking that when you are an expert there may also be a case where you think everyone is wrong and then you should also have the courage to say so.

    4. Yup, all these experts such as Richard Lindzen, Judith Curry, Murray Salby, etc. Better listen to them, lol.

  10. ...and Then There's Physics said
    "Something I certainly learned quite early on is that if you think the speaker has made some kind of silly mistake that it is more likely that I was wrong"

    Good that you are speaking for yourself ... when I listen to a speaker make up "just so" stories to explain climate science, I realize that there is so much more left to understand.

  11. “Anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that 'my ignorance is just as good as your knowledge.'” --Asimov

    The problem is: when you have one person who's ignorant of a field and yet certain they're right about it, how do you communicate across that gulf of understanding?

    Competence is something that must be earned and then demonstrated. But hell if I can get the 'skeptics' to grok that.

    1. "The problem is: when you have one person who's ignorant of a field and yet certain they're right about it, how do you communicate across that gulf of understanding? "

      The converse problem is the one person who claims to be the authority in a field and is certain that they are correct on a topic, and then uses that to their advantage in halting further progress.

      The best example of that is Richard Lindzen, who in my opinion has done no favors to the discipline of atmospheric physics. I found this quote recently: " More importantly, he's been wrong about nearly every major climate argument he's made over the past two decades. Lindzen is arguably the climate scientist who's been the wrongest, longest. " I would amend that to the wrongest, longest, and LOUDEST.