This follows on from my previous post, which set out a numerical version of the considerations in calculating an annual average of temperatures from monthly averages when one may be missing. I'm using this as an analogy for what USHCN does with missing data in its US spatial average. They infill the missing values with a locally based estimate; I've been arguing with people who say that is wrong and that they should just drop the data. I contend that infilling is fine, and dropping would be very bad. I set out the arithmetic to show what happens when you do that to February in the data for Luling, Texas 2005.
There I tracked what happens to the climatology and anomalies, which I think is revealing. Here I won't mention that breakdown, but just show graphically how infilling works for averaging the absolutes, and how just dropping doesn't.
First I just plot the monthly averages. The red line is the annual average. It corresponds to the area under the red curve (measured down to zero, not shown) spread evenly over the year, which is just the simple average of the months.
This time I'll drop June (the plot works better with a central value). The old red plot is shown with a green overlay for the result after dropping. The other months are stretched to cover the gap; it's the same as reweighting June to zero. The stretching reflects the fact that in averaging, the sum of the remaining months is now divided by 11 rather than 12. It's as if there were 11 months in the year.
Actually I reweighted June to not quite zero, so you see a slightly thicker vertical between May and July.
The red average is 20.32°C, as before; after dropping June it is 19.66. You can see why; the other months in effect stretch to fill the space, and there are more winter months to expand. So the area under the green is less.
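The arithmetic of dropping can be sketched in a few lines of Python. The monthly values below are made up for illustration (a plain seasonal cycle that happens to average 20°C); they are not the Luling data, and the names `monthly`, `JUNE` and `mean` are my own. The point is only that removing a warm month and dividing by 11 pulls the average down.

```python
# Hypothetical monthly means in °C (Jan..Dec). These are illustrative
# values only, NOT the Luling, Texas 2005 data used in the post.
monthly = [10.0, 12.0, 16.0, 20.0, 24.0, 27.0,
           29.0, 29.0, 26.0, 21.0, 15.0, 11.0]
JUNE = 5  # zero-based index of June


def mean(values):
    """Simple (equal-weight) average."""
    return sum(values) / len(values)


# Annual average using all 12 months.
full_mean = mean(monthly)

# "Just drop it": remove June and average the remaining 11 months.
remaining = [t for i, t in enumerate(monthly) if i != JUNE]
dropped_mean = mean(remaining)

print(f"all 12 months : {full_mean:.2f}")
print(f"June dropped  : {dropped_mean:.2f}")
print(f"difference    : {full_mean - dropped_mean:.2f}")
```

With any seasonal cycle of this shape, dropping a summer month lowers the average, because the 11 survivors are on balance cooler than the month removed.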
Now we'll see what happens if we infill - we don't remove June, but replace it with the average of May and July:
Now everything is back in place, and only June has a slight area discrepancy. The averages are much closer: 20.32 and 20.18. The difference is 0.14 instead of 0.66.
Finally, if you don't like "fabricating" the June reading, just leave it out, but instead of stretching all months, stretch just May and July. That is the reweighting option. It looks like this:
In fact the result is exactly the same as infilling. That is, pictorially, my contention that infilling in an average is just reweighting. When you just leave June out, you upweight all the other months equally; here you upweight only the nearby ones. They are upweighted preferentially because their values are more likely to resemble June's.
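The equivalence is easy to check numerically. Here is a minimal sketch, again with made-up monthly values rather than the Luling data; `weighted_mean` and the index names are hypothetical helpers of mine. Infilling June with the mean of May and July contributes (May + July)/2 to the sum, which is the same as giving June zero weight and adding half a month's weight to each of May and July.

```python
# Hypothetical monthly means in °C (Jan..Dec). Illustrative values only,
# NOT the Luling, Texas 2005 data used in the post.
monthly = [10.0, 12.0, 16.0, 20.0, 24.0, 27.0,
           29.0, 29.0, 26.0, 21.0, 15.0, 11.0]
MAY, JUNE, JULY = 4, 5, 6  # zero-based month indices


def weighted_mean(values, weights):
    """Weighted average: sum(w * v) / sum(w)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Option 1, infill: replace June by the mean of May and July,
# then take the ordinary equal-weight average of 12 months.
infilled = list(monthly)
infilled[JUNE] = 0.5 * (monthly[MAY] + monthly[JULY])
infill_mean = weighted_mean(infilled, [1.0] * 12)

# Option 2, reweight: give June zero weight and hand its weight
# to the neighbours, half to May and half to July.
neighbour_weights = [1.0] * 12
neighbour_weights[JUNE] = 0.0
neighbour_weights[MAY] += 0.5
neighbour_weights[JULY] += 0.5
reweight_mean = weighted_mean(monthly, neighbour_weights)

# Option 3, plain dropping: June gets zero weight and nothing else changes,
# so the other 11 months are implicitly upweighted equally (divide by 11).
drop_weights = [1.0] * 12
drop_weights[JUNE] = 0.0
drop_mean = weighted_mean(monthly, drop_weights)

print(f"infilled   : {infill_mean:.4f}")
print(f"reweighted : {reweight_mean:.4f}")
print(f"dropped    : {drop_mean:.4f}")
```

The first two numbers agree because they are the same sum written two ways. Only the third differs, since it spreads June's weight over every month, winter included.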
So once again, there are various ways you can handle missing data in an average, but where climatology varies the very worst is to just leave it out. That applies to stations on a surface as well as months in a year.