moyhu: Infilling, the graphics version

Tuesday, July 1, 2014


This follows on from my previous post, which set out a numerical version of the considerations in calculating an annual average of temperatures given monthly averages where one may be missing. I'm using this as an analogy for what USHCN does with missing data in its US spatial average. They infill the missing values with a locally based estimate; I've been arguing with people who say that is wrong and that they should just drop the data. I contend that infilling is fine, and dropping would be very bad. I set out the arithmetic to show what happens when you drop February from the data for Luling, Texas, 2005.

There I tracked what happens to the climatology and anomalies, which I think is revealing. Here I won't mention that breakdown, but just show graphically how infilling works for averaging the absolutes, and how just dropping doesn't.

First I just plot the monthly averages. The red line is the annual average. It's just the red area (down to zero, not shown) spread evenly over the twelve months, i.e. the simple average of the months.
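To make the "area" reading concrete, here is a minimal Python sketch. The twelve monthly values are made up for illustration (they are not the Luling figures), and the helper names are hypothetical. Each month is drawn as a bar of width 1, so the area under the step plot divided by the total width of 12 is just the simple average.

```python
# Made-up monthly means in deg C, Jan..Dec, for illustration only
# (NOT the Luling, Texas 2005 data).
MONTHLY = [10, 12, 16, 20, 24, 28, 29, 29, 26, 21, 15, 11]


def step_area(values, widths):
    """Area under a step plot whose bars have these heights and widths."""
    return sum(v * w for v, w in zip(values, widths))


def simple_average(values):
    """Plain mean of the monthly values."""
    return sum(values) / len(values)


widths = [1.0] * len(MONTHLY)  # every month gets an equal-width bar
area_mean = step_area(MONTHLY, widths) / sum(widths)

print(f"{area_mean:.4f} {simple_average(MONTHLY):.4f}")  # prints 20.0833 20.0833
```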

This time I'll drop June (the plot works better with a central value). The old red plot is shown with the green overlay after dropping. The other months are stretched to cover the gap; it's the same as reweighting June to zero. The stretching reflects the fact that in averaging, each month is (hopefully) divided by 11. It's as if there were 11 months in the year.
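The same idea as a sketch, again with made-up monthly values rather than the Luling data, and hypothetical helper names. Dropping June is the same as giving it weight zero; if the year is kept 12 wide, each of the other eleven bars is stretched to width 12/11, which amounts to dividing their sum by 11.

```python
MONTHLY = [10, 12, 16, 20, 24, 28, 29, 29, 26, 21, 15, 11]  # illustrative, deg C
JUNE = 5  # zero-based month index


def weighted_average(values, weights):
    """Sum of weight * value, divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


full = weighted_average(MONTHLY, [1.0] * 12)

# Dropping June: weight 0 for June, equal weight for the other 11 months.
drop_weights = [0.0 if m == JUNE else 1.0 for m in range(12)]
dropped = weighted_average(MONTHLY, drop_weights)

# Pictorial version: keep the year 12 wide by stretching the 11 remaining
# bars to width 12/11 each, then take area / 12.
stretched = sum(v * 12 / 11 for m, v in enumerate(MONTHLY) if m != JUNE) / 12

print(f"full {full:.2f}, June dropped {dropped:.2f}, stretched {stretched:.2f}")
```

With these invented values the dropped average comes out lower than the full one, because June is a warm month and removing it leaves proportionally more cool months to stretch. Only the direction of the shift matters here, not the particular numbers.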

Actually I reweighted June to not quite zero, so you see a slightly thicker vertical line between May and July.

The red average is 20.32°C, as before; after dropping June it is 19.66°C. You can see why: the other months in effect stretch to fill the space, and there are more winter months to expand. So the area under the green is less.

Now we'll see what happens if we infill - we don't remove June, but replace it with the average of May and July:

Now everything is back in place, and only June has a slight area discrepancy. The averages are much closer: 20.32 and 20.18. The difference is 0.14 instead of 0.66.
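A sketch of the infill step on the same kind of invented data. The helper `infill_from_neighbours` is hypothetical; it assumes the missing month has a neighbour on each side within the year (true for June, not for January or December).

```python
MONTHLY = [10, 12, 16, 20, 24, 28, 29, 29, 26, 21, 15, 11]  # illustrative, deg C
JUNE = 5  # zero-based month index


def infill_from_neighbours(values, missing):
    """Copy of values with values[missing] replaced by the mean of its two neighbours.

    Assumes 0 < missing < len(values) - 1, i.e. not January or December.
    """
    if not 0 < missing < len(values) - 1:
        raise ValueError("missing month needs a neighbour on each side")
    filled = list(values)
    filled[missing] = (values[missing - 1] + values[missing + 1]) / 2
    return filled


full = sum(MONTHLY) / 12
infilled = sum(infill_from_neighbours(MONTHLY, JUNE)) / 12
dropped = (sum(MONTHLY) - MONTHLY[JUNE]) / 11

print(f"infill error {infilled - full:+.3f}, drop error {dropped - full:+.3f}")
```

On these made-up values the infilled average misses the full one by 0.125, against about 0.72 for dropping; the pattern, not the exact figures, is what matches the post.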

Finally, if you don't like "fabricating" the June reading, just leave it out, but instead of stretching all months, stretch just May and July. That is the reweighting option. It looks like this:

In fact the result is exactly the same as infilling. That is, pictorially, my contention that infilling in an average is just reweighting. When you simply leave June out, you upweight all the other months equally; here you reweight only the nearby ones. They are upweighted preferentially because their values are more likely to resemble June's.
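The equivalence can be checked directly. Replacing June by (May + July)/2 moves June's weight of 1/12 half onto May and half onto July, so May and July each end up with weight 1.5/12 and June with 0. A sketch, with the same invented values and hypothetical names:

```python
MONTHLY = [10, 12, 16, 20, 24, 28, 29, 29, 26, 21, 15, 11]  # illustrative, deg C
MAY, JUNE, JULY = 4, 5, 6  # zero-based month indices


def weighted_average(values, weights):
    """Sum of weight * value, divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Infill: June replaced by the mean of its neighbours, all months weight 1.
filled = list(MONTHLY)
filled[JUNE] = (MONTHLY[MAY] + MONTHLY[JULY]) / 2
infilled = weighted_average(filled, [1.0] * 12)

# Reweight: June left out (weight 0), its weight shared between May and July.
weights = [1.0] * 12
weights[JUNE] = 0.0
weights[MAY] = weights[JULY] = 1.5
reweighted = weighted_average(MONTHLY, weights)

print(f"infilled {infilled:.4f}, reweighted {reweighted:.4f}")
```

The two agree to rounding error for any values, not just these, since both expand to the same linear combination of the eleven observed months.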

So once again, there are various ways you can handle missing data in an average, but where the climatology varies, the very worst is to just leave it out. That applies to stations on a surface as well as months in a year.

16 comments:

Everett F Sargent, July 2, 2014 at 5:54 AM
Nick,

Let's be absolutely clear about one salient fact.

AFAIK and IMHO Goddard absolutely does not want the raw temperature data touched in any way.

Period.

Call this Goddard's Law for all I care (this may well be a straw man, but IMHO I don't expect Goddard to ever, plainly and truthfully, admit that any aspect of homogenization is valid).

Anyways, what this means in the real world is that no one could ever even consider the possibility of developing a climatology using historic land/sea surface temperature records anywhere on planet Earth.

See my previous post where all 1218 USHCN time series shift at least one month by at least 1.2 degrees F.

The basic point of that post is that all 1218 USHCN final (homogenized) series can be shown to be "fabricated" "moved" "shifted" "cooling the past" "warming the present" "infilled" "estimated" (the last two quoted words are to be associated by lying deniers with the words "make believe" or some such) (and/or any other salty language that those lying deniers wish to use).

In essence, you are arguing with and against a brick wall.

And, I really do appreciate all your efforts to argue with said brick wall, but hopefully you see the absolute black and white position that the lying deniers like Goddard have taken.

Hopefully.

And if you accept ANY questioning of USHCN final time series, that's where Willard Anthony Watts et al. come in with the informal logical fallacy of guilt by association: because of the "USA is always #1" mindset, just imagine what the rest of the world's land/sea surface temperature records must be like, or some such.

This is the same logical fallacy that Willard used in his two-part post comparing the likes of Marcott, Mann et al. with the likes of Goddard.

I have dropped the "informal" from the above statement because IMHO and AFAIK the Venn diagram of those two groups does not intersect at all.

Never have, never will.

Now, when Willard talks about "confirmation bias" like he usually does, against the perceived "team", what he is obviously missing is that this is a double-edged sword: it cuts both ways.

This gets us to the still-to-be-submitted paper by Willard et al., purporting to show microsite issues based on a guidance classification that has no objective statistical basis (no actual temperature/site data to support those "seat-of-the-pants" temperature bias ranges).

IMHO and AFAIK, only when their own self-selected set of stations is examined (the list of station IDs is absolutely critical to their central argument, and we won't ever see that list until that paper is actually published somewhere with their SOM (and all I care about is the station ID list, I don't give a hoot about boots on the ground images and/or satellite imagery)) will we see what forms of "confirmation bias" are contained therein.

Willard now agrees with Goddard (big surprise there, NOT!), by using the argument of "see my unpublished paper" where we show ~80% of USHCN (or whatever network that they are using) have microsite issues.

And don't get me wrong, I really do want them to publish that paper, somewhere/anywhere, as long as the list of stations they self-selected is included, and I don't even care if it's an open access (pay to play) journal or the most obscure off topic journal or even if it has absolutely no peer review or even if it's published in a trash anti-AGW journal like E&E.

They do seem to have a really tight echo chamber aspect about that paper, almost surly at this point in time, so my fondest hope is that it is given the "standard" peer review (paper read, sounds good, publish it, don't examine the data at all, and don't question the selection/criteria process at all).
Nick Stokes, July 2, 2014 at 6:09 AM
"They do seem to have a really tight echo chamber aspect about that paper..."
I'm not on tenterhooks, though I'll be interested to see what Evan Jones comes up with. I was amazed, though, that after all their studies over the years, Anthony is currently shocked to find that a fairly large fraction are not reporting at all.
Everett F Sargent, July 2, 2014 at 2:47 PM
Nick,

Willard is mad as heck with you, because, well, because.

http://wattsupwiththat.com/2014/07/01/ncdc-responds-to-identified-issues-in-the-ushcn/#comment-1674412

In that little rant/screed/manifesto Willard states:

"Along with “estimated” data for a bunch of closed/zombie weather stations that shouldn't be reporting at all, and have no data in the raw data file."

Here's a little back story for you.

I have three degrees in civil engineering; the first was a two-year degree from Vermont Technical College in May 1975 (then UVM, then Cornell).

That summer of 1975, I was lucky enough to work for the USACE CRREL in Hanover, NH as a GS-3.

My job, along with several others, was to update the CONUS snow load contour map, using all available historic raw monthly snow accumulation data.

This was all from stacks and stacks of computer printouts.

When a station was missing data, we INFILLED it using a simple three point average from the closest three adjacent stations (that formed a triangular enclosure for the missing data).
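For readers who want the procedure spelled out, here is a rough Python sketch of a three-station infill of the kind described: take the three nearest stations with raw data, check that they form a triangle around the missing location, and average them. The coordinates, values and function names are hypothetical, flat x/y distances stand in for real geographic ones, and returning None when the triangle does not enclose the point is an assumption, not part of the procedure as described.

```python
from math import hypot


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign says which side of o->a b lies on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def encloses(tri, p):
    """True if point p lies inside or on the triangle tri = (a, b, c)."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def three_point_infill(target, stations):
    """Average of the three nearest stations if they enclose target, else None.

    stations: list of ((x, y), value) pairs holding raw observations only,
    so an estimate is never used to build another estimate.
    """
    nearest = sorted(
        stations,
        key=lambda s: hypot(s[0][0] - target[0], s[0][1] - target[1]),
    )[:3]
    if len(nearest) < 3 or not encloses([pos for pos, _ in nearest], target):
        return None
    return sum(value for _, value in nearest) / 3


# Hypothetical stations on a flat x/y grid.
stations = [((0, 0), 10.0), ((4, 0), 14.0), ((0, 4), 12.0), ((10, 10), 16.0)]
print(three_point_infill((1, 1), stations))    # prints 12.0
print(three_point_infill((-1, -1), stations))  # prints None
```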

Perhaps not the best method, but that was almost 40 years ago.

I can't remember if we did any massive multiyear infilling though (Is that a requirement for the v2.5 USHCN to work?).

But at some point, contour maps are constructed from the final homogenized climatology either as anomalies and/or absolutes. Correct?

We never used those estimates to calculate any other missing data; all interpolation was from original raw data only.

Long story short? Infilling station data has been around a very long time, at least 40 years.

I really don't think that Willard has much, if any, technical background with regards to any form of data analyses.

But Goddard is definitely not even wrong.

Oh, and this update from Politifact (After the Fact):

http://www.politifact.com/punditfact/statements/2014/jun/25/steve-doocy/foxs-doocy-nasa-fudged-data-make-case-global-warmi/

Zeke has the last word and even a graph.
Anonymous, July 2, 2014 at 11:51 PM
I hope you can bear with me here, Nick. I have visited several blogs in relation to this topic in the hope of gaining an understanding from a layperson's point of view.
The explanation of the infilling adjustments above is excellent in clarity; however, it highlights the issue I have, in that the result does differ from the result if the actual data had been used. Surely in any field where we are talking tenths of degrees, that difference is significant?

The actual issue, as I understand it, is that infilling currently occurs for many areas as a result of homogenized data from urban stations being used where there has been dropout of rural stations? I appreciate my questions may be technically naive, but again, from a lay point of view, these are the issues I perceive.
Anonymous, July 3, 2014 at 12:38 AM
In the example given, read and understood. In the case of the homogenisation in this instance, if the data homogenised is from urban stations and/or stations from lower altitudes, does the process not generate a warming bias? Would it be possible to document in simple terms how the actual homogenisation process takes place, where a rural or higher-altitude station drops out, to create the artificial station?
Anonymous, July 3, 2014 at 9:22 PM
Thanks for taking the time to reply, Nick. So how is the infilling calculated to represent station dropout where the stations that dropped out were rural/higher latitude, in a region where only/mostly urban stations remain to report?
Using monthly averages from a region or individual station does not address this issue.
Anonymous, July 6, 2014 at 8:05 AM
Re Watts being "mad" with Nick, my guess is that he is probably otherwise preoccupied with trying to reconcile his warring contrarian colleagues, Monckton, Eschenbach, Svalgaard and Jo Nova and partner (Dr. "11 year notch" Evans). It's fairly serious stuff, with Monckton threatening Svalgaard and Eschenbach with litigation for libel against Evans. I wonder how M and E will get on at next week's contrarian junket in (appropriately enough) Las Vegas. Fordprefect at the climateandstuff blog managed to copy a chunk of a broadside against Monckton that appeared briefly on WUWT before being censored.

Rambunctious stuff.
