moyhu: Nonsense plots of USHCN adjustments

Friday, May 9, 2014

Nonsense plots of USHCN adjustments

On 6 May, Anthony Watts posted the following plot at WUWT, which was said to represent the effect of USHCN adjustments over the years. WUWT said:

"Yet as this simple comparison between raw and adjusted USHCN data makes clear…

…adjustments to the temperature record are increasing – dramatically. The present is getting warmer, the past is getting cooler, and it has nothing to do with real temperature data – only adjustments to temperature data. The climate reality our government is living in is little more than a self-serving construct."

Dramatically!

The text gave no source, but linked to this USHCN document, which had no such graph. However, it did say you could click for the source, and that led you to this post from Steven Goddard's site.

The title looks rather official, but he did not give the source. But you can see from comments below that he made it himself. And he doesn't give much detail.

He did, in a comment, link to two tar files of data on his site. So I downloaded them. They are standard USHCN files of station monthly averages, raw and final.

I was surprised at the "dramatic" change, because USHCN had published a fairly well known ver 1 plot:

with much more modest changes, and since it's mostly TOBS, such a different version seemed unlikely.

So I used the downloaded data to do my own calculation. I'll show the R code below the jump. But here is the plot:
Update. Zeke points out in a comment below that the files Steven Goddard supplied were in °C, not °F. Since SG had quoted his results in °F, I assumed that was the form of the data - there was no other indication. I've fixed the plot, sticking with °F.
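As a quick check of the unit fix, here is a minimal sketch of how a difference converts from °C to °F. The two temperatures are made up for illustration; the point is only that the +32 offset cancels in a difference, so final-raw just scales by 9/5.

```r
# A difference in deg C converts to deg F with the 9/5 factor only;
# the +32 offset cancels when two temperatures are subtracted.
c_to_f <- function(x) x * 9 / 5 + 32

raw_c   <- 10.00   # hypothetical raw monthly mean, deg C
final_c <- 10.25   # hypothetical final monthly mean, deg C

diff_c <- final_c - raw_c                   # difference in deg C
diff_f <- c_to_f(final_c) - c_to_f(raw_c)   # the same difference in deg F

print(c(diff_c = diff_c, diff_f = diff_f, scaled = diff_c * 9 / 5))
```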
Update: A commenter, Peter O'Neill, who checked my code, noted perceptively that I had made a mistake in reading the data. I allowed four characters for a monthly temperature, which is OK between -9 and 99, but for -10 or less the sign is lost. My mistake was actually in thinking the data was in °F, where -10 would be unlikely for a CONUS monthly average, though apparently not impossible.

So I've fixed it as he suggested. Mainly, the result is a lot smoother. The format error only hurt badly when final and raw were on opposite sides of -10, which is rare and makes a jumpy result.
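To see the sign loss concretely, here is a small sketch. The record is invented for illustration and is not from a real station file. It assumes the layout that the reading code relies on: an 11-character station code, the year in columns 13-16, then twelve fields of a 6-character value (hundredths of a degree) plus 3 flag characters.

```r
# Hypothetical record mimicking the layout assumed by the reading code.
# Values are in hundredths of a degree; two of them are below -1000.
vals <- c(-1234, -950, 312, 1020, 1544, 2011, 2330, 2250, 1800, 1105, 420, -1001)
rec  <- paste0("USH00000000 ", "2014",
               paste0(sprintf("%6d", as.integer(vals)), "   ", collapse = ""))

start4 <- 0:11 * 9 + 19   # original read: last 4 characters of each value
start5 <- 0:11 * 9 + 18   # corrected read: last 5 characters of each value

read4 <- as.numeric(substring(rec, start4, start4 + 3))
read5 <- as.numeric(substring(rec, start5, start5 + 4))

# Rows: true values, 4-character read, 5-character read
print(rbind(true = vals, four = read4, five = read5))
```

Only the values with five characters including the sign (-1234 and -1001 here) come back wrong from the 4-character read, and they come back positive, which is why the error only bites for the coldest months.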

I now get a small spike in 2014, as Peter did. It is about 0.1°C. I have figured out the reason. The average final-raw has a strong seasonal variation. Here are the monthly averages for 2013 in hundredths of a °C:
12, 17, 14, 1, -7, -9, -10, -14, -14, -8, -2, 3
So the first four months of 2014 get the positive part.
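The arithmetic can be checked in a couple of lines. This sketch uses the 2013 monthly figures quoted above and assumes, for illustration only, that 2014 follows a similar seasonal pattern.

```r
# Monthly mean of final - raw for 2013, in hundredths of a degree C (from the post)
adj2013 <- c(12, 17, 14, 1, -7, -9, -10, -14, -14, -8, -2, 3)

full_year  <- mean(adj2013) / 100        # all twelve months
jan_to_apr <- mean(adj2013[1:4]) / 100   # only the months available so far in 2014

print(round(c(full_year = full_year, jan_to_apr = jan_to_apr), 3))
```

The full-year mean is close to zero (about -0.01°C), while January to April alone average 0.11°C, which matches the size of the spike.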

Pretty different. Note the y axis. I just subtracted raw from final, for all months where both were available, and did an unweighted average over months and stations. That is what Steven Goddard said he did. Personally, I would at least have averaged by state, and then perhaps used area weighting, but it probably wouldn't make much difference.

Update: Diagnosis

I suggested here that the problem is that SG subtracted the annual average of the final readings from the average of the raw. Zeke, who has the code, has confirmed. But while the final data generally has no missing data, because of FILNET, the raw data has many missing station/months. So the difference of averages of two different populations is formed, and the data, being absolute temperatures, is very heterogeneous. Spatially, but even worse, seasonally.

The spike is clear evidence of the problem. SG got one here, too, in Illinois. It is inevitable.

In 2014, the final set has 1218 values in each of the first four months. Every station. But for raw there were 891, 883 and 883 for January to March, and only 645 for April. So there is a big preponderance of winter month readings in the raw average, and raw comes out cooler than final for that reason, not because of adjustments.
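Here is a toy version of that effect. The station counts are the ones quoted above; the monthly mean temperatures are invented purely for illustration, and there is no adjustment at all in this example (raw equals final wherever raw exists).

```r
# Station counts for Jan-Apr 2014 are from the post; the monthly mean
# temperatures are hypothetical and chosen only to show the mechanism.
n_final <- rep(1218, 4)
n_raw   <- c(891, 883, 883, 645)
temp    <- c(0, 2, 7, 13)   # hypothetical mean temperature per month, deg C
adj     <- rep(0, 4)        # final - raw for each month: zero adjustment

# Average each data set separately, then subtract (the faulty method)
mean_final    <- sum(n_final * temp) / sum(n_final)
mean_raw      <- sum(n_raw * temp) / sum(n_raw)
diff_of_means <- mean_final - mean_raw

# Average the paired differences where both exist (the method used in this post)
mean_of_diffs <- sum(n_raw * adj) / sum(n_raw)

print(round(c(mean_final = mean_final, mean_raw = mean_raw,
              diff_of_means = diff_of_means, mean_of_diffs = mean_of_diffs), 3))
```

With these made-up temperatures the difference of averages is about 0.55°C even though no reading was adjusted, while the average of paired differences is exactly zero.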

Update: Some commenters have kindly contributed their own plots. I've appended these at the end.

Here is my R code. The contents of the tar files are in my directories "raw" and "final".

# Code for averaging USHCN adjustment differences
#Written by Nick Stokes, 9 May 2014
# See post at https://moyhu.blogspot.com.au/

# gather and sort data
f=list.files("raw")
x=c(".raw.tavg",".FLs.52i.tavg")
n=substr(f,1,11)  # station code
iy=1890:2014
# w will be a big array of raw and adjusted, in the original units of
# hundredths of a deg C, from 1890-2014
# (yr+12 months,125 years,1218 stations,raw+final)
w=array(NA,c(13,125,1218,2))
for(i in 1:length(n)){
  s=n[i];  # looping over stations
  for(j in 1:2){ # raw, then final
    y=x[j];  # read the file
    b=readLines(paste(c("raw/","final/")[j],s,y,sep=""))
    b=gsub("-9999","  NA ",b)
    h=as.numeric(substr(b,13,16))  # Read year
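    # Read months. Each value is read as 5 characters so that values of
    # -1000 or less (-10 deg C and below) keep their minus sign
    for(k in 0:11*9+18)h=rbind(h,as.numeric(substr(b,k,k+4)))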
    h=h[,h[1,]>1889]  # delete pre 1890
    kk=match(h[1,],iy) # line up
    w[,kk,i,j]=h # add
  }
  if(i%%100==0)print(Sys.time())  # it takes about 80 sec
}

x=w[2:13,,,2]-w[2:13,,,1]  # final-raw (omit years)
x1=colMeans(x,dims=1,na.rm=TRUE)  # mean over months
x2=rowMeans(x1,na.rm=TRUE)  # mean over stations
x2=round(x2/100*1.8,3) # data are in .01 deg C; convert difference to deg F and round

# x2 are the annual averages. Now do plot
graphics.off()
png("ushcn.png",width=800)
plot(1890:2014,x2,pch=19,xlab="Year", ylab="Final-raw in deg F",main="USHCN V2.5 adjustment discrepancy",col="blue")
lines(1890:2014,x2,col="blue")
text(1920,0,"Graph prepared by Nick Stokes, 9 May 2014")
text(1920,-0.05,"post at https://moyhu.blogspot.com.au/ 9 May")
dev.off()

Plots in comments:
Zeke shows this plot of raw and adjusted, and the difference:

Here is Zeke's current version of the difference plot. He's using some gridding, which I think is better, but doesn't make a lot of difference (a bit smoother).

Bruce Schuck (sunshinehours) linked max/min plots as well:

46 comments:

Zeke, May 10, 2014 at 12:23 AM
I did a version with 2.5x3.5 lat/lon gridding and a land mask, and it looks quite similar: http://i81.photobucket.com/albums/j237/hausfath/USHCNHomogenizedminusRaw_zps284d69fe.png

Anonymous, May 10, 2014 at 12:43 AM
So "Goddard" made it up?

Anonymous, May 10, 2014 at 2:59 AM
Goddard made his code available, why didn't you use it and check that work instead of coming up with your own method?

I don't see your method as valid, since your baseline offset does not match the pre-2000 NOAA provided plot. How do you account for that?

Scott, May 10, 2014 at 3:28 AM
Nick: It would be nice if there was a way to look at the data in various stages of homogenization, in an effort to see how different corrections impacted the result. Then you could see how the raw data evolves to the adjusted product. It might help people visualize the necessary corrections to time-of-observation bias, land cover biases, loss/gain of stations in the same area, etc.

Of course by the time we get to the global scale, the impact of the homogenization process is reduced. And glacier loss can't be explained by homogenization. Wildlife migration can't be explained by adjustments. Sea level doesn't just rise for no reason. We could go on and on.

BS, May 10, 2014 at 5:04 AM
USHCN sure worked hard to cool off the 1930's.

Wouldn't 0.5F move 1934 to the top of the list?

Zeke, May 10, 2014 at 5:06 AM
Looked into it a bit more, and NCDC posts their data in degrees C, not F as Nick's import code assumes. So Goddard's results are not too far off, apart from the oddity of the last point.

Here is what things should look like (in degrees F): http://i81.photobucket.com/albums/j237/hausfath/USHCNHomogenizedminusRaw_zps3725ac9a.png

BS, May 10, 2014 at 6:19 AM
tmax: http://sunshinehours.wordpress.com/?attachment_id=4212
tmin: http://sunshinehours.wordpress.com/?attachment_id=4213
tavg: http://sunshinehours.wordpress.com/?attachment_id=4214

tmin is barely touched.

.75C for the 20s/30s/40s

That is kind of grotesque

Anonymous, May 10, 2014 at 6:53 AM
This spike at the end may be related to the "late data" problem we see with GHCN/GISS and NCDC's "state of the climate" reports. They publish the numbers ahead of dataset completeness, and they have warmer values, because I'm betting a lot of the rural stations come in later, by mail, rather than the weathercoder touch tone entries. Lot of older observers in USHCN, and I've met dozens. They don't like the weathercoder touch-tone entry because they say it is easy to make mistakes.

And, having tried it myself a couple of times, and being a young agile whippersnapper, I screw it up too.

The USHCN data seems to show completed data where there is no corresponding raw monthly station data (since it isn’t in yet) which may be generated by infilling/processing....resulting in that spike. Or it could be a bug in Goddard's coding of some sorts. I just don't see it since I have the code. I've given it to Zeke to see what he makes of it.

Yes the USHCN 1 and USHCN 2.5 have different processes, resulting in different offsets. The one thing common to all of it though is that it cools the past, and many people don't see that as a justifiable or even an honest adjustment.

It may shrink as monthly values come in.

Anonymous, May 10, 2014 at 7:16 AM
I've been in the basement at NCDC where the paper forms are dealt with. There's a fair amount of lag in the B91 COOP station monthly report mail forms received at NCDC. They have to be scanned, checked for scan errors, some have to be sent for hand transcription, then they go for a sanity test checker to look for things like missing minus signs, 4's that look like 9's and get wrongly scanned/transcribed etc, lots of handling involved to get the numerical data out of them.

Often they are still waiting for final data into the next month for a few stations if observer mails late or forgets.

Anonymous, May 10, 2014 at 7:58 AM
Yep, on this we agree.

CLIMAT reports and GHCN/GISS reporting aren't always in sync, resulting in odd anomalies due to late data. GISS is missing several CLIMAT reports on a regular basis. Some GHCN stations which we know to be reporting regularly, disappear from GISS for no apparent reason.

Stockholm is a good example, missing in GISS since 1994: http://data.giss.nasa.gov/cgi-bin/gistemp/show_station.cgi?id=645024640000&dt=1&ds=14

Yet we know the data from Stockholm observatory is still being recorded. http://bolin.su.se/data/stockholm/raw_individual_temperature_observations.php

Moberg is the data keeper, hardly anyone obscure.

Data plotted to 2012 here: http://people.su.se/~amobe/stockholm/stockholm-historical-weather-observations-ver-1.0/graphics/annual-temperatures/stockholm_annual_1756_2012_sve.png

The "biggest problem facing mankind" needs better quality control. Gavin needs to do less TED talks and more data ingestion QC IMHO.

BS, May 11, 2014 at 3:12 AM
Any idea why Dec/Jan/Feb are adjusted so bizarrely?

https://sunshinehours.wordpress.com/2014/05/10/ushcn-2-5-adjustments-final-raw/

Anonymous, May 27, 2014 at 10:22 AM
I think a small code correction is needed:

for(k in 0:11*9+19)h=rbind(h,as.numeric(substr(b,k,k+3)))

should I think be

for(k in 0:11*9+18)h=rbind(h,as.numeric(substr(b,k,k+4)))

Double digit negative temperatures can occur. For example, the minimum January mean in the raw data for 20140522 which I found as min(w[2,,,1], na.rm=TRUE) is -27.48, and would be treated as +27.48 by the unmodified code which loses the minus sign.

Comparing, http://oneillp.files.wordpress.com/2014/05/moyhu_4_51.png, the corrected discrepancy, in red, smooths out the fluctuations quite a bit but introduces a final spike, for which I have no explanation so far, although I have not given it much thought yet - I have only just spotted the error.

Anonymous, May 27, 2014 at 10:27 AM
oneillp is Peter O'Neill - I prefer to comment with a name in full. This blogspot "Comment as:" choice, and preview which is not a complete preview, confused me!