I made some slight changes to the code. I mentioned last post the fix of Carrick's bug; this time I found issues when selections had years in which there was no data at all. These related to plotting - they had been set to zero, which could affect plot and trend - now they are set to NA, which R treats well.
A summary of this series of posts is here.
The selection
I had to change some of the selection criteria that I posted previously. Here is what I currently have:
if( tc == "NH") i01 = i01[tv$lat>0];
if( tc == "SH") i01 = i01[tv$lat<0];
if( tc == "ConUS") i01 = i01[tv$country == 425 & tv$lat<51 & tv$lon > -130];
if( tc == "Airport") i01 = i01[tv$airpt == "A"];
if( tc == "Rural") i01 = i01[tv$urban == "A"];
if( tc == "Urban") i01 = i01[tv$urban != "A"];
if( tc == "Cut") i01 = i01[tv$endyr<1995 & tv$length>50];
if( tc == "Kept") i01 = i01[tv$endyr>2005 & tv$length>60];
if( tc == "Africa") i01 = i01[tv$country <190];
if( tc == "BC") i01 = c(grep(",BC",tv$name),grep(",B.C",tv$name));
Here are the trends:
Trend | Global | ConUS | Cut | Kept | BC | Africa |
1900 - 2009 | 0.07608 | 0.04261 | 0.05348 | 0.07626 | 0.06916 | 0.05359 |
1978 - 2009 | 0.2715 | 0.2554 | 0.002978 | 0.2808 | 0.5146 | 0.2817 |
Update. Zeke has queried the downturn at the end of the "Cut" plot, of stations that have no data after 1994 in the GHCN set (and do have for at least 50 years prior). It's actually very year dependent.
There were 842 stations that have data up to '91, and not since. 1007 stopped '92 or before, 1023 '93 and 1036 '94. That means that the last point in the "Cut" plot below is based on about 13 stations. So there are big fluctuations at the endpoint depending on where you stop.
I've plotted below "Cut95" - stations whose last year was '94 - that was my original choice. I've also plotted Cut94, Cut93, Cut92. The earlier ones are more reliable at the ends.
Plots
Continental US | Continental US |
Africa | Africa |
"Dropped" GHCN sites | "Dropped" GHCN sites |
"Continued" GHCN sites | "Continued" GHCN sites |
British Columbia | British Columbia |
The last stages of the "dropped stations" plot depends a lot on where you stop:
Stations not reporting '92 or later | Stations not reporting '93 or later |
Stations not reporting '94 or later | Stations not reporting '95 or later |
Cool, Should I just download a new version?
ReplyDeleteopps I see the bug fixes..
ReplyDeleteAnyway I did a fresh download of R since I was a couple versions out of date. Will apply your fix and revert.
Those '90/91 values in the dropped series look suspicious. None of the rest of us got anything quite that dramatic.
ReplyDeleteZeke,
ReplyDeleteYes, I think the transition to zero stations needs more work. It's possibly a weakness with this implementation of least squares. A graph point which is determined by only a few stations will be relatively lightly weighted, unless the weight function is varied.
The weighting takes account of spatial sparsity (through the grid) - maybe something similar is needed in time for series like this.
Zeke,
ReplyDeleteI think now the issue is more that as the plot winds down, there's dependence on very few stations, so what you see depends a lot on the particular year you stop. I've included an update above.
Ahh, its also in part because most of us (myself, Tamino, etc.) used 1992 instead of 1994 as the cutoff, so we didn't have those two years.
ReplyDeleteJust incase there are other curious Canadians lurking here:
ReplyDeleteThe Canadian Province abbreviations are a total mess. ie there were more than BC & B.C including, off the top of my head- ,space BC; B; nothing at all, B.C. and maybe even BC.
Many stations had no province but had CANADA.
So beware, if any one else is doing this.
Have you tried cities > 2 million people. It'd be nice if someone confirms my UHI result :)
ReplyDeleteJoseph - not yet. I'll put it on the list for the next run.
ReplyDeleteJoseph: there is always a danger of having spatial coverage issues predominate when you get too selective a station set. E.g. the low pop density stations in http://rankexploits.com/musings/wp-content/uploads/2010/03/Picture-78.png
ReplyDelete@Zeke: Sure, but I've actually looked at disjoint population size groups, e.g. 10 million to 12 million, 12 million to 15 million, etc. There's a statistically significant temperature slope trend once you pass the 1 million mark.
ReplyDeleteIt would be difficult to explain that regression trend as a regional bias.
Details here.
Joseph,
ReplyDeleteI've put up a new post with BigCity (>2M) and MedCity (>0.5M) results. I'll post the numbers at the document store
Hmm, is conus in the XLS file?
ReplyDeleteThe ConUS numerical results are in the file
ReplyDeleteOutput_2010-04-06:12:50_Global_ConUS_Cut_Kept_BC_Africa_.txt
in All-Files.zip, under dir GHCN Glob Average v1.2.1
I'll update those zip files soon.
ok,
ReplyDeleteI just commented out the graphics and ran the code. still no word from the sig on the X11 problem. anyways. It matches GISS nicely correlation about .97. A bit of weirdness out in the past few years.
Zeke should put together a US comparsion. maybe joseph would contribute as well
A U.S. comparison of what exactly?
ReplyDeleteWe have Nick's version of Conus, your version, GISS, add josephs..
ReplyDeletemaybe you guys could convince tamino to do one, I'm working on asking Jeff.. or In a pinch We could subset v2.mean to just Conus and feed it to jeff's algorithm rather than moding his code.
By doing that we come pretty close to comparing apples to apples.
GISS do some fiddling with GHCN and USHCN and drop some US stations entirely.
In the end what you have is a "proof" of the claim ( which we all know to be largely true) that the various methods of averaging produce the "same" answer.
Mosh,
ReplyDeleteGiven that the U.S. is much smaller than the globe and pretty densely sampled with mostly complete records, it really shouldn't matter much what gridding method and anomaly calculation method you use. It will make a difference if you use GHCN or USHCN, and what version you use (e.g. USHCN F52/GHCN v2.mean_adj vs. raw versions) since the U.S. has a known cooling bias in the raw data (from TOBs and MMTS at least).
Here is an old image of mine that shows GISS and my results for GHCN raw, GHCN adj, USHCN raw, and USHCN adj:
ReplyDeletehttp://rankexploits.com/musings/wp-content/uploads/2010/03/Picture-112.png
Ya Zeke, I see that now, If you get a chance to post an Xls that would be great.
ReplyDeleteI imagine then checking other parts of the world. for example africa which Nick has done