Tuesday, April 6, 2010

Ver 1.2.1 - Various results

I tried some of the more exotic options that I describeded in the last post. There was ConUS, the "Kept" and "cut" GHCN classes - referring to those that were maintained after the GHCN project finished its historical phase, and those which were not. Conspiracy-oriented folks have been suggesting that GHCN dropped preferentially stations that were not warming. I also checked B.C. and Africa.

I made some slight changes to the code. I mentioned last post the fix of Carrick's bug; this time I found issues when selections had years in which there was no data at all. These related to plotting - they had been set to zero, which could affect plot and trend - now they are set to NA, which R treats well.

A summary of this series of posts is here.

The selection

I had to change some of the selection criteria that I posted previously. Here is what I currently have:

 if( tc == "NH") i01 = i01[tv$lat>0];
  if( tc == "SH") i01 = i01[tv$lat<0];
  if( tc == "ConUS") i01 = i01[tv$country == 425 & tv$lat<51 & tv$lon > -130];
  if( tc == "Airport") i01 = i01[tv$airpt == "A"];
  if( tc == "Rural") i01 = i01[tv$urban == "A"];
  if( tc == "Urban") i01 = i01[tv$urban != "A"];
  if( tc == "Cut") i01 = i01[tv$endyr<1995 & tv$length>50];
  if( tc == "Kept") i01 = i01[tv$endyr>2005 & tv$length>60];
  if( tc == "Africa") i01 = i01[tv$country <190];
  if( tc == "BC") i01 = c(grep(",BC",tv$name),grep(",B.C",tv$name));

It turns out that British Columbia is sometimes referred to as "BC", and sometimes "B.C" in the inventory. But you can just run 2 greps and concatenate the results.

Here are the trends:
1900 - 20090.076080.042610.05348 0.07626 0.069160.05359
1978 - 20090.27150.25540.002978 0.28080.5146 0.2817
Numerical results are on the document store

Update. Zeke has queried the downturn at the end of the "Cut" plot, of stations that have no data after 1994 in the GHCN set (and do have for at least 50 years prior). It's actually very year dependent. 

There were 842 stations that have data up to '91, and not since. 1007 stopped '92 or before, 1023 '93 and 1036 '94. That means that the last point in the "Cut" plot below is based on about 13 stations. So there are big fluctuations at the endpoint depending on where you stop. 

I've plotted below "Cut95" - stations whose last year was '94 - that was my original choice. I've also plotted Cut94, Cut93, Cut92. The earlier ones are more reliable at the ends.


Continental US

Continental US



"Dropped" GHCN sites

"Dropped" GHCN sites

"Continued" GHCN sites

"Continued" GHCN sites

British Columbia

British Columbia
The last stages of the "dropped stations" plot depends a lot on where you stop:

Stations not reporting '92 or later
Stations not reporting '93 or later
Stations not reporting '94 or laterStations not reporting '95 or later


  1. Cool, Should I just download a new version?

  2. opps I see the bug fixes..

    Anyway I did a fresh download of R since I was a couple versions out of date. Will apply your fix and revert.

  3. Those '90/91 values in the dropped series look suspicious. None of the rest of us got anything quite that dramatic.

  4. Zeke,
    Yes, I think the transition to zero stations needs more work. It's possibly a weakness with this implementation of least squares. A graph point which is determined by only a few stations will be relatively lightly weighted, unless the weight function is varied.

    The weighting takes account of spatial sparsity (through the grid) - maybe something similar is needed in time for series like this.

  5. Zeke,
    I think now the issue is more that as the plot winds down, there's dependence on very few stations, so what you see depends a lot on the particular year you stop. I've included an update above.

  6. Ahh, its also in part because most of us (myself, Tamino, etc.) used 1992 instead of 1994 as the cutoff, so we didn't have those two years.

  7. Just incase there are other curious Canadians lurking here:

    The Canadian Province abbreviations are a total mess. ie there were more than BC & B.C including, off the top of my head- ,space BC; B; nothing at all, B.C. and maybe even BC.

    Many stations had no province but had CANADA.

    So beware, if any one else is doing this.

  8. Have you tried cities > 2 million people. It'd be nice if someone confirms my UHI result :)

  9. Joseph - not yet. I'll put it on the list for the next run.

  10. Joseph: there is always a danger of having spatial coverage issues predominate when you get too selective a station set. E.g. the low pop density stations in http://rankexploits.com/musings/wp-content/uploads/2010/03/Picture-78.png

  11. @Zeke: Sure, but I've actually looked at disjoint population size groups, e.g. 10 million to 12 million, 12 million to 15 million, etc. There's a statistically significant temperature slope trend once you pass the 1 million mark.

    It would be difficult to explain that regression trend as a regional bias.

    Details here.

  12. Joseph,
    I've put up a new post with BigCity (>2M) and MedCity (>0.5M) results. I'll post the numbers at the document store

  13. Hmm, is conus in the XLS file?

  14. The ConUS numerical results are in the file
    in All-Files.zip, under dir GHCN Glob Average v1.2.1

    I'll update those zip files soon.

  15. ok,

    I just commented out the graphics and ran the code. still no word from the sig on the X11 problem. anyways. It matches GISS nicely correlation about .97. A bit of weirdness out in the past few years.

    Zeke should put together a US comparsion. maybe joseph would contribute as well

  16. A U.S. comparison of what exactly?

  17. We have Nick's version of Conus, your version, GISS, add josephs..
    maybe you guys could convince tamino to do one, I'm working on asking Jeff.. or In a pinch We could subset v2.mean to just Conus and feed it to jeff's algorithm rather than moding his code.

    By doing that we come pretty close to comparing apples to apples.
    GISS do some fiddling with GHCN and USHCN and drop some US stations entirely.

    In the end what you have is a "proof" of the claim ( which we all know to be largely true) that the various methods of averaging produce the "same" answer.

  18. Mosh,

    Given that the U.S. is much smaller than the globe and pretty densely sampled with mostly complete records, it really shouldn't matter much what gridding method and anomaly calculation method you use. It will make a difference if you use GHCN or USHCN, and what version you use (e.g. USHCN F52/GHCN v2.mean_adj vs. raw versions) since the U.S. has a known cooling bias in the raw data (from TOBs and MMTS at least).

  19. Here is an old image of mine that shows GISS and my results for GHCN raw, GHCN adj, USHCN raw, and USHCN adj:


  20. Ya Zeke, I see that now, If you get a chance to post an Xls that would be great.

    I imagine then checking other parts of the world. for example africa which Nick has done