Monday, January 30, 2012

Reykjavik and GHCN adjustments.

There was a WUWT guest post recently by Paul Homewood on GHCN and GISS adjustments, and their effect on Iceland. It was somewhat along the lines of an earlier Eschenbach post on Darwin.

Then there were the usual thoughtful WUWT responses:
"When are the legal people going to be brought into this?"
"Someone should go to jail if this is tampering as it appears."
"we’ve still a bunch of scumbags on this earth pretending that a dynamic history is OK"
etc. So what happened?

No, history was not rewritten. What the folks there don't seem to want to acknowledge is that GHCN circulates two files, described here. The file everyone there wants to focus on is the adjusted file (QCA). This, as explained, has been homogenized. This is a preparatory step for its use in compiling a global index. It tries to put all stations on the same basis, and also adjust them, if necessary, to be representative of the region. It is not an attempt to modify the historical record.

That record is contained on the other data file distributed - the unadjusted QCU file. This contains records as they were reported initially. It is generally free of any climatological adjustments. For the last 15 or so years, Met stations have submitted monthly CLIMAT forms. You can inspect these online. Data goes straight from these to the QCU file, and will not change unless the Met organisation submits an amended CLIMAT file. This is the history, and no-one is tampering with it.
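Anyone can check this for themselves, since both files are plain fixed-width text. Here is a minimal Python sketch of a line reader; the field widths follow my reading of the GHCN-M v3 README, and the example line (ID and values) is synthetic, so treat the layout as an assumption to verify against the README itself.

```python
# Minimal reader for one line of a GHCN-M v3 .dat file (QCU or QCA).
# Assumed layout (per the GHCN-M v3 README): 11-char station ID,
# 4-char year, 4-char element code, then 12 blocks of
# (5-char value + 3 flag chars). Values are hundredths of a degree C,
# with -9999 meaning missing.

def parse_ghcn_line(line):
    station = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    temps = []
    for m in range(12):
        start = 19 + m * 8           # 5-char value + 3 flag chars per month
        v = int(line[start:start + 5])
        temps.append(None if v == -9999 else v / 100.0)
    return station, year, element, temps

# A synthetic example line (the ID and values are made up):
values = [-150, -120, -50, 150, 600, 900, 1100, 1050, 800, 400, 100, -9999]
line = "62004030000" + "1940" + "TAVG" + "".join("%5d   " % v for v in values)
station, year, element, temps = parse_ghcn_line(line)
```

Differencing the QCA and QCU values for the same station, year and month then gives the adjustment series directly.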

The adjusted file does change, as the name suggests it may. Recently, it has been modified to use an improved pairwise comparison homogenization algorithm due to Menne and Williams. It is now (as of Dec 15 2011) used by GISS instead of their own homogenization algorithm, which makes the QCA file much more significant.
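To make the pairwise idea concrete: the shared regional signal cancels in the difference between a target station and a nearby neighbour, so a step that survives in the difference series is a candidate inhomogeneity in one of the two records. The toy sketch below is not the NCDC code; the statistic and the synthetic data are my own simplifications, just to illustrate the core concept.

```python
import numpy as np

# Toy illustration of the idea behind pairwise homogenization
# (Menne & Williams): the regional climate signal is common to both
# stations and cancels in the difference series, so a mean shift that
# remains in the difference is a candidate break in one station.
# This is a simplification for illustration, not the NCDC algorithm.

def largest_step(diff):
    """Index and t-like statistic of the best single mean-shift split."""
    n = len(diff)
    best_i, best_t = None, 0.0
    for i in range(2, n - 2):
        a, b = diff[:i], diff[i:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        t = abs(a.mean() - b.mean()) / se
        if t > best_t:
            best_i, best_t = i, t
    return best_i, best_t

rng = np.random.default_rng(0)
regional = rng.normal(0.0, 1.0, 100)          # shared regional signal
target = regional + rng.normal(0.0, 0.2, 100)
target[60:] += 1.0                            # artificial 1 C break
neighbour = regional + rng.normal(0.0, 0.2, 100)
i, t = largest_step(target - neighbour)       # break located near index 60
```

In the real algorithm each station is compared against many neighbours, and only breaks confirmed across multiple pairs are attributed to the target.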

Update. I have a new post which looks at the GHCN adjustments more generally, with visualization.

The Iceland Met Office record

So Paul raised this with the Icelandic Met Office, with the following loaded questions:
a) Were the Iceland Met Office aware that these adjustments are being made?
No we were not aware of this.
b) Has the Met Office been advised of the reasons for them?
No, but we are asking for the reasons
c) Does the Met Office accept that their own temperature data is in error, and that the corrections applied by GHCN are both valid and of the correct value? If so, why?
The GHCN “corrections” are grossly in error in the case of Reykjavik but not quite as bad for the other stations. But we will have a better look. We do not accept these “corrections”.
d) Does the Met Office intend to modify their own temperature records in line with GHCN?

And the Met Office sent their own version of the Reykjavik data. But what is missing from this dialogue is that GHCN never altered its QCU record, and is not suggesting that the Iceland records should be changed. I'll show the QCU data for the period most complained of, post-1939. Units are 0.01°C. Unfortunately, the Dec numbers got lost in formatting.
Despite all the invocation of the Iceland data, it was not supplied in any electronically readable form. You can compare the link above, though, and you will see that the GHCN QCU data is almost identical. There are changes of a fraction of a degree; the Iceland Met Office do say that they have also made some adjustments.

Actually, there are isolated differences. These seem to be sign differences, where GHCN has a negative sign; the sign may have disappeared in scanning the Iceland numbers.

The story of the Reykjavik adjustments was taken up by Ole Humlum, again with the same disdain for the record contained in the GHCN QCU file and for the purpose of the homogenization adjustment. He refers to the Iceland data, and to Rimfrost, but does not mention the almost identical QCU.

The V3.1 adjustments

It is true that the adjustments have changed recently. V3.1 was released 4 November 2011. I have a QCA file from 14 July 2011, and this is, for Reykjavik, almost identical to the QCU file. I have a v2.mean file from Dec 2009, which is the V2 unadjusted file, and it is essentially identical to the current QCU file. In v2 style, it had duplicates; there were four for Reykjavik, but they had little overlap, and where they did, they were consistent. The unadjusted file has not changed, but the QCA file has.

I also have an adjusted v2 file from Dec 2009. The adjustments are substantial, but less than the current ones.

So here is the plot of the current QCU file vs the adjusted QCA file:

And here is the plot of differences

As you see, the adjustments are substantial. I have repeated the GHCN analysis that we did for Darwin, with a histogram of the effect on trends, and will blog about it shortly. My current estimate is that this adjustment is in the top 10% by magnitude.
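For reference, the "effect on trend" being histogrammed can be computed per station as the difference of least-squares slopes. A sketch with synthetic data (the series and the 0.5°C step are invented purely for illustration):

```python
import numpy as np

# Sketch of the trend-effect calculation behind the histogram: fit an
# OLS trend to the unadjusted (QCU) and adjusted (QCA) annual series
# and take the difference. The series here are synthetic.

def trend_per_decade(years, temps):
    return np.polyfit(years, temps, 1)[0] * 10.0   # slope in C/decade

years = np.arange(1901, 2001)
qcu = 0.005 * (years - 1901)        # 0.05 C/decade underlying trend
qca = qcu.copy()
qca[years < 1950] -= 0.5            # adjustment cooling the early record
effect = trend_per_decade(years, qca) - trend_per_decade(years, qcu)
# cooling the past raises the fitted trend (here by about 0.075 C/decade)
```

Repeating this over all stations and binning the effects gives the histogram; a symmetric histogram centred near zero means the adjustments add no net trend bias.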


  1. Nick, the reason people focus on the adjusted GHCN data, as you know, is that it's the adjusted data that is used by GISS and HADCRUT.

    On the GHCN ftp site you can get v3.0 and v3.1 from November, and see that v3.0-adjusted is, as you say, almost the same as the unadjusted.

    Can you make the v2 file you have available? Or at least plot the v2 adjusted for Reykjavik?

    The interesting question is which of the GHCN 'quality control' procedures (ha ha) are kicking in to generate these weird adjustments. Is the algorithm finding 'breakpoints' at 1940 and 1965 and 'correcting' them? Can you shed any light on this?

    Also, do you know if the GHCN v3 code is available?


  2. Paul,
    I don't think Hadcrut has used the GHCN adjusted data in the past. Maybe they will.

    I have uploaded the zipped v2 Dec 2009 adjusted file here. I didn't include the unadjusted file, or v2 inventory, because they don't change. But I can if you need them.

    Homogenization doesn't interact with quality control, which is mainly about finding clerical errors. As you say, the importance of the file is that it is used by GISS. But the test to apply for that is statistical - do the adjustments create a bias? And if so, is that justified?

    But I agree that here it is probably interpreting the sudden changes in 1940 and 1965 as extraneous. As I said at WUWT, it would be a big mistake to step in and overrule the algorithm piecemeal. The only option is to make a general rule modification and apply it everywhere. For example, make it reluctant to spot a break if it is shared by neighbors. I suspect they do that already.

    I don't think the GHCN adjustment code is available for download. The GISS code is, and has been updated recently; I suspect it will show whether GISS has scrapped its own process in favor of GHCN. I'm checking.

    I think an aspect that has bothered Ole Humlum, and maybe others, is the GISS interface for station data. It used to have unadjusted as the first option, then "suspicious values removed" as the second and default option, and then the adjusted. Now the first option is adjusted (they don't deal with the unadjusted any more), and the other menu options don't make much sense (they would be the same). I think that just hasn't caught up, but probably GISS should scrap the facility and leave it to NOAA. They do it well now, and are the custodians of the data.

  3. Looks like the unadjusted versions are still available via gistemp's station selection page. That is: change the url to the old format for the "data_set" parameter, 1 or 2; 1 is unadjusted, I believe.

    When choosing gistemp's homogeneity adjustment, this is the result:

    More like the unadjusted...

  4. The GHCN 3.0.0 code (Menne & Williams 2009) was available from the NCDC FTP site; that's where Dan Rothenberg got it to do the ccf-homogenization work last summer. The 3.1.0 code is not available yet. My understanding is that researchers need to get some sort of permission to post code on the internet. Bureaucracies can be like that - possibly a hangover from NCDC's military past? Who knows. Anyway, I believe that the code will soon appear.

  5. Thanks, Nick, I didn't know that. It's in the USHCN directory. I've downloaded it; there's a lot of documentation.

    Daniel Rothenberg's project is described here.

  6. Indeed, I put that blog post up myself. Note that the code which Dan started with was apparently not exactly the same as the code which was being used at the time (last summer) for GHCN 3.0. Matt & Claude had already made some improvements, which then combined with fixes for bugs identified by Dan, for the 3.1.0 code (the process is described in their tech note). See here.

  7. I find this wailing and gnashing of teeth somewhat counter-productive. We need to start from a realization that none of these measurements were made in a metrologically traceable manner to SI standards, that change has been ubiquitous even at the best managed sites, and so adjustments are required. In the absence of complete metadata as to changes at each station, there will be inevitable uncertainty in how to do the adjustments. For climate trends this is the elephant in the room, because (lack of) adjustments yield red-noise changes which have increasing influence with trend length. The key is to get the adjustments right. There are ways that we can do this better than we have in the past; see e.g. this recent effort to benchmark the performance of the algorithm in question over the CONUS sub-domain. But what we really need is more effort on the adjustments problem, which is the problem that matters from a long-term change perspective. Then we need a consistent set of benchmarks. More on this in a BAMS article.

  8. I left a trailing html tag somewhere in the above. Sorry. My bad for writing quickly and not checking.

  9. Peter is right, of course, and I totally support his continuing efforts to develop homogenization testbeds/platforms/systems, which will allow us to improve this whole area of the science. I'm hoping to spend some time thinking about this in February (and revisiting Dan's code, if I have the time).

    I'm happy to be reminded of this excellent piece of work, which showed very clearly for GHCN v2 that there is a fairly symmetric (and fairly narrow) spread of trend effects due to homogenization, with very nearly zero net effect on global trends. I'd be glad if you would retread that for GHCN 3.1.

    Any homogenization process is bound to have trend effect outliers: stations for which the homogenization has a comparatively large effect on the long-term trend. Because the data is so noisy, it is inevitable that some of the adjustments producing those outliers are 'incorrect': that is, the station record at a change point reflected ground truth and not an inhomogeneity. Those are the eggs we need to break to make the climate omelette. Overall studies of homogenization processes, like the excellent recent Williams/Menne/Thorne paper in JGR-A, show very clearly that they are valuable in recovering climate signal from inhomogeneity noise.

    1. Nick (S), thank you very much for posting up the v2 file from Dec 2009.

      Looking at the data for Reykjavik shows that the v2 adjustments cool by about 0.9 in the 60s and 70s, increasing to 1.3 before 1940 and 1.7 before 1920.
      So the adjustment-fabricated warming in v2 is similar in magnitude to that shown in your graph for v3.1, though more monotonic. It is a similar picture for Stykkisholmur, with downward adjustments of around a degree before 1970.

      So to recap, we have fairly monotonic past-cooling adjustments in v2, no adjustment in v3.0, and then very erratic adjustments in v3.1.

      Incidentally I also found some sign errors in the v2 adjustments for Stykkisholmur.
      The unadjusted file for the first few months of 1973 and 1974 are
      1973 22 -34 4 21
      1974 -1 -11 26 54
      and the adjusted file is
      1973 -22 -34 -4 21
      1974 -1 -11 -26 54

      Regarding breakpoints, it would, as you say, be sensible if the algorithm checked nearby sites to determine whether an apparent break was genuine or spurious. It's pretty clear that the GHCN algorithm doesn't do this, since the genuine cooling in 1965 is erroneously adjusted away by the GHCN algorithm in 7 of the 8 Icelandic stations.

      Regarding the GISS adjustments, the current situation is absurd. GISS starts from the GHCN adjusted data, which includes (erroneous) adjustments for breaks and homogeneity adjustments. It then applies its own algorithm to remove 'suspicious records', and then applies its own homogeneity adjustment!
      It's amusing to see that the GISS homogeneity adjustment puts back some of the 1940s warmth that was deleted by GHCN (0.7 degrees in 1940).


    2. PaulM: the GHCN algorithm is all about neighbouring station records. That's how it works. I recommend you read the papers on it.

    3. Nick,
      I'm planning a post which will show the V3.1 histogram, coupled with a Google Maps visualization so you can see which stations have large positive or negative adjustments. It should show up things like this Iceland pattern.

    4. Nick B, perhaps you haven't been following this story.
      I recommend you go and look through the posts at Paul Homewood's blog.
      In all 8 of the Iceland stations' unadjusted data, there is a sharp drop around 1965, and this is well established from a number of sources.
      In 7 of the 8 cases the GHCN adjustment algorithm incorrectly puts in a break here and gets rid of this sharp drop.
      Similarly, the raw data consistently shows a warm period around 1930-1940 which is also established in the literature on air temps and SST temps (papers by Hanna et al). In most cases the GHCN algorithm adjusts these downwards.
      So whatever is flagging the GHCN adjustments, it's not nearby neighbours. If the algorithm looked at neighbours in any sensible way it would know that the raw data was valid.

      Here's what the Iceland Met Office says:

      ' The GHCN "corrections" are grossly in error in the case of Reykjavik but not quite as bad for the other stations. But we will have a better look. We do not accept these "corrections". '

      Of course if they would publish the code, the error could be found fairly easily.


  10. As for GISTEMP, until recently it was using unadjusted GHCN data. I gather that in December it was changed to use the adjusted GHCN data instead of doing its own homogeneity adjustments. That's what it says here anyway. The code is here if you want to check.

    1. NickB, look at the dropdown options and download the data for Reykjavik for option 1 and option 3. You can see that GISS is still doing its own homogeneity adjustment after the GHCN one (in fact somewhat correcting the GHCN, as I said above).


  11. Could someone explain why, whenever adjustments are made, the effect is always to cool the past while not touching the present?



  12. Mailman,
    "when ever adjustments are made the effect is to always cool the past "
    That's just not true. In the Darwin post that I linked, I showed the histogram of effects of V2 adjustments on trend. It's balanced; the trend is almost as likely to be lowered as raised.

    That post was a response to a WUWT post on adjustments to Darwin, where adjustments raised the trend. I showed that adjustments to Coonabarabran, also in Australia, lowered the trend there equally.

    It's true that adjustments don't change the present. That's the reference point.