Monday, October 19, 2015

How well do temperature indices agree?

In comparing TempLS integration methods, I was impressed by how RMS differences gave a fairly stable measure of agreement, which was quite informative about the processes. So I wanted to apply the same measure to a wider group of published temperature indices, which would also put the differences between TempLS variants in that context.

There are too many pairings to show time series plots, but I can show a tableau of differences over a fixed period. I chose the last 35 years, to include the satellite measures.

It is shown below the fold as a table of colored squares. It tells many things. The main surface measures agree well, HADCRUT and NOAA particularly. As expected, TempLS grid (and infilled) agree well with HAD and NOAA, while TempLS mesh agrees fairly well with GISS. Between classes (land/ocean, land, SST and satellite) there is less agreement. Within other classes, SST measures agree well, satellite only moderately, and land poorly. This probably partly reflects the underlying variability of those classes.

As an interesting side issue, I have now included TempLS variants using adjusted GHCN. It made no visible difference to any of the comparisons. The RMS difference between similar methods was so small that it created a problem for my color scheme. I colored according to the log RMS, since otherwise most colors would be used exploring the differences between things not expected to align, like land and SST. But the small difference due to adjustment then stretched the scale so much that few colors remained to describe the pairings of major indices. So I had to truncate the color scheme, as will be explained below.

I am now including the adjusted version of TempLS mesh in the regularly updated plot, from which you can also access the monthly averages.

To recap, I am calculating, for each pair, the square root of the mean of the squared monthly differences. I subtract the mean of each dataset over the 35 years (to Sep 2015) before differencing. Colors are according to the log of this measure. The rainbow scheme has red for the closest agreement. The red end of the scale finishes at the closest pairing involving at least one non-TempLS set. Pairings beyond that red end are shown in a brick red. Later I'll show color schemes with this cut-off relaxed. So here is the pairwise plot, with key in °C. If you want the numbers, they are here html, csv

Abbrev | Dataset name | Link
HadCRUT | HadCRUT 4 land/sea temp anomaly | link
GISSlo | GISS land/sea temp anomaly | link
NOAAlo | NOAA land/sea temp anomaly | link
UAH6.beta | UAH lower trop anomaly | link
RSS-MSU | RSS-MSU Lower trop anomaly | link
TempLSgrid | TempLS grid weighting | link
BESTlo | BEST Land/Ocean | link
C@Wkrig | Cowtan/Way Had4 Kriging | link
TempLSmesh | TempLS mesh weighting | link
BESTla | BEST Land | link
GISS Ts | GISS Ts Met stations temp anomaly | link
CRUTEM 4 | CRUTEM CRU global mean Stations | link
NOAAla | NOAA land temp anomaly | link
HADSST3 | HADSST3 Sea Surface | link
NOAAsst | NOAA sea temp anomaly | link
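
For readers who want to reproduce the measure described above, here is a minimal sketch in Python. It assumes NumPy and that each index is already available as an equal-length array of monthly anomalies over the same window. The function name `pairwise_rms` and the synthetic series are purely illustrative; this is not the code used to make the plots.

```python
import numpy as np

def pairwise_rms(series):
    """Pairwise RMS difference between monthly anomaly series.

    series: dict mapping a (hypothetical) index name to a 1-D sequence of
    monthly anomalies, all covering the same months.
    Returns the list of names and a symmetric matrix of RMS differences.
    """
    names = list(series)
    # Subtract each series' own mean over the period, so that different
    # anomaly base periods do not count as disagreement.
    centered = [np.asarray(series[k], dtype=float) - np.mean(series[k])
                for k in names]
    n = len(names)
    rms = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = centered[i] - centered[j]
            rms[i, j] = np.sqrt(np.mean(diff ** 2))
    return names, rms

# Synthetic example: 35 years of monthly values (420 months).
rng = np.random.default_rng(0)
signal = np.linspace(0.0, 0.6, 420)
demo = {
    "indexA": signal + rng.normal(0.0, 0.05, 420),
    "indexB": signal + 0.3 + rng.normal(0.0, 0.05, 420),  # offset baseline
    "indexC": signal + rng.normal(0.0, 0.20, 420),        # noisier series
}
names, rms = pairwise_rms(demo)
print(names)
print(np.round(rms, 3))
```

The constant offset in `indexB` has no effect on the result, because each series is centered before differencing.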

Some points to make, in no particular order:
  • TempLS interactions are bottom right. Adj means variants using adjusted GHCN. You can see that the differences in integration method make much less difference than the variation elsewhere between different indices/datasets.
  • The difference due solely to adjustment is even less - this will be quantified better below.
  • The main global surface indices are top left. NOAA and HADCRUT are particularly close. I'll show comparisons with TempLS in a later plot. BEST agrees moderately with the others; C&W (Cowtan and Way kriging) agrees notably better with GISS and worse with NOAA, and only moderately with HADCRUT, which it sought to improve (meaning probably that it succeeded). The agreement with GISS makes sense, since both improve coverage by interpolation.
  • The troposphere indices RSS and UAH agree only moderately with each other, and with others not much at all.
  • The land indices do not agree much with each other, and BEST and NOAA diverge widely from other measures. CRUTEM and GISS Ts less so. Of course, GISS Ts is land data, but weighted to try for global coverage.
  • SST data agree well with each other, and not so much with global (about as well as UAH and RSS). Some agreement is expected, since they are a big component of the global measures.

Here is a plot of just the global surface measures. It shows again how there is a GISS family and a HADCRUT/NOAA group. The distinction seems to be on whether interpolation is used for complete coverage, upweighting polar data.

And here are the plots with the color maps extended. On the left the cut-off level is the minimum of the TempLS plots with different methods. It emphasises how little difference integration method makes compared with differing indices. And on the right is the map with no cut-off. You can see that it is now dominated by the four cases where only adjustment to GHCN varies. Otherwise, same data, same method. Adjustment makes very little difference. It also shows why I originally restricted the color range. In this new plot, everything else is blue or green.
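
The effect of the cut-off on the color scale can be sketched as follows. This is only an illustration under my own assumptions: positions run from 0 (red end) to 1 on a log scale, the function name `log_color_position` is hypothetical, and the RMS values are made up. It is not the plotting code behind the figures.

```python
import numpy as np

def log_color_position(rms, cutoff=None):
    """Map positive RMS values to positions in [0, 1] on a log scale.

    Position 0 is the red (closest agreement) end of the scale.
    With cutoff=None the scale starts at the smallest RMS value.
    Returns the positions and a mask of values below the cut-off, which
    would be drawn in a separate color (brick red in the post).
    """
    rms = np.asarray(rms, dtype=float)
    log_rms = np.log(rms)
    lo = log_rms.min() if cutoff is None else np.log(cutoff)
    hi = log_rms.max()
    pos = (log_rms - lo) / (hi - lo)
    below = np.zeros(rms.shape, dtype=bool) if cutoff is None else rms < cutoff
    return np.clip(pos, 0.0, 1.0), below

# Made-up RMS values in deg C: one tiny "adjustment only" pairing,
# several pairings of major indices, and one land vs SST pairing.
values = np.array([0.002, 0.04, 0.05, 0.07, 0.09, 0.4])

full, _ = log_color_position(values)               # no cut-off
cut, flagged = log_color_position(values, 0.04)    # cut-off at 0.04
print(np.round(full, 2))
print(np.round(cut, 2), flagged)
```

Without a cut-off, the single tiny value takes up much of the scale and the major-index pairings are squeezed into a narrow band; with the cut-off, they spread over more of the range.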


  1. Nick, Thanks for including adjusted TempLSmesh in the active graph. It makes it possible to demonstrate the effects of GHCN adjustments in an operative global dataset on a more permanent basis...
    The trend in adjusted vs unadjusted TempLSmesh is only 0.05 C/century larger from year 1900 (0.80 vs 0.75).
    The effect of adjustment diminishes with time and reverses at 1970; after that the unadjusted version has the larger trend.
    From 2000 the trend of the unadjusted version is 1.5 C/century vs 1.3 for the adjusted. I believe that 1.5 is the highest trend of all global datasets.
    Anyway, this demonstrates that GHCN adjustments are partially contributing to the infamous "hiatus". The main reason is the seemingly unfair cooling of Arctic stations, but adjustments also take down the eastern Sahara, and South Pole (Amundsen Scott)

    Nick, the gadget in your old blog post, Feb 5, had another functionality. In trendback mode it was possible to push the data button and get all back trends in tabular form.
    There are new versions of TempLS and newer data now, but for comparison, the trends in the Feb 5 gadget, from 2000 through 2014, were 0.975 and 1.263, for adjusted and unadjusted TempLSmesh respectively.

    1. Thanks, Olof
      The data button is meant to give the numbers for what is actually plotted, so I'll fix it in the data page so it has the Feb 5 functionality. It's interesting to compare, as you did. The trends since 2000 have risen, partly because of recent months' warmth, but also because of ERSST v4, which lowered some temperatures around 2000. The Feb 5 results were using V3b.

  2. #endTheRainbow

    Maybe I should read this blog more often, but could you add a list of which datasets are behind the various abbreviations?

    You computed the RMSD on a monthly scale; it would be interesting to do this on an annual scale, if only because GHCNv3 is only adjusted on an annual scale.

    At what spatial scale did you compute the RMSD?

    Interesting that NOAA fits better to HadCRUT than to GISS, because GISS land uses the homogenized land data of NOAA nowadays.

    1. Victor,
      Yes, I'll add a list - probably of links to the datasets. I'll put it on the datasets page. I maintain an R dataframe of monthly averages, and these are the names from that, so they occur often. I append "lo" for land/ocean, "la" for land only. LS is for TempLS (unadjusted GHCN), and Adj for TempLS with adjusted GHCN. Details on the LS variants are here.

      RMSD on an annual scale would be less, because of noise damping, but I think should give the same pattern.

      I don't have a spatial scale. I'm just using the published monthly time series for the spatial (global) average.

      Yes, I was surprised that GISS stood apart from HAD and NOAA so much. I think it is the coverage issue, mainly poles. Actually, I shouldn't have been surprised, because I did a similar study on a more limited scale in the early days of TempLS, with similar results.

    2. Thank you, now I get it much better. That there is little difference between GHCNv3 adjusted and raw is no surprise (for a scientist); between 1970 and now, the adjustments have nearly no net effect.

      That the tropospheric temperatures have higher differences is also because they have more variability. If you looked at correlations, they may fit better with each other.

      Figures comparing like with like would be helpful in interpreting the graph. Land, Ocean, World.

      Global is a spatial scale. The largest we have for this globe.

  3. Nice work so far, Nick!
    However, a bigger challenge is here now, the new GHCN v4 beta.

    25 000 stations, is that easy to handle? Or is it better to make a selection with long term, rural, and good geographic coverage, to reduce the number of stations?
    Anyway, you have the opportunity to be first with an operative dataset based on the newest...
    ( If NOAA doesn't do it this month)
    Karl et al 2015 doesn't count, and they only kriged the Arctic as I remember..

    1. Olof,
      Interesting, thanks. One practical question of course is, how punctual will they be.

      The other question is whether it gives a real boost to coverage. Especially current coverage. At the moment, I think GHCN uses all the stations that report under CLIMAT.

      Anyway, I'll check it out. There is no problem in principle in using it.