Sunday, June 6, 2010

A temperature data collection for comparisons

I have been collecting global surface data series for comparison with models. In the process I've made a big table, which is handy for reading into R. It contains:
1. The surface land and ocean indices - HadCRUT3, GISTEMP and NOAA (current to April 2010)
2. The two satellite lower-troposphere indices - UAH and RSS MSU (likewise)
3. A collection of model results for surface air temperature (SAT). There are 24 models, run under the SRES A1B scenario, with 57 runs in total.
The table goes from 1850 to 2300, although of course most columns have blanks before and after the real data.

The model run results were collected from Geert's KNMI site using Steve McIntyre's script.

There are two auxiliary files which give data describing the columns, and a readme.txt. They are all in the file in the document repository.


  1. Nick

    I also have a collection of the 5 major global temperature series. I update it monthly; it's available at this link as a CSV file.

    Kelly O'Day

  2. Thanks, Kelly,
    I'd been thinking about an update mechanism, but it probably makes sense to just use yours.

  3. It would make sense to separate the models from the obs, since the former won't change. And Kelly has them anyway.

    Useful metadata would be the baseline used for each data set's anomaly calculation. I know them for the observations, but I have no idea how the models are done.

    For some kinds of analyses you want a common baseline across all data sets. 1980-99 is a commonly used one for the models (last twenty years of 20CEN).

    Of course that doesn't matter for trends.
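
The baseline point above can be illustrated with a small sketch. It uses made-up annual anomalies (not values from the table) and two hypothetical helpers, `rebaseline` and `ols_slope`; the 1980-99 period is the one mentioned in the comment. Subtracting a constant offset moves the series onto the common baseline, and the fitted trend comes out the same before and after.

```python
from statistics import mean

def rebaseline(years, anomalies, start=1980, end=1999):
    """Shift a series so its mean over start..end (inclusive) is zero.
    Missing values are represented as None and are left as None."""
    base = [a for y, a in zip(years, anomalies)
            if start <= y <= end and a is not None]
    if not base:
        raise ValueError("no data in the baseline period")
    offset = mean(base)
    return [None if a is None else a - offset for a in anomalies]

def ols_slope(years, values):
    """Ordinary least squares slope (units per year), skipping None values."""
    pairs = [(y, v) for y, v in zip(years, values) if v is not None]
    xbar = mean(y for y, _ in pairs)
    vbar = mean(v for _, v in pairs)
    num = sum((y - xbar) * (v - vbar) for y, v in pairs)
    den = sum((y - xbar) ** 2 for y, _ in pairs)
    return num / den

# Made-up annual anomalies on some arbitrary original baseline (illustrative only).
years = list(range(1970, 2010))
series = [0.3 + 0.017 * (y - 1970) + 0.05 * ((y % 3) - 1) for y in years]

shifted = rebaseline(years, series)
baseline_mean = mean(v for y, v in zip(years, shifted) if 1980 <= y <= 1999)
slope_before = ols_slope(years, series)
slope_after = ols_slope(years, shifted)
```

Here `baseline_mean` is zero to rounding error, and `slope_before` and `slope_after` agree, which is why the baseline choice matters for level comparisons but not for trends.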

  4. I have other items on a wish list (e.g. annual series).

    But mostly I just want to say - thanks so much for doing this!

  5. Deep,
    I agree that I probably won't keep the indices updated. I put everything in one table because, in R at least, you can then read them into a big structure and experiment with comparisons.

    I agree annual is a good idea, and I've put up a new zip file. annualtable.txt has the same column structure as modeltable.txt, but with annual averages (calendar year). The file annualends.txt has the same structure as coldata.txt, but points to the start and end rows in annualtable.txt, for each column.
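
For readers who want to reproduce this kind of annual table themselves, here is a minimal sketch of calendar-year averaging. It is not the script used to build annualtable.txt. The assumptions are that a column starts in January, that blanks are read as `None`, and that a year with any missing month is left blank; the function names `annual_means` and `data_ends` are hypothetical.

```python
def annual_means(monthly, first_year):
    """Calendar-year means of a monthly series starting in January of first_year.
    A year with any missing month (None) gets None; a trailing partial year is dropped."""
    out = []
    for i in range(0, len(monthly) - len(monthly) % 12, 12):
        block = monthly[i:i + 12]
        year = first_year + i // 12
        if any(v is None for v in block):
            out.append((year, None))
        else:
            out.append((year, sum(block) / 12))
    return out

def data_ends(annual):
    """1-based first and last rows that hold data, or None if the column is empty."""
    rows = [i + 1 for i, (_, v) in enumerate(annual) if v is not None]
    return (rows[0], rows[-1]) if rows else None

# Illustrative monthly column: one blank year, two full years, one incomplete year.
monthly = [None] * 12 + [0.1] * 12 + [0.2] * 6 + [0.4] * 6 + [0.5] * 6 + [None] * 6
annual = annual_means(monthly, first_year=1850)
ends = data_ends(annual)
```

The `data_ends` result plays the same role as the start and end rows described above: it says where the real data begins and ends within a column padded with blanks.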

  6. Nick & Deep

    I'm thinking about adding several additional series to the 5 temp anomaly series, including:

    o CO2 - Mauna Loa
    o SOI
    o PDO
    o AMO
    o Nino34
    o SATO
    o SST (Hadley)

    My idea is to give users a single file that they can download as CSV and use Excel, R or whatever to analyze and plot the data as they see fit.

    Getting the data in a usable format takes time; this consolidated file would save users a lot of data manipulation time and effort.

    My approach is to download all source files each month so that I capture any edits that may have been made in individual data series.

    I have the R scripts for the pieces. Any thoughts before I start combining the series?
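
One way the combining step might look, sketched in Python rather than R: each series is keyed by (year, month), the series are joined on the union of dates, and a cell is left blank where a series has no value. The helper names, the column layout and the numbers are placeholders for illustration, not real data or the actual scripts.

```python
import csv
import io

def combine_series(series_by_name):
    """Outer-join monthly series on (year, month).
    Returns a list of rows with a header; missing values become empty strings."""
    names = list(series_by_name)
    dates = sorted({d for s in series_by_name.values() for d in s})
    rows = [["year", "month"] + names]
    for year, month in dates:
        values = [series_by_name[n].get((year, month), "") for n in names]
        rows.append([year, month] + values)
    return rows

def to_csv_text(rows):
    """Render rows as CSV text that Excel or R can read."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Placeholder values only; the real series would come from the monthly downloads.
series = {
    "CO2": {(2010, 3): 391.0, (2010, 4): 392.5},
    "SOI": {(2010, 4): 1.2, (2010, 5): 0.9},
}
rows = combine_series(series)
csv_text = to_csv_text(rows)
```

Joining on the union of dates keeps series of different lengths in one file, at the cost of blank cells that the reader's software has to treat as missing.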

  7. Kelly,
    I think that is a very good idea. CSV is well suited to Excel users and also works for R. You might like to add the Model E forcing data. Tamino and Lucia have collections of links which might give other ideas.

  8. Nick,

    I think it would be a good idea to look at 20CEN model runs.

    The correct way to analyze the A1B runs relative to 20th century is to first "splice" them onto the corresponding 20CEN runs. This was probably done in most cases.

    But in some cases (three models and 7 runs in all), there is no 20CEN portion, as seen below (this is from coldatafn, a merge of coldata and filenames).

    * 19 14 1813 3013 giss_aom
    * 20 14 1813 3012

    * 29 17 1801 4201 iap_fgoals1_0_g
    * 30 17 1801 4201
    * 31 17 1801 4200

    58 27 481 2999 ncar_pcm1
    59 27 481 5401
    * 60 27 1801 4201
    * 61 27 1801 3000

    The starred runs (*) don't have pre-2000 data. However, some of these may have the corresponding 20CEN data only in the 20CEN run data files, and have not repeated it here. If any of these runs do not have corresponding 20CEN runs for some reason, then they can't be used for certain kinds of analysis.

    It's also possible that some of the other runs do not have the corresponding 20CEN data (i.e. the pre-2000 data is in error or is actually "something else"), although I imagine that should be a rare problem.

    Feel free to get in touch if you wish to discuss further.
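
The check described in this comment can be sketched as follows. The start and end rows are copied from the listing above; the assumptions are that the first and the last two numeric columns are the table column, start row and end row, and that row 1 of the monthly table is January 1850. Under that assumption row 1801 is January 2000 and row 481 is January 1890. The function `row_to_year_month` is a hypothetical helper.

```python
def row_to_year_month(row, first_year=1850):
    """Map a 1-based monthly row number to (year, month),
    assuming row 1 is January of first_year."""
    return first_year + (row - 1) // 12, (row - 1) % 12 + 1

# (table column, start row, end row), copied from the listing in the comment.
runs = [
    (19, 1813, 3013), (20, 1813, 3012),                      # giss_aom
    (29, 1801, 4201), (30, 1801, 4201), (31, 1801, 4200),    # iap_fgoals1_0_g
    (58, 481, 2999), (59, 481, 5401),                        # ncar_pcm1
    (60, 1801, 4201), (61, 1801, 3000),                      # ncar_pcm1
]

# Runs whose first data row falls in 2000 or later have no 20CEN portion in the table.
no_20cen = [col for col, start, _ in runs
            if row_to_year_month(start)[0] >= 2000]
first_months = {col: row_to_year_month(start) for col, start, _ in runs}
```

With these rows the flagged columns are the seven starred runs, which is consistent with the count given in the comment.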

  9. Thanks for the advice, Deep. These files are a by-product of an effort I was making to estimate the variability of model results, as a check when comparing them with measured indices. I didn't get far with that, because I ran into an odd effect. Many of the runs have a period of great variation, typically around 2050. A typical plot is here. Do you have any ideas about that?

    I have added a full collection of plots of the monthly model data from the table