Saturday, October 22, 2011

A combined KMZ file for BEST, GHCN, GSOD and CRUTEM3

Update: I have modified the ALL4.kmz file, which you can download here. I updated the data.txt file, where BEST had mistakenly posted the MAX file. I also found the problem that had led to the previous version showing very little early BEST data. BEST had a column saying how many days in each month had readings, and I set a filter to require at least 10 days. However, much of the early period had this set to -99, which meant the data was rejected. I have removed the filter, and now there are a lot of pre-1850 sites.
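As a rough illustration of how that filter went wrong (this is a sketch, not the code actually used to build the file), a plain "at least 10 days" test rejects every month whose day count is the -99 marker. The function names and the sample day counts are made up for the example, and the second function is just one hypothetical alternative to removing the filter entirely.

```python
MISSING = -99  # marker for "day count not recorded", as seen in the early BEST data

def naive_keep(days_with_readings, min_days=10):
    # The kind of filter described above: -99 fails the comparison,
    # so every month with an unrecorded day count is dropped.
    return days_with_readings >= min_days

def lenient_keep(days_with_readings, min_days=10):
    # Hypothetical alternative: treat the marker as "unknown" and keep the month.
    if days_with_readings == MISSING:
        return True
    return days_with_readings >= min_days

months = [28, 5, -99, 12]  # made-up day counts for illustration
print([naive_keep(d) for d in months])    # [True, False, False, True]
print([lenient_keep(d) for d in months])  # [True, False, True, True]
```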

This is a development foreshadowed in the previous post. I have posted a combined .kmz file with data from four land station datasets. GHCN is actually v2, but there is very little difference at this level between v2 and v3. CRUTEM3 is the version released in July, discussed here. GSOD was discussed here.

There is now much more information - in fact, when you open the file in Google Earth, it looks colorful but cluttered. The pushpins are colored according to dataset - yellow for BEST, green for GHCN, red for GSOD, and a sort of dull green for CRUTEM3. They also vary in size - the smallest has 0-30 years of data, next has 31-60, and the largest has more than 60.
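The colour and size rules can be written down in a few lines. This is only a sketch of the mapping described above; the size labels "small", "medium" and "large" and the function name are placeholders, not names used in the KMZ file.

```python
# Pushpin colour per dataset, as listed in the text.
PIN_COLOR = {
    "BEST": "yellow",
    "GHCN": "green",
    "GSOD": "red",
    "CRUTEM3": "dull green",
}

def pin_size(years_of_data):
    """Return a size class from the number of years of data (labels are placeholders)."""
    if years_of_data <= 30:
        return "small"    # 0-30 years
    if years_of_data <= 60:
        return "medium"   # 31-60 years
    return "large"        # more than 60 years

print(PIN_COLOR["GSOD"], pin_size(45))
```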

But the key to looking at it is that the data is stored in folders. At the top level, there is a folder for each dataset. At the next level down, they are classified according to start year of data. The ranges are 0-1850, 1851-80, 1881-1900, 1901-1920, 1921-30, 1931-40, 1941-50, 1951-60, 1961-70, 1971-80, 1981-90, 1991-2000, and 2001-2011. As I'll show in the next picture, in Google Earth you can toggle folders on/off at any level. If a dataset is on, you can toggle the year folders. If you want to see only the stations in any set that start before 1921, just toggle off the later folders.
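For anyone wanting to reproduce the folder assignment, here is a minimal sketch using a sorted list of upper bounds. The boundaries are taken from the list above, with the decade after 1931-40 read as 1941-50. Returning `None` for a start year below 0 or after 2011 is my assumption about how to treat missing-value markers; it is not necessarily what the file generator does.

```python
from bisect import bisect_left

# Upper bound (inclusive) and label of each start-year folder.
UPPER = [1850, 1880, 1900, 1920, 1930, 1940, 1950,
         1960, 1970, 1980, 1990, 2000, 2011]
LABELS = ["0-1850", "1851-80", "1881-1900", "1901-1920", "1921-30",
          "1931-40", "1941-50", "1951-60", "1961-70", "1971-80",
          "1981-90", "1991-2000", "2001-2011"]

def start_year_folder(start_year):
    """Return the folder label for a start year, or None if it is out of range."""
    if start_year < 0 or start_year > UPPER[-1]:
        return None  # assumed handling of markers such as negative years
    # bisect_left finds the first folder whose upper bound is >= start_year.
    return LABELS[bisect_left(UPPER, start_year)]

for year in (1743, 1850, 1851, 1945, 2011):
    print(year, start_year_folder(year))
```

Without the range check, a negative start year would fall into the first folder, so markers need to be screened out before binning.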

Update - I had a warning briefly that there were spurious sites in the BEST folder. These had start years of -9.9999 and so went into the pre-1850 folder. That is fixed, but there's a new problem: the pre-1850 folder is almost empty. That could be real, but the BEST team have done analyses for this period. See below for a discussion of the data. Fixed.

The Start year, End year and Duration have been added to the pop-up balloon that you get by clicking on a station. The file is called ALL4.kmz, and can be found here.
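A balloon of this kind is just the `description` of a KML `Placemark`. The sketch below builds one with Python's standard library; the function name, the text layout, and the definition of duration as end year minus start year plus one are all assumptions for illustration, and the real file may format things differently.

```python
import xml.etree.ElementTree as ET

def make_placemark(name, lat, lon, start_year, end_year):
    """Build a KML Placemark element whose balloon lists the data years."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    duration = end_year - start_year + 1  # assumed definition of Duration
    ET.SubElement(pm, "description").text = (
        f"Start year: {start_year}\n"
        f"End year: {end_year}\n"
        f"Duration: {duration} years"
    )
    point = ET.SubElement(pm, "Point")
    # KML coordinates are written as longitude,latitude,altitude.
    ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return pm

pm = make_placemark("Example station", -20.25, 10.5, 1950, 1980)
print(ET.tostring(pm, encoding="unicode"))
```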

Here's a GE snapshot of the toggle facility. You have to click on a few +s to see this. GSOD and CRUTEM3 are not visible. BEST shows only stations with data before 1971. GHCN is visible, but you'd have to open that menu to see which years. I had it matching BEST.

Added: To get the data years for BEST, I used the data.txt file in their PreliminaryTextDataset folder. That looks right, but I need to investigate to see if it includes everything. There seem to be early stations missing.


  1. Doesn't BEST incorporate GSOD?

    Regardless, an interesting study would be to run TempLS on those stations that are reasonably far away from any GHCN stations, thus creating a temperature series using only "new" stations. That would answer those who believe that these results are due to the overlap of data.

  2. CCE,
    I had expected that they use everything available. If you click on the KMZ file with all folders open, you usually find that the pushpin breaks into several, and there are usually one, often two, BEST stations among them.

    Yes, I could do that test. It's a bit fiddly getting the pairwise distances. Another check might be how many BEST stations are effectively duplicates - close together. It seems quite common.
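A minimal sketch of the distance test discussed in these comments, assuming stations are given as (latitude, longitude) pairs in degrees. The 100 km threshold, the function names and the sample coordinates are placeholders, and this is a brute-force comparison, which would be slow on tens of thousands of stations without some spatial indexing.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def nearest_km(station, others):
    """Distance from one station to the closest station in another list."""
    return min(haversine_km(station[0], station[1], o[0], o[1]) for o in others)

def isolated(candidates, reference, min_km=100.0):
    """Candidates with no reference station within min_km (placeholder threshold)."""
    return [c for c in candidates if nearest_km(c, reference) > min_km]

best = [(0.0, 0.0), (50.0, 50.0)]   # made-up coordinates
ghcn = [(0.0, 0.1)]
print(isolated(best, ghcn))
```

The same `nearest_km` function, run over a dataset against itself with each station excluded from its own comparison list, would flag the near-duplicate stations mentioned above.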