Wednesday, April 14, 2010

Incorporating SST and Land/Ocean models

The part of Gistemp that most people who look into it shy away from is where the gridded ocean data is included to make a global land/sea index. With some reason, because it's awkward, but not particularly challenging - the HadSST data is already gridded, and what remains is the messy stuff of keeping the grid methods consistent.

Other recent reconstructions have concentrated on land stations too, perhaps because, as Zeke says, the context was the examination of claims that drift in selection of stations creates biases.

The least squares method used here makes the incorporation of ocean data easier, because there is no intermediate gridding step. So I tried it, I downloaded HadSST2 data, and added a set of 2592 (36x72) synthetic stations - one at the centre of each grid cell. HAD uses the same 5x5 grid as I use for weighting for station density. So where the station sits in an ocean-only cell, it is fully weighted as the sole representative of that cell, and where ocean overlaps land, the new station is weighted according to the number of land stations in the same cell. If the cell is land only, there is no new data, so that cell does not add to the analysis.

New code

This all required some new coding. That's tied up with some other changes, so I'll post ver 1.3 shortly. The change was mostly in the preprocessor, which rearranges the Had data to the GHCN format and creates the new station file.

A summary of this series of posts is here.


Here is the table of trends from my analysis:
Trend se0.0045010.0038790.0059790.00381
Trend se0.02240.01780.031690.0171

Update: I hope to be routinely quoting standard errors for trends. At present these are based on OLS - no correction for correlation of residuals. So they are smaller than the AR4 values below.
I chose the time intervals for comparison with Table 3.3 from the AR4:
Temperature Trend (oC per decade)  
Dataset  1850-2005  1901-2005  1979-2005 
Northern Hemisphere 
CRU/UKMO (Brohan et al., 2006)  0.047 ± 0.013  0.075 ± 0.023  0.234 ± 0.070  
NCDC (Smith and Reynolds, 2005)  0.063 ± 0.022  0.245 ± 0.062  
Southern Hemisphere  
CRU/UKMO (Brohan et al., 2006)  0.038 ± 0.014  0.068 ± 0.017  0.092 ± 0.038  
NCDC (Smith and Reynolds, 2005)  0.066 ± 0.009  0.096 ± 0.038  
CRU/UKMO (Brohan et al., 2006)  0.042 ± 0.012  0.071 ± 0.017  0.163 ± 0.046  
NCDC (Smith and Reynolds, 2005)  0.064 ± 0.016  0.174 ± 0.051  
GISS (Hansen et al., 2001)  0.060 ± 0.014  0.170 ± 0.047  


Here my plots are compared with the published gridded data, taken from WFT. The smoothing uses the 13-point filter from AR4 Chap 3

The anomaly base periods have been adjusted to match in each case to the published data.


  1. I don't know if it's just me, but I have trouble viewing the second column of plots. They're cut off.

    But congrats on taking this step; I don't think any of the other home brewers have done ocean yet.

  2. CE, Thanks. The view just depends on window size. If you can stretch the window, they come good. That means a 22" screen.

    You can also right click and ask them to appear in their own window..

  3. CE, following up - even with a smaller screen, it depends on your zoom level. In Firefox, Ctrl- makes everything a bit smaller, but you see more of it. Ctrl+ gets you back.

  4. How easy would it be to incorporate HadISST?

  5. cce,
    well, first you actually have to get the data. After that, it is in principle easy. The grid is 1x1, so there are 25x as many stations to create. About 50000 in total. That's then a much bigger calculation, It would swamp the memory of my little nachine, but a bigger computer could do it.

  6. Interesting. It shouldn't be too hard to add this to my model; just need to figure out what to do if the same gridcell has both land and ocean temps (e.g. average them, drop the ocean temp, or drop the land temp).

  7. Ugh, the formatting of that HadSST2 is going to be a big pain to parse in STATA. I don't suppose you can export it in the same format as GHCN (e.g. a single row would be station_id, year, jan, feb, ..., dec) for me and stick it up on

  8. Zeke:

    Brohan 06 has a method for handling the cells with both land and ocean ( not optimal but its ok ) Giss hmm ( ask nick barnes)

    Nick, reformating to GHCN would be a great idea and neat tool to add to the little package I am working on.

  9. Zeke,
    Sorry I missed this earlier. The reformatting is what the new preprocessor does, to make a combined file. I've extracted the SST part and put it on the repository as

  10. Zeke,
    On what to do with mixed land/ocean - yes, I think this is the main reason for discrepancy between mine and GISS, and could do with more investigation. I let the weighting handle it, so if there is one land, one ocean in a cell, they weight equally, which is reasonable for a single island, say. But it probably lets the land stations swamp the SST in some cases.

  11. Thanks Nick.

    Is there somewhere I can find the lat/lon associated with each of those SST "station" IDs?

  12. Zeke,
    I should have included the corresponding inventory file. I'll do that. But it starts
    7281 87.5 -177.5
    7282 87.5 -172.5
    9872 -87.5 177.5

    I should have mentioned that in that file, I've removed all data 1850-1879, and also all cells with less than 80 months of data in total. I can do a run varying those restrictions if you like.

  13. ya Nick, see brohan 06 for the CRU weighting scheme. Roman and I have discussed it on a few occasions.

  14. Steven,
    B doesn't really give a scheme. He says how you should combine them knowing the errors, but I don't have that information, and neither I think did he. His bottom line is that it doesn't have much effect on the large scake averages and not much trouble is justified. That's my experience too. I tried upping the sea cell weightings by a factor of five, and it made no perceptible difference.

    I discrepancy I have with GISS and Had is less than what they have with each other (I'm closer to Had), so no ideal agreement is possible.

  15. Nick,

    The modified.inv file seems to have ocean station ids in the form 0001-2592 while the SST file has ids from 7389 to 9728. Can I just map 7389 to 0001 and go from there?

  16. Cool Thanks Nick, one thing thats nice about having your own code is being able to address theoretical questions ( best way to weight ) with practical approaches.. try different weightings and see if it matters or not.

  17. Zeke,
    My system is that the numbers in the "station numbers" in the data file are pointers to lines in the .inv file. There are 7280 land stations, and the sea stations then continue from 7281 to 9872. At both ends, a lot of icy cells have no data - they needs 80 months to appear in the data file (that's adjustable). I don't currently use the sea station ID's in the .inv file for anything.

  18. Zeke - did you find the modified.inv zip in the dropio? That's the file that the data file is pointing to.

  19. This comment has been removed by the author.

  20. Ahh, pointers to the line number. Tricky!

  21. hey nick,

    have you ever used sweave? if you do a write up Sweave would be cool.

  22. Steven,
    Yes, sweave might help - I use Latex. Currently I mainly go straight from R to html, where Sweave doesn't seem to help - dunno why not.

  23. When I get a chance I'll look into it.

  24. Zeke,
    I've discovered that the file I sent had an error in the dating. What I had recorded as 1881 should have been 1880. And since I only went to 2009 (I thought), in fact 2009 data is missing.

    It doesn't make a big difference, but I've put a replacement on the It also includes the fairly small amount of 2010 data.

    It's in the V1.4 style, where cells that don't have data (eg land) aren't included in the inventory. My latest V1.4 post shows the map of the stations.