Wednesday, April 28, 2010

Ver 2 - Regional spatial variation.

The last thing I wanted to get working in TempLS before posting was mapping regional spatial variation. But that's been more difficult than I expected. It comes hard up against resolution and the irregular distribution of stations. But these turn up in somewhat unexpected ways. It's been instructive, so I thought I could blog about that while I'm thinking about what to do next.

My idea was to look at roughly rectangular regions in a similar way to the example of the globe with spherical harmonics. There, a more appropriate family of functions is the trig functions - a standard Fourier series in 2D, though fitted by least squares.
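TempLS itself is written in R, but the fitting idea can be sketched in a few lines of Python (synthetic station positions and field; numpy's least-squares solver stands in for the TempLS one, and the domain is a unit square rather than a real region):

```python
# Sketch: least-squares fit of a 2D Fourier (trig product) basis to
# scattered "station" values. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 200), rng.uniform(0, 1, 200)   # station positions
z = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)        # "true" field

# Design matrix: products of sines/cosines up to order K
K = 3
cols = []
for i in range(K):
    for j in range(K):
        cols.append(np.cos(i * 2 * np.pi * x) * np.cos(j * 2 * np.pi * y))
        cols.append(np.cos(i * 2 * np.pi * x) * np.sin(j * 2 * np.pi * y))
        cols.append(np.sin(i * 2 * np.pi * x) * np.cos(j * 2 * np.pi * y))
        cols.append(np.sin(i * 2 * np.pi * x) * np.sin(j * 2 * np.pi * y))
A = np.column_stack(cols)

coef, *_ = np.linalg.lstsq(A, z, rcond=None)  # least-squares fit
fitted = A @ coef
```

With 200 well-spread stations this recovers the field; the resolution trouble described above shows up when the stations cluster and the basis order rises.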

I looked at ConUS, where NOAA has plots that I can compare with. The results were not disastrous, but not very good. The basic reason is that the number of GHCN stations in ConUS (especially currently) is much smaller than the number NOAA would use.

I could use a larger set, eg USHCN. But I'm looking for a general purpose tool based on GHCN that will work for other places that don't have this alternative. I'll describe progress below the jump.

Monday, April 26, 2010

Spatial Temperature distributions in TempLS v2

This is another trailer for v 2. It can fit temperatures for individual years, and I expect even months, though I haven't tried that yet. The model is that data depends on local effects as before, plus just the temperature of the specified year, over space. Of course that doesn't make a good predictor overall, but the aim is just to get the temp distribution for that year that gives the best fit.

A technical snag which I'm still working on is that this model does not generate the info needed to adjust to a base period. It may be necessary to add that as another fitting parameter. However, since I'm currently comparing to GISS, I just add an offset to match the GISS average anomaly for the relevant year.
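The offset step, in miniature (invented numbers, and a plain mean where the real calculation would use an area-weighted one):

```python
# Shift the fitted anomalies so their mean matches the GISS global
# anomaly for the year. Numbers are invented for illustration.
import numpy as np

fitted = np.array([0.9, 1.1, 0.7, 1.3])   # fitted anomalies, arbitrary base
giss_mean = 0.63                           # GISS anomaly for the year (made up)
offset = giss_mean - fitted.mean()
adjusted = fitted + offset                 # now on the GISS base
```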

I've compared these with the GISS plots. I've also gone back to match the trends of the previous post with the GISS trends for the same period, since GISS produces better maps. By request, I've also plotted the trend for the period 1900-1945 and compared with GISS. All below the jump.

The last big task remaining for V2 is a mechanism for doing all this regionally, where spherical harmonics would be inefficient.

Sunday, April 25, 2010

Plotting spatial trends in TempLS

This is a preview of Ver 2. It is a substantial extension, because it takes the principle of least squares fitting further, fitting spatial approximations to global patterns. Whereas before the spatial variation was attributed to local effects (station latitude, altitude etc), now the global model can be allowed spatial properties.

To do this, it is necessary to have a space of approximating functions. I'm using spherical harmonics - typically up to 200. So far I've analysed a model in which the global temperature is assumed to consist of a linear trend, spatially varying. But it will be possible to do yearly or monthly plots and, I think, trends and spatial averages for sub-regions, subject to enough data being available. I'd be interested in suggestions and comments, which is my excuse for posting at this fairly early stage.
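As a sketch of the approach (Python rather than the R of TempLS, synthetic data, and a much smaller basis of 25 functions rather than ~200), building a real spherical-harmonic design matrix and fitting a field to it looks like:

```python
# Sketch: least-squares fit of real spherical harmonics to values at
# scattered points on the sphere. Data is synthetic.
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(1)
n_st = 500
lon = rng.uniform(0, 2 * np.pi, n_st)          # azimuthal angle
colat = np.arccos(rng.uniform(-1, 1, n_st))    # polar angle, area-uniform

# Real spherical-harmonic basis up to degree L: (L+1)^2 functions
L = 4
cols = []
for n in range(L + 1):
    for m in range(n + 1):
        Y = sph_harm(m, n, lon, colat)
        cols.append(Y.real)
        if m > 0:
            cols.append(Y.imag)
A = np.column_stack(cols)

# A synthetic "trend" field, and its least-squares fit
true = 0.1 + 0.05 * np.cos(colat)              # stronger trend toward one pole
coef, *_ = np.linalg.lstsq(A, true, rcond=None)
```

The degree-4 basis here gives (4+1)^2 = 25 functions; resolution improves with degree, at the cost of a bigger (and eventually ill-determined) fit where stations are sparse.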

Below the jump, I'll compare trends over the Earth's surface with plots from the AR4.

Tuesday, April 20, 2010

An update on global land/sea reconstructions

Zeke also now has a reconstruction using HadSST2, and has posted a thorough comparison of his, mine, and the major indices.

My earlier post had a small error. The dates were out by a year, so my plots lagged by a year, but were also shifted a little vertically because of the changed base for normalisation. The differences aren't obvious, but I've plotted the global V1.4 versions below, mainly to show the new metadata plots.

Monday, April 19, 2010

The "Bolivia effect"

Trying out V1.4 of TempLS, I looked into the Bolivia effect.
This dwells on the fact that Bolivia hasn't reported any temperatures to GHCN for about 20 years, which is supposed to leave a gap in our knowledge. It has had a run at WUWT, and I noticed that it popped up on Google as a suggested search, so someone thinks it is important.

So is it really true that Bolivia creates a big gap? Well, it's true that interior S America is one of the poorly covered regions of the non-Arctic world, so info from there would be useful. But let's look at the maps of V1.4.

Update 15 July - there is a new post which looks at data from the GSOD data base. This has about 30 stations reporting from 1990 to present. GHCN seems to do quite well when compared with this better coverage.

Sunday, April 18, 2010

V1.4 with maps, conjugate gradients

I'm posting version 1.4 of TempLS. It has some new features. One is computational - it uses a conjugate gradient solver instead of the direct solver. This is much faster - at least five times, and more than doubles overall speed. The solution step goes from being the big time consumer to taking just a few seconds for even the biggest problems.
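The idea can be illustrated in Python (not the TempLS R code; a random well-conditioned symmetric positive-definite matrix stands in for the design matrix):

```python
# Sketch: conjugate gradient solve vs direct solve for a well-conditioned
# SPD system, as in the TempLS design equations.
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(2)
n = 1000
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)     # SPD and well conditioned
b = rng.standard_normal(n)

x_direct = np.linalg.solve(A, b)   # direct solve
x_cg, info = cg(A, b)              # conjugate gradients; info == 0 on success
```

For a well-conditioned system CG converges in far fewer operations than a full factorisation, which is where the speedup comes from.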

The next is a mapping facility. To use it, you'll need to install the "maps" package. Installing a package in R is easy - just type (with R running)
install.packages("maps")
and answer queries about mirror sites etc. With linux, this is better done as root.

If you don't want to do this, there's a variable at the top of GlobTempLSv1.4.r called UseMaps - set it to F and everything but the maps will be done.

For each selection that you run, it produces a world map with the qualifying stations marked as dots. It's a useful check.

V1.4 also provides a plot of the number of stations in your sample, by year. This helps to avoid overinterpreting when numbers get small.

V1.4 also corrects an error with the ocean modified data, which had the years mislabelled by a year - it said 1881-2009 when it was actually showing 1880-2008. That made a minor difference to results. Plots look not much different now, but slightly better.

The selection function chooser() is simplified using an R switch statement. Now you just add lines to the list like:
"SST" = tv$urban == " ", # the last comma is needed
This works here because for sea "stations" many fields, like urban, are blank. As usual, the code is on the repository here, and at the document store.

A summary of this series of posts is here.

Saturday, April 17, 2010

The Math description of TempLS

I've put up a file called TempLS.pdf (about 80 kb) on the document repository. It sets out the least squares theory, and goes through the analysis code explaining the implementation.

It may already be a little bit obsolete. It's likely that V1.4 will solve the design system by the conjugate gradient method. The matrix is well conditioned, and the method seems reliable and at least five times faster, making inversion a matter of 10-20 seconds.

Thursday, April 15, 2010

TempLS - what's been happening, and where it's going

The development of the least squares calculation of a regional or global temperature is now spread over a series of posts, so I'd like to gather together what has been done, and where it might be going. The list of posts is here:

TempLS Version 1.3 - SST, Land/Sea index, distance utility

I need a shorter name for the project. So this is TempLS V1.3, which I used for my previous post on Sea Surface Temperatures (SST) and combined indices. SST is the big change, but there are some others you'll notice.

One is a new facility for calculating distances by lat/lon. You can ask for all stations within a distance of a location, for example. You can even vary the weighting function. So if you wanted to replicate the GISS calculation of the trend of all rural stations within 1000 km of a location, weighted with a distance taper, you can.
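The function names and the linear taper here are illustrative, not the TempLS ones, and it's Python rather than R, but the distance logic is roughly:

```python
# Sketch: great-circle distances from a point, stations within a radius,
# and a GISS-style linear distance-taper weight. Coordinates are made up.
import numpy as np

R_EARTH = 6371.0  # km

def gc_dist(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp = p2 - p1
    dl = np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * R_EARTH * np.arcsin(np.sqrt(a))

# Some stations, and a target location
lats = np.array([40.0, 41.0, 50.0, 60.0])
lons = np.array([-100.0, -99.0, -95.0, -90.0])
d = gc_dist(39.0, -100.0, lats, lons)

within = d < 1000.0                           # stations within 1000 km
w = np.where(within, 1.0 - d / 1000.0, 0.0)   # linear taper: 1 at 0 km, 0 at 1000 km
```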

The code is rearranged so the selection criteria sit in a function near the top, below the other user-choice items. If you build up a lot of these, you can put them on an external file, which you can include with a source().

As usual, the code is on the repository here, and at the document store. More details below.

A summary of this series of posts is here.

Wednesday, April 14, 2010

Big City Trends

Joseph asked about an analysis for trends in cities over 2M population. I'm not sure that the GHCN database captures this well. I looked up my own big city (Melbourne) and of the 5 stations listed, three were on the city fringe, and may well be climatically similar to any city fringe area.

However, the analysis is easy to do. The instruction I inserted was:
if (tc == "BigCity") i01 = i01[tv$ipop > 2000 & tv$urban == "C"];

I also did MedCity, with pop > 500,000. Joseph's analysis is here.

Incorporating SST and Land/Ocean models

The part of Gistemp that most people who look into it shy away from is where the gridded ocean data is included to make a global land/sea index. With some reason, because it's awkward, but not particularly challenging - the HadSST data is already gridded, and what remains is the messy stuff of keeping the grid methods consistent.

Other recent reconstructions have concentrated on land stations too, perhaps because, as Zeke says, the context was the examination of claims that drift in selection of stations creates biases.

The least squares method used here makes the incorporation of ocean data easier, because there is no intermediate gridding step. So I tried it, I downloaded HadSST2 data, and added a set of 2592 (36x72) synthetic stations - one at the centre of each grid cell. HAD uses the same 5x5 grid as I use for weighting for station density. So where the station sits in an ocean-only cell, it is fully weighted as the sole representative of that cell, and where ocean overlaps land, the new station is weighted according to the number of land stations in the same cell. If the cell is land only, there is no new data, so that cell does not add to the analysis.
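One way to read that weighting scheme (this is my paraphrase in Python, not the TempLS code, and the exact split in shared cells is an assumption):

```python
# Sketch: each 5x5 cell contributes total weight 1, split among the
# stations in it - the synthetic SST station counts as one station.
import numpy as np

def cell_weights(n_land, has_ocean_station):
    """Per-station weight in one 5x5 grid cell (assumed equal split)."""
    n = n_land + (1 if has_ocean_station else 0)
    return 0.0 if n == 0 else 1.0 / n

# The 36 x 72 = 2592 cell centres for the synthetic SST "stations"
centers = [(la, lo) for la in np.arange(-87.5, 90, 5)
                    for lo in np.arange(-177.5, 180, 5)]
```

So an ocean-only cell gives its SST station full weight (cell_weights(0, True) is 1.0), a coastal cell with three land stations splits the weight four ways, and a land-only cell is unchanged.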

New code

This all required some new coding. That's tied up with some other changes, so I'll post ver 1.3 shortly. The change was mostly in the preprocessor, which rearranges the Had data to the GHCN format and creates the new station file.

A summary of this series of posts is here.

Tuesday, April 13, 2010

The impact of war on GHCN observations

For computational reasons I've been studying the Schur complement of the matrix involved in least squares fitting of temperature models. That sounds hopelessly geeky, but it's just something which is basically a measure of the number of missing values in the GHCN station data. If I can predict them, I can get a much faster algorithm. So I plotted the diagonal, and ...

A pretty dramatic marker of the effect of world wars on Met observations.
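The Schur-complement calculation itself is involved, but the quantity its diagonal tracks - stations reporting in each year - is easy to sketch (Python, synthetic data with made-up gaps, not the GHCN file):

```python
# Sketch: count reporting stations per year from a station-by-year
# matrix with NaN for missing records. Wartime gaps show as dips.
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1900, 1960)
n_st = 100
data = rng.standard_normal((n_st, years.size))

# Knock out half the stations' records during the wars (invented pattern)
war = ((years >= 1914) & (years <= 1918)) | ((years >= 1939) & (years <= 1945))
data[: n_st // 2, war] = np.nan

reporting = np.sum(~np.isnan(data), axis=0)   # stations reporting, per year
```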

Monday, April 12, 2010

Continents and trends

A joint post with Steve Hempell

In another run with ver 1.2.1, Steve has focussed on the continents. The trends are interesting, and somewhat different. For comparison, we show the rather rough plot from the AR4 SPM (SPM 4), which seems to be in agreement.

The Output files for this and the previous post are now on the document store.

Table of trends

Trend   North America   South America   Europe   Africa   Asia   Australia

The last set of trends, for one decade only, are the least certain. All trends were calculated by OLS fitting.
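What the OLS fitting amounts to can be shown in a minimal Python sketch (TempLS does this in R; the series here is synthetic):

```python
# Sketch: an OLS trend in C/decade from an annual series.
import numpy as np

years = np.arange(1900, 2010)
temps = 0.006 * (years - 1900) + 0.1 * np.sin(years / 7.0)  # ~0.06 C/decade + wiggle
slope, intercept = np.polyfit(years, temps, 1)               # ordinary least squares
trend_per_decade = slope * 10
```

Decadal wiggles pull the fitted slope around a little, which is one reason the short single-decade trends are the least certain.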

Latitudinal temperature histories and trends

A joint post with Steve Hempell

Steve Hempell has been using v1.2.1 of the code to analyse temperature history in sub-regions of Canada, and in global latitude bands. Here are his results for the latitude behaviour in 20 deg bands. Steve (his blog handle here) has also made a finer resolution, which we will present in a future post.

The results confirm the high recent uptrend in the Arctic, with relatively small trends in the SH. Remember, though, that these are land stations only, and do not give a good picture of the whole SH.

Table of trends

Trend        Tropics      NExTropics   NMidLat      NHighLat    SExTropics   SMidLat      SHighLat
(C/Decade)   20N to 20S   40N to 20N   60N to 40N   Above 60N   20S to 40S   40S to 60S   Below 60S
1900 - 2009  0.0578       0.0667       0.105        0.1015      0.0511       0.0409       0.190
1978 - 2009  0.174        0.323        0.353        0.495       0.11         0.00463      0.1299

Tuesday, April 6, 2010

Ver 1.2.1 - Various results

I tried some of the more exotic options that I described in the last post. These included ConUS, and the "kept" and "cut" GHCN classes - referring to stations that were maintained after the GHCN project finished its historical phase, and those which were not. Conspiracy-oriented folks have been suggesting that GHCN preferentially dropped stations that were not warming. I also checked B.C. and Africa.

I made some slight changes to the code. I mentioned in the last post the fix of Carrick's bug; this time I found issues when selections had years in which there was no data at all. These related to plotting - the empty years had been set to zero, which could affect plots and trends - now they are set to NA, which R handles well.
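A toy illustration of why the zeros mattered (Python with invented numbers, not the TempLS code):

```python
# Sketch: a no-data year recorded as 0 drags the OLS trend;
# recorded as NaN and dropped, the trend is unaffected.
import numpy as np

years = np.arange(2000, 2010, dtype=float)
temps = 0.02 * (years - 2000)        # true trend 0.02/yr

temps_bad = temps.copy()
temps_bad[9] = 0.0                   # empty year wrongly set to zero
slope_bad = np.polyfit(years, temps_bad, 1)[0]

temps_na = temps.copy()
temps_na[9] = np.nan                 # the v1.2.1 fix: mark it missing
keep = ~np.isnan(temps_na)
slope_na = np.polyfit(years[keep], temps_na[keep], 1)[0]
```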

A summary of this series of posts is here.

Version 1.2 of GHCN processor

I have put up version 1.2. It has some extra capabilities, and some improvements in layout. Like V1.1, it is separated into a preprocessor and a main processor. You should only need to run the preprocessor once, unless a new v2.mean comes along (or a new code version, as here).

I think users will find this version more convenient and powerful in selecting categories. You can control the year intervals for plotting and trend calculation (separately). The preprocessor now passes a file statyrdata.txt which gives the start and end of each station record, and the length (total years).

The code is here and at the document store.

Update. Carrick pointed out a bug a while ago, which I found frustrating, because I couldn't reproduce it. It turns out that it was due to bad treatment of the end year, and I couldn't see it because I was using a Jan version of v2.mean which didn't have 2010 data. Anyway, it's fixed now (thanks, Mosh) - so I'll update with v1.2.1 files. 

A summary of this series of posts is here.

Sunday, April 4, 2010

Tips on File Download

I see there have been some problems in getting files down and running, so here are some suggestions that may help.

More GHCN results.

I've put up version v1.1 of my least squares GHCN code. It now makes it possible to easily analyse subsets using any of the information on the inventory file. I've used it to calculate a number of much discussed examples. The code now runs very fast. On my fairly old PC, the biggest case (global) takes less than 100 secs. So although each special case is calculated pretty much from scratch, there's not much penalty there. The code is here, and also on the document repository.