- GHCN processing algorithm
- Comparison of GHCN results
- More GHCN results.
- Tips on File Download
- Version 1.2 of GHCN processor
- Ver 1.2.1 - Various results
- Latitudinal temperature histories and trends
- Continents and trends
- The impact of war on GHCN observations
- Incorporating SST and Land/Ocean models
- Big City Trends
- TempLS Version 1.3 - SST, Land/Sea index, distanc...
- The Math description of TempLS
- V1.4 with maps, conjugate gradients
- An update on global land/sea reconstructions
- Plotting spatial trends in TempLS
- Spatial Temperature distributions in TempLS v2
- Ver 2 - Regional spatial variation.
- Just 60 stations?
- A success for open coding!
Least squares
The first post, GHCN processing algorithm, introduced the basic idea, variants of which Tamino and Jeff/Romanm had been using to combine stations in grid cells. The model is that the array of station temperature observations, over month and year, can be described as the sum of a local function L, depending on station and month (but not year), and a global temperature G, depending only on year. The idea is that L absorbs the unwanted effects of site variability and the seasonal cycle, while G is the item of interest.
A weight function is used for the fitting. This has a value for each month-year station combination, and is the product of two components:
1. It is zero if there is no observation
2. It varies inversely with area density of stations.
The density is estimated with a 5°x5° grid: the stations falling in each cell are counted, and the cell area divided by that count gives the cell estimate.
So a major difference from other methods is that no grid averages of temperature are calculated. A grid is invoked, but only to count the stations it contains.
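The inverse-density weighting can be illustrated in miniature. This is a minimal Python sketch, not the actual R code; the cell indexing scheme and the relative-area formula for a latitude band are my own assumptions for illustration:

```python
import math
from collections import Counter

def cell_index(lat, lon, size=5.0):
    """The 5x5 degree grid cell containing (lat, lon)."""
    return (math.floor((lat + 90.0) / size), math.floor((lon + 180.0) / size))

def cell_area(row, size=5.0):
    """Relative area (unit sphere) of a cell in latitude band `row`."""
    lat1 = math.radians(-90.0 + row * size)
    lat2 = math.radians(-90.0 + (row + 1) * size)
    return math.radians(size) * (math.sin(lat2) - math.sin(lat1))

def density_weights(stations, size=5.0):
    """Weight each station by cell area / number of stations in its cell."""
    cells = [cell_index(lat, lon, size) for lat, lon in stations]
    counts = Counter(cells)
    return [cell_area(row, size) / counts[(row, col)] for row, col in cells]
```

With this weighting, two stations sharing a cell each get half that cell's area, while a lone station in a cell carries the whole cell.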
The LS leads to a big linear system (t(B) is transpose):
A1*L + B*G = R1
t(B)*L + A2*G = R2
In principle, there could be about 90,000 equations and unknowns. But A1 and A2 are diagonal matrices, so L can be eliminated (the time consuming step), giving about 100 (number of years) equations for G. This takes about a minute of computing time for a global set. Total time for big global runs is generally less than two minutes.
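The elimination of L can also be sketched in miniature. This is a hypothetical pure-Python sketch, not the TempLS R code; in particular, pinning sum(G) = 0 to remove the additive freedom between L and G (the reduced system is otherwise singular) is my assumption about how the solution is anchored:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def fit_global(T, W):
    """Weighted LS fit of T[s][m][y] ~ L[s][m] + G[y], eliminating L.

    W[s][m][y] = 0 marks a missing observation (put 0.0 in T there).
    Returns G with sum(G) = 0, fixing the L <-> G offset freedom.
    """
    ns, nm, ny = len(T), len(T[0]), len(T[0][0])
    M = [[0.0] * ny for _ in range(ny)]
    rhs = [0.0] * ny
    for s in range(ns):
        for m in range(nm):
            w, t = W[s][m], T[s][m]
            a1 = sum(w)                    # diagonal entry of A1
            if a1 == 0:
                continue                   # this station-month never observed
            r1 = sum(wi * ti for wi, ti in zip(w, t))
            for y in range(ny):
                M[y][y] += w[y]            # A2 is diagonal in year
                rhs[y] += w[y] * t[y] - w[y] * r1 / a1
                for y2 in range(ny):
                    M[y][y2] -= w[y] * w[y2] / a1   # the t(B)*A1^-1*B term
    for y in range(ny):                    # add 1 to every entry: pins
        for y2 in range(ny):               # sum(G)=0 (constant null vector)
            M[y][y2] += 1.0
    return solve(M, rhs)
```

On synthetic data built exactly as L plus G with full weights, the fit recovers G to rounding error.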
That first post then pointed to the R code, and gave some basic results for land-only GHCN global data.
The next post, Comparison of GHCN results, was meant to compare the results with other efforts and the major indices. However, Zeke had done the hard work there - mine just slotted in, and seemed to give results very much in line with the others.
A side-effect of using R is that it is easy to create all sorts of masks for selecting different kinds of sites. The GHCN file v2.temperature.inv is read in as a dataframe, and each of its components can be used in logic tests. This includes latitude and longitude, country code, altitude, pop, urban status, etc. So the next post, More GHCN results, announced Ver 1.1 and showed some results for much-discussed classifications like urban/rural, airports etc. The main V1.1 change was to separate out the process of removing duplicates into a preprocessor program, since it only needs to be done once for any version of v2.mean.
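The mask idea can be shown with a toy example (Python rather than the R dataframe logic; the station records and field names below are invented for illustration, not the real fixed-width v2.temperature.inv layout):

```python
# Hypothetical station records; the real v2.temperature.inv is a
# fixed-width text file carrying fields of this general kind.
stations = [
    {"id": "42572259000", "lat": 33.6,  "lon": -84.4, "alt": 315, "urban": "U", "airport": True},
    {"id": "40371801000", "lat": 82.5,  "lon": -62.3, "alt": 66,  "urban": "R", "airport": False},
    {"id": "50194120000", "lat": -12.4, "lon": 130.9, "alt": 30,  "urban": "S", "airport": True},
]

def mask(stations, test):
    """One boolean per station: does it pass the selection test?"""
    return [test(s) for s in stations]

rural      = mask(stations, lambda s: s["urban"] == "R")
tropical   = mask(stations, lambda s: abs(s["lat"]) <= 23.4)
no_airport = mask(stations, lambda s: not s["airport"])
```

Any combination of fields can be tested the same way, which is what makes classifications like urban/rural or airport/non-airport cheap to explore.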
The next post, Tips on File Download, drew the most comments. The lead-up to V2 was the effort to make the process more user-friendly - the adoption of a text repository, and structuring the code so that the parts users need to change are visible at the top.
The next post, Version 1.2 of GHCN processor, described these improvements in detail.
Using the capability
The stimulus prompting much of this effort was Steve Hempell's determined efforts to get the code to work for him. The code was much improved by this (and helped by Carrick, Mosh and others). I posted another set of tests of some past issues - whether "dropped" GHCN stations had been trending down relative to "kept" stations, for example. This used an extension to the inventory categories whereby stations could be selected also by reporting history.
Then Steve got to work on the questions that were interesting him. Two posts were based on his thorough examination:
Latitudinal temperature histories and trends and Continents and trends.
The Big City Trends post related to an interest that Joseph had been exploring with his code.
Incorporating Ocean data
The least squares approach makes it relatively easy to combine data from multiple sources. The next challenge was a full global index, and this was done in V1.3. It involved more preprocessing, so the inventory file was augmented by a series of synthetic stations based on the HadSST2 ocean grid cells. These could then be included in the fitting in the normal way. The cell density weighting took care of scaling issues. The results were described in Incorporating SST and Land/Ocean models. The fit to SST is excellent - the fit to combined land/sea is still good, but with more disparity. This could probably be improved by better balancing at the shore. V1.3 is described in TempLS Version 1.3 - SST, Land/Sea index, distance utility.
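Generating the synthetic ocean stations might look something like this (a Python sketch; the naming scheme and record fields are my assumptions, though HadSST2 does use a 5° grid):

```python
def synthetic_sst_stations(ocean_cells, size=5.0):
    """One pseudo-station per ocean grid cell, placed at the cell centre.

    ocean_cells: (row, col) indices on a size-degree global grid,
    row 0 starting at latitude -90, col 0 at longitude -180.
    """
    out = []
    for row, col in ocean_cells:
        out.append({
            "id": "SST_%02d_%02d" % (row, col),  # naming scheme invented here
            "lat": -90.0 + size * row + size / 2.0,
            "lon": -180.0 + size * col + size / 2.0,
            "ocean": True,
        })
    return out
```

Appended to the land inventory, these entries enter the station count per cell, so the same inverse-density weighting balances land and ocean contributions.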
On the applications side, probably more posts. On the technical side, I'll be using R's global mapping capability: I expect V1.4 will routinely give a map showing the distribution of stations in each run. But beyond that, LS makes it possible to use other parametrisations, particularly spatial ones. We could show, say, trends expressed in terms of spherical harmonics. R does not seem to do shade plots, but its filled contours aren't bad.
In another direction, finer time resolution will be possible. Monthly fitting needs little more calculation than annual: the months can be done separately, but for the big elimination step that only cuts the computation to 1/12 of the size, and it then has to be done 12 times.