Friday, May 17, 2019

GISS April global down 0.12°C from March.

The GISS land/ocean temperature anomaly fell 0.12°C in April. The anomaly average was 0.99°C, down from March's 1.11°C. It compared with a 0.097°C fall in TempLS V4 mesh. As with TempLS, because March was so warm, the drop still meant that last month was the second warmest April in the record.

The overall pattern was similar to that in TempLS. Warm in all of Eastern Asia, especially Siberia. Warm in Europe, NW Canada and Alaska, and much of Africa. Mostly cool in Antarctica.

As usual here, I will compare the GISS and previous TempLS plots below the jump. I am planning to post in the next few days about a better style of TempLS plotting.

Saturday, May 11, 2019

April global surface TempLS down 0.097°C from March.

The TempLS mesh anomaly (1961-90 base) was 0.872°C in April vs 0.969°C in March. As with the very similar drop (0.086) in the NCEP/NCAR reanalysis base index, that still makes it a very warm April, the second warmest in the record. It was the warmest month since March 2017.

GHCN V4 results were a little later this month. I've noticed a reversion to the V3 custom of reporting by country; earlier it seemed that most countries were represented at an earlier stage. I'm a bit late reporting, too, because I found that the preliminary results changed from day to day. But I think all the main countries are in now.

This time the very warm Arctic region was in NE Siberia, with a rather cool spot NW of Canada. Europe was warm, and so was China, especially in the west. Africa was warm, the Middle East rather cool.
Here is the temperature map, using a spherical harmonics approximation to the mesh results.



And here is the map of stations reporting:




Friday, May 3, 2019

April NCEP/NCAR global surface anomaly down 0.086°C from March

The Moyhu NCEP/NCAR index fell from 0.582°C in March to 0.496°C in April, on a 1994-2013 anomaly base. March was a warm spike, though, so that still leaves April as the second warmest April in the record.

It was very warm in the Arctic, including over NE Siberia and Greenland, but once again cold over the Canadian Archipelago. Cold in Spain and the W Sahara. There was warmth From Mongolia through to N India and Thailand. Quite cool in the Middle East, from Egypt through to Iran.

The BoM ENSO Outlook remains at Alert - but "...This indicates that if El Niño does develop, it is likely to be short-lived and weak." Still, it's already warm.





Wednesday, April 24, 2019

Tests and results from TempLS v4 global temperature program.

I have been writing a series of articles leading up to the release of V4 of the program TempLS, which I run on the monthly data from GHCN (land) and ERSST V5. The stimulus for the new version was the release of GHCN V4. I use the program to compare with the major indices, which are basically GISS, NOAA, HADCRUT and BEST. I'll make comparisons of the TempLS output with those indices here.

I have been developing different integration methods, with the basic idea that the agreement between a number of good methods with different bases is a guide to the amount of uncertainty that is due to method. I think it is quite small, and I will show comparative graphs. The key thing is that the different methods within TempLS agree with each other better than the indices agree among themselves.

One of the methods I have been exploring is the use of Spherical Harmonics (SH). This is not so much an integration method as an enhancement, and is so treated in TempLS V4. So the agreement between enhanced but otherwise inferior methods, and the better methods that are not enhanced, is further corroboration of the convergence of integration methodology.

I will illustrate all this with an active graph, of the kind I have been using for the index results themselves. You can, with the mouse, change scales, translate axes, choose various subsets to plot, and also smooth. An added facility here is that you can switch to difference plots in any combination you choose. I will at some stage post a facility like that for WebGL for doing this.

The post will be long, so I start with a table of contents.

Table of contents

Brief summary of methods

Some recent articles are here, here and here, with links to earlier. The methods I'll list are
  • No weighting - the simple average of stations reporting in each month is used. This is a very poor method, but of interest here because it becomes quite serviceable after SH enhancement.
  • Simple grid. This is the traditional method, where the temperature anomalies within each cell of, say, a lat/lon 2° grid, are averaged, and the global is then the area-weighted average of the cells that have results. Area-weighting relates to the shrinking of area of cells near the poles. It is used by HADCRUT; the paper of Cowtan and Way showed that the effect of accounting for cells without data gave an important correction to trends. I do not now use lat/lon grids, but rather a cubed sphere, or other platonic solid based grid. The alternative is usually icosahedron. In each case I use mappings to make the cells of almost uniform area.
  • Grid with infill. This assigns to empty cells a value based on neighbors. Most recently, I do this by solving a Laplace equation, with the known cell values as boundary condition. That sounds complex, but the simple Southwell relaxation method, which initially guesses the unknown cells and then replaces them with the average of neighbors until convergence, is quite adequate.
  • Irregular triangular mesh. This has been my workhorse; it is basically finite element quadrature, with linear interpolation within triangles with stations as vertices. I have thought of it as my best method (a sketch of the node weighting it implies is given just after this list).
  • First order LOESS. This sets up a regular array of nodes (icosahedral), and assigns values to them based on local first order weighted regression with typically 20 of the nearest stations. The regular array is then simply averaged. I think this is a rival for the best method.
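To make the mesh weighting concrete, here is a minimal R sketch of the node-weight calculation for linear interpolation on a triangulation. It is illustrative only (the names and inputs are hypothetical, not the actual TempLS code); with linear interpolation, each triangle contributes a third of its area to each of its vertices, and constant factors cancel when the weights are normalised.

# tri : nt x 3 matrix of station indices forming each triangle of the mesh
# area: vector of the nt triangle areas (on the sphere)
meshWeights <- function(tri, area, nStations) {
  w <- numeric(nStations)
  for (k in seq_len(nrow(tri)))
    w[tri[k, ]] <- w[tri[k, ]] + area[k] / 3   # a third of the triangle area to each vertex
  w
}
# a global average is then sum(w * x) / sum(w), with x the station anomalies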

Active plot of results

The idea of active plotting is described here, and my regular example is the monthly plot of indices here. The active plot for this post is here, with details below:


Thursday, April 18, 2019

Description and code of TempLS V4 global temperature program.

It's now time to post the code for TempLS V4. I know that it is a day when many folks who really want to get their message read put it out to the world. I didn't want to steal anyone's thunder; the week just didn't have enough days. I've been writing background articles leading up to the release of V4 of TempLS; here are the links:
ideas about spatial integration
icosahedral grid with equal area mapping
documentation system
details of new methods
tests
math methods

TempLS is a code written in R, dating back to about 2010 - there is a summary of its history up to V3 here. The release post (Aug 2010) for V2 is here; Ver 3 was rolled out in three parts, of which the last, here, links to the earlier parts.

Release of V4 is, as I mentioned in the documentation post, complicated by the fact that I use extensively a library of R functions that I keep, and will need to include. But I hope the documentation will also help; I'll be relying on it for this release.

The release is in three zip files:
  • There is a file of code and some internal data - LScode.zip. To run it, you should extract it in an empty directory; it will create six subdirectories (x,g,i,l,m,n). The R file to run it is LS_wt.r, explained below. This zip file is about 98 Kb.
  • As mentioned before, the code now has a lot of generic functions embedded from my general library. There is a documentation system explained at that link, which I also used for the functions of TempLS, below. This zip file includes a list Moyhupack.rda. This you can attach, R style, as a package, which has advantages. The functions from it are also on a file Moyhupack.r, which you can just source() into your environment. It has a lot of functions with short names, most of which you won't want to know about, so there is potential for naming clashes. Finally, there is a documentation file Moyhupack.html. If there is interest, I will write more about the library. I think it does have useful things for math programming. The zipfile is about 800 Kb
  • Finally, there is a set of data. The main one is a recent version of GHCN V4 unadjusted TAVG, named ghcnv.dat. There is also the inventory file that came with it, called ghcnv.inv. The SST data is from ERSST V5; it is part processed in TempLS style and in a directory called x. Note that the code zipfile also created an x directory, with just one list, which would overwrite this larger version if unzipped later. This is a big file - about 54 Mb (mainly GHCN).

The math task and code structure

As before (V3 and earlier), the code is divided into four parts. The code starts with a text version of GHCN V4, and goes looking for a NetCDF version of ERSST V5, if needed. It assembles all this data into 12 (month) arrays of station temperature (rows) vs year (cols). ERSST grid cells count as stations. Missing values are marked with NA. The objective is to fit the model
T=L+G
where T is the station temperature, L the time-invariant normal, and G the space-invariant global anomaly. The fitting requires integration, which comes down to a weighted sum of the data. The third section of the code uses the data about stations reporting, but not the temperatures, to work out weights for integration by various methods. This can be by far the most time-consuming part of the code, alleviated greatly by the use of past calculated weights where the list of stations reporting hasn't changed.

The fourth section is where the fitting is done, via a brief iteration (few seconds) to get convergent values of L and G, which is the main output. It is also where the optional Spherical Harmonics enhancement is done.

Code details.

The code is now almost entirely done as function calls. I call the main file LS_wt.r, which calls functions from LS_fns.r. These are documented below. The main sequence functions (the four parts) are
  • SortLandData() handles land data
  • SortSSTData() handles ERSST5 data.
  • DeriveWeights() calculates weights from the x array of the previous parts
  • FitGlobal() does the iterative fitting to produce L and G.
There are three lists of data that are switched in and out as needed; they are:
  • I, the inventory data, which governs the dimensions of later data matrices. It is updated in a controlled way by newinv() on LSinv.rds.
  • J describes the results in SortLandData and SortSSTData - assembling x. On LS.rds.
  • K is for the method-dependent data in DeriveWeights and FitGlobal, including the main result, K$G.
The 12 (month) temperature files are stored in directory x. The 12 weight files are stored in directories, one for each method
("g","i","m","n","l") for ("grid","infilled","mesh","none","loess"). Since the K data is method dependent, a separate version is stored on each directory as LS.rds.

Code sequence

The main code is:
source("LS_fns.r")
if(!exists("job"))job="UEMO"
print(paste("Starting",job,Sys.time()))
i=vsplit(job,"")
kind=tolower(i)
kinds=c("g","i","m","n","l")
RDS=pc(kind[3],"/LS.rds")
K=readRDS(RDS) # I is inv list, J is list about x, K is list for method kind (w)
K$edition=I$edition
K$kind=kind; K$job=job;
do=i!=kind
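# do[k] is TRUE where the k-th character of job is uppercase, i.e. that section is to be run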
wix=pc(kind[3],"/") # info about w
tm=timex(0); tx=unlist(tm); t0=Sys.time();
yr=c(1900,tx[1]+(tx[2]>1)-1); yrs=yr[1]:yr[2]; # you can change these
ny=diff(yr)+1
saveR(K)

if(do[1])SortLandData()

if(do[2])SortSSTData()

if(do[3])DeriveWeights()

if(do[4])K=FitGlobal()


As previously, the user supplies a job string of four characters. They are uppercase if that code section is to be performed. A typical value is job="UEMO". At the moment there aren't realistic alternatives for "UE", which is unadjusted GHCN V4 and ERSST V5. M stands for mesh method, but could be any of ("G","I","M","N","L"). "O" just means perform last section; "P" means go on to produce graphics with another program.

A second control variable that can be set is nSH=4, say. It induces the spherical harmonics enhancement, and the value sets the order, using (nSH+1)^2 functions. Going past 10 or so is risky for stability (and time-consuming).

The third control is called recalc. It lets you override the system of using stored values of weights when the set of stations reporting in a given month is unchanged. This saves time, but it can happen that you suspect the stored data is wrong, for some reason. Something might have gone wrong, or the inventory might have changed. The default setting is FALSE, or 0, but if you want it to not use stored data, set recalc to 1. There is also a setting that I find useful, recalc=2, which recalculates only the most recent year. This takes very little time, but will check whether the code is currently working for that method option. Otherwise, if it uses entirely stored data, it could take some time to find errors.

So the actual code here just brings in K, which also acts as a record of what was done. It stores some information and puts K back on disk. The other stuff just makes some time information for use in the main sequence. Note that the last step outputs K. This is where the results are (K$G and K$L).
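For concreteness, a typical run might look like the following (a sketch only; the control values are examples, as described above, not defaults you must use):

job <- "UEMO"      # unadjusted GHCN, ERSST, mesh weights, fit; uppercase sections are run
nSH <- 4           # optional spherical harmonics enhancement of order 4
recalc <- 2        # recompute weights for the most recent year only
source("LS_wt.r")  # runs the four sections; results end up in K$G (global) and K$L (normals)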

Documentation of functions

Remember there are a lot of generic functions on the Moyhu package. The functions here are those specific to TempLS.

Tuesday, April 16, 2019

GISS March global up 0.21°C from February.

The GISS V3 land/ocean temperature anomaly rose 0.21°C in March. The anomaly average was 1.11°C, up from February's 0.90°C. It compared with a 0.208°C rise in TempLS V3 mesh. Jim Hansen's detailed report is here. So far, April is looking warm too.

I think that now that TempLS and GISS are using GHCN V4, the agreement will be even better than in the past, as in this month. The extra coverage does make a difference. The earlier NCEP/NCAR average also agreed very well (0.19° rise). It was the third highest March in the record, just behind 2017.

The overall pattern was similar to that in TempLS. Huge warm spots in Siberia and NW N America/Arctic. Cool spots in NE USA through Labrador to Greenland, and Arabia through N India. Warm in Australia and S Africa.

As usual here, I will compare the GISS and previous TempLS plots below the jump.

Monday, April 15, 2019

The math basis for the TempLS V4 global temperature program.

This is, I hope, the last of the preparatory posts before I post the code and description of TempLS V4. Earlier posts were on ideas about spatial integration, an icosahedral grid with equal area mapping for a new method, the documentation system that I'll be using, details of new methods, tests, and some math methods that I'll be using and referring to.

The math basis is similar to what I described back in 2010 for V2, and in the guide to V3, still the most complete reference. The statistical model for temperature for a given month (Jan-Dec) is:

T = L + G, where T is measured temperature, L is a time-invariant offset for stations, and G a space-invariant global average temperature. In more detail, using my index notation, with s for station, y for year, and m for month, it is

Tsmy = IyLsm + IsGmy

The I's are added for consistency of indexing; they denote arrays of 1's.

A change in V4 is that the analysis is done separately over months, to avoid overly large arrays in RAM. That simplifies the analysis; the subscript m can be dropped. Another change is more subtle. The model fitting should be by area and time, ie by minimising the integral of squares of the residual T - L - G.

In earlier versions I discretised the variables jointly, giving a weighted sum

w_sy (T_sy - I_y L_s - I_s G_y)^2

In fitting, this gave a matrix which was symmetric positive definite, which seemed natural and good. But it means that if you choose the weights to be right for the spatial integration, they can't be controlled for the time integration, so locally, they vary over years, which isn't really wanted. For the methods I used in v3, the weights were all positive, and of reasonably uniform size, so it didn't really matter. But in V4 I use methods which are higher order, in taking account of derivatives. Sometimes, for good reason, the weights can be negative. Now for spatial integration, they add to the area, so that is still OK. But for time integration, there might be significant cancellation, with a sum that is near zero, or even negative. It doesn't often happen, but since you have to divide by the sum of weights to get a time average, it is bad.

Revised weighting - Gauss Seidel iteration

The previous equations, derived from the sum squares, were
w_(s)y (T_sy - I_y L_s - I_s G_y) = 0    (1a)
w_s(y) (T_sy - I_y L_s - I_s G_y) = 0    (2a)

To get even time weighting in (1a), I use a weighting J_(s)y, which is 1 where there is a reading, and 0 where there isn't (where w also had zeroes).
J_(s)y (T_sy - I_y L_s - I_s G_y) = 0    (1b)
w_s(y) (T_sy - I_y L_s - I_s G_y) = 0    (2b)
This can be written in part-solved form as
J_(s)y I_y L_s = J_(s)y (T_sy - I_s G_y)    (1)
w_s(y) I_s G_y = w_s(y) (T_sy - I_y L_s)    (2)

The first just says that L is the time average of T-G where observed, since its multiplier is just the set of row sums of J, which is the number of years of data for each station. The second then is just the area average of T corrected for L (the anomaly, when L has converged). Suitably normalised, the multiplier of G is 1. Starting with G=0, this gives "naive averaging", with L just the overall time mean at station s. As I wrote here, this gives a bad result, which is why the main methods insist on averaging over a fixed time period (eg 1961-90). But that then leaves the problem of stations that do not have data there, which this method avoids.

So the first equation is solved, and the L used to estimate G, which is then used to correct L, and so on, iterating. The reason that iteration works is that the equations are weakly coupled. Eq (1) almost fixes L, with variations in G having only a small effect. Conversely Eq (2) almost fixes G, not very sensitive to local variations in L. There is an exception - if you add a constant value to G, it will cause the L's to drop by a similar constant amount.

So that is the iteration sequence, which can be characterised as block Gauss-Seidel. Start with a guessed value for G. Solve (1) for L, then solve (2) for an updated value of G. For GHCNV4, as used here, this converged gaining at least one significant figure per iteration, so four or five steps are sufficient. In practice, I now start with a previous estimate of G, which converges even faster. But in any case, the step takes only a few seconds. At each step, G is normalised to have mean zero between 1961 and 1990 (for each month), to deal with the ambiguity about exchanging a constant value with L.
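In R, the iteration can be sketched as follows (illustrative only, not the TempLS code; x is the station-by-year matrix for one month with NA for missing readings, w the spatial integration weights with zeros where there is no reading, and base the column indices of 1961-1990):

fitLG <- function(x, w, base, niter = 5) {
  J <- !is.na(x)                       # 1 where there is a reading, 0 where there isn't
  x[!J] <- 0
  G <- numeric(ncol(x))                # start with G = 0 (or a previous estimate)
  for (k in 1:niter) {
    # (1): L is the time average of T - G over the years with data, per station
    L <- rowSums((x - outer(rep(1, nrow(x)), G)) * J) / rowSums(J)
    # (2): G is the weighted spatial average of T - L, per year
    G <- colSums(w * (x - outer(L, rep(1, ncol(x))))) / colSums(w)
    G <- G - mean(G[base])             # fix the constant ambiguity: zero mean over 1961-90
  }
  list(L = L, G = G)
}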

Program structure

As before, TempLS has four blocks of code, now expressed as functions:
  • SortLandData
  • SortSSTData
  • DeriveWeights
  • FitGlobal
The first two are just housekeeping, ending with x_sy, the array of monthly temperatures. The third derives the corresponding spatial integration weight matrix w_sy by one of five methods described here. The fourth, FitGlobal(), performs the iteration described above. The results are the parameters G and L, of which G is the desired global temperature anomaly, which I publish every month.

For the more accurate methods, DeriveWeights is the most time consuming step; a full meshing can take an hour, and LOESS takes ten minutes or so to do the full record since 1900. But the weights depend only on the stations reporting, not what they report, and for most of those years this doesn't change. So I store weights and reuse them unless there is evidence of change in the list of stations that reported that year. This brings the compute time back to a few seconds.

In V3, I had a separate method based on Spherical Harmonics. As described here, I now treat this as an enhancement of any method of integration. In V3, it was in effect an enhancement of the very crude method of unweighted averaging over space. It actually worked well. In V4 it is implemented, optionally, at the start of FitGlobal(). The weights from part 3 are modified in a quite general way to implement the enhancement, with results described in the post on tests. I think it is of interest that as a separate integration principle, it yields results very similar to the more accurate (higher order) integration methods. But I don't think it will have a place in routine usage. It takes time, although I now have a much faster method, and it does not give much benefit if the more accurate methods are used. So why not just use them?







Friday, April 12, 2019

Some math used in TempLS - index notation and sparse matrices.

Index notation for arrays, and the summation convention

Index notation for arrays became popular with the tensor calculus, partly because it also elegantly embraced the concepts of contravariance and covariance. It is sometimes called the Einstein notation. I'm not using the tensor aspect here, so no superscripts. Arrays, including vectors, are simply represented by writing down their coefficients with subscripts, with the understanding that those vary over a known range. There is no attempt to visualise them geometrically. The coefficients, like arrays, can be added if the subscripts match in range, and be multiplied by scalars or other subscripted arrays.

But the core of the system is the summation convention, also named after Einstein. It expresses the idea of inner product, which is what distinguishes matrices from just vectors. If an index is repeated in a term, it implies summation over the range of that index. Some familiar cases:

a_i b_j is the outer product of two vectors, functioning as a 2-index array, or matrix, but
a_i b_i is the inner product, with a single number (scalar) result. i is referred to as a dummy variable, because it won't be referenced again.
a_ij b_j is a matrix right multiplied by a vector. The result is a vector, indexed with i.
a_ij b_i would be left multiplied. But that terminology isn't needed; the indices say it all.
a_jj is the trace of matrix A.
a_ij b_jk is the matrix product A*B, indices i and k.
a_ij b_jk c_k is the product A*B*c, a vector, index i.


You can exempt an index from the repetition count with brackets, so a_(i) b_i is not summed. The multiplication is real, so it is as if A was a diagonal matrix. It often appears as a kernel, as in a_(i) b_i c_i, which would be a quadratic form with diagonal matrix A as kernel.

I use a special notation I_j or I_jk for an array consisting entirely of 1's.

I have used this notation at least since introducing V2 of TempLS, but I'll be making more extensive use of it with V4. A feature is that once you have written down an index version of what you want to do, it maps directly onto R coding.
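For example, with a matrix A and vectors a, b in R, the correspondences above become (a small illustration only):

y <- A %*% b        # a_ij b_j : matrix times vector
P <- outer(a, b)    # a_i b_j  : outer product
s <- sum(a * b)     # a_i b_i  : inner product
tr <- sum(diag(A))  # a_jj     : trace of A
d <- a * b          # a_(i) b_i: no summation - elementwise, like a diagonal kernel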

Sparse matrices

Often the indices have a small range, as over space dimensions. But sometimes the range is large. Most integration methods in TempLS involve mapping from one large set, like stations, to another, like grid cells. At some point there will be a matrix involving those two indices. In V4 we might have 30,000 stations, and 10,000 cells. You would not want to enter such a matrix into RAM.

The saving grace is that most nodes have no interaction with most cells. As with partial differential equations in maths, the relations are local. Most terms are zero, a sparse matrix. So it is possible to enter the non-zero terms, but you also have to say where they are in the matrix. I just list the N nonzero terms in a vector and, in an Nx2 matrix, list the row and then the column of each term.

The key issue again is matrix-vector multiplication y=A*b. This is what can take a set of numbers over stations to one over grids. In a language like C, with this structure, the algorithm is simple. You go through that list, take each apq, multiply by bq, and put the result in yp. But that is a lot of operations, and in an interpreted language like R, they carry too much overhead. It would be possible to stretch out the terms of b to match a, so the multiplication could be between vectors. But they can't be added into the result vector in one operation, because several numbers would have to be added into one location. I use an algorithm that sorts the indices of a into subsets that do correspond to a unique location, so the multiplication can be reduced to a small number of vector operations.
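A minimal sketch of that multiplication in R, using rowsum() for the grouping step (illustrative only; the actual TempLS routine differs in detail):

# ij: N x 2 matrix giving row and column of each nonzero term; v: the N values; nr: rows of A
spmul <- function(ij, v, nr, b) {
  contrib <- v * b[ij[, 2]]               # a_pq * b_q for each stored term
  s <- rowsum(contrib, ij[, 1])           # add up contributions sharing a row index
  y <- numeric(nr)
  y[as.integer(rownames(s))] <- s         # scatter the sums back into a full-length result
  y
}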

Use in TempLS

I have written, for example here and very recently here, about ways in which spatial integration is done in TempLS, and why. Integration is a linear operation on a set T_s of measured temperatures, or anomalies, and so the result can be represented as a weighted sum, w_s T_s. For each method, the first task of TempLS is to calculate those weights.

I'll go through the basic arithmetic of simple gridding from this viewpoint. To integrate directly, you work out which cells of the grid the stations belong to, calculate an average for each cell, and then multiply those by the areas of each cell and add. It's fairly simple to program, but I'll describe it in sparse matrix terms, because more complex cases have a similar structure.

The key matrix being formed is the incidence matrix In over indices g for grid (rows) and s for stations (columns). This is zero, except where station s is in cell g, when it has a 1. So the action of summing station data x in cells is the product In_gs x_s. The sum has to be divided by the count, which in matrix terms is the product In_gs I_s. The result then has to be multiplied by the grid areas a_g and added. Overall

Integral over earth = a_g (I_t In_(g)t)^(-1) In_gs x_s

The result has no indices, so all must be paired. Note the appearance of a bracketed (g) in the denominator. The repetition of separate station indices s and t indicates two separate summations.

Forming the index set for the sparse matrix is usually simple. Here the right column just numbers the stations, starting from 1. The left column just contains the cell number that that station is in.

That was reasoned starting from known x. To calculate the weights w, x must be removed, and the linear algebra started from the other end, with matrices transposed. Transposing a sparse matrix is trivial - just swap the columns of the index.
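For the simple grid case, the weights that come out have a simple form: each station gets the area of its cell divided by the number of stations in that cell, and the global average is the weighted sum divided by the sum of weights. A minimal R sketch (assuming a precomputed cell index for each station and a vector of cell areas; the names are hypothetical):

gridWeights <- function(cell, area) {
  counts <- tabulate(cell, nbins = length(area))  # stations per cell
  area[cell] / counts[cell]                       # each station shares its cell's area
}
# global average = sum(w * x) / sum(w), where w <- gridWeights(cell, area)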

I'll go through the algebra for more complex cases in a future post, probably in conjunction with the code. But here is a table showing how the basic structure repeats. These are the various integration algorithms in index notation.

No weighting:   (I_t I_t)^(-1) I_s x_s
Simple grid:    a_g (I_t In_(g)t)^(-1) In_gs x_s
Triangle mesh:  a_g In_gs x_s   (a = areas of triangles)
LOESS order 0:  I_p (I_t W_(p)t)^(-1) W_ps x_s   (W is a function, with cutoff, of distance between station s and nearby node p)
LOESS order 1:  y_pj (z_tj z_tk W_(p)t)^(-1) W_ps z_sk x_s   (z_sk are the coordinates of station s; y_pj are the coordinates of node p)






Tuesday, April 9, 2019


Testing integration methods in TempLS V4.

I posted yesterday about some new methods and changes in integration in TempLS V4, explaining how the task is central to the formation of global temperature anomaly averages. Today I will show the results of testing the methods. We can't test to see if actual data gives the right result, because we don't know that result. So tests are of two types
  • Testing integration of functions whose average is known. Values of those functions replace the observed temperatures in the process, with the same station locations and missing values.
  • Comparing different methods, to see if some converge on a common answer, and which do not.

I tested six methods. They are, in order (approx) of ascending merit
  • Simple average of stations with no area weighting.
  • Simple gridding, with cells without data simply omitted. The grid is cubed sphere with faces divided into 24x24, with a total of 3456 cells. Cell area is about 25000 sq km, or 275 km edge.
  • Zero order LOESS method, as described in last post. Zero order is just a local weighted average.
  • Gridding with infill. Empty cells acquire data from neighboring stations by solving a diffusion equation, as briefly described in the last post.
  • First order LOESS, in which values on a grid of nodes are calculated by local linear regression
  • Finite element style integration on a triangular mesh with observation points at the vertices.
I said the order was approximate; there is some indication that full LOESS may be even slightly better than mesh. I also tested the effect of using spherical harmonics fits for enhancement, as described here. This option is now available in TempLS V4. The parameter nSH is a measure of the number of periods around the Earth - see here for pictures. For each level of nSH, there are (nSH+1)^2 functions.

Testing known functions

An intuitively appealing test is simply latitude. Using just the latitude of stations when they report, in January 2011, what does the process give for the average latitude of the Earth? It should of course be zero.


                nSH=0     nSH=2     nSH=4     nSH=8     nSH=12
No weighting    26.6295   -0.0393   -0.0603    0.0479    1.9308
Grid no infill   0.5095    0.0806   -0.0121   -0.035    -0.0197
LOESS order 0    0.022    -0.0342   -0.0438   -0.0342   -0.0216
Grid diffuse    -0.0432   -0.0466   -0.037    -0.0267   -0.0175
LOESS order 1   -0.023    -0.0231   -0.0252   -0.0223   -0.0182
Mesh FEM        -0.0209   -0.022    -0.0218   -0.0177   -0.0129


The case with no weighting gives a very bad result. A very large number of GHCN V4 stations are in the US, between lat 30° and 49°, and that pulls the average right up. Grid with no infill also errs on the N side. The reason here is that there are many cells in and near the Antarctic without data. Omitting them treats them as if they had average latitude (about 0), whereas of course they should be large negative. The effect is to return a positive average. The other methods work well, because they are not subject to this bias. They have errors of local interpolation, which add to very little. The poor methods improve spectacularly with spherical harmonic (SH) enhancement. This does not fix the biased sampling, but it corrects the latitude disparity that interacts with it. Deducting the non-constant spherical harmonics, which are known to have average zero, leaves a residual which is not particularly biased by hemisphere. The no weighting case is bad again at high order SH. The reason is that the fitting is done with that method of integration, which becomes unreliable for higher order functions. I'll say more about that in the next section.

Testing spherical harmonics.

Latitude is a limited test. SH's offer a set of functions with far more modes of variation, but testing them all would return a confusing scatter of results. There is one summary statistic which I calculate that I think is very helpful. When you fit SH Fourier style, it is actually a regression, involving the inverse of the matrix of integrals of products of SH. This is supposed to be the identity matrix, but because of approximate integration, the SH are not exactly orthogonal. The important number indicating deviation from the unit matrix is the ratio of minimum to maximum eigenvalue (the reciprocal of the condition number). When it gets too low, the inversion fails, as does the fitting. That is what was happening to unweighted averaging with nSH=12. So I calculated the minimum eigenvalue of that matrix (the maximum remains at 1). Here is a table:

                nSH=2     nSH=4     nSH=8     nSH=12
No weighting    0.0703    0.0205   -0.8361   -0.9992
Grid no infill  0.7893    0.631     0.4354    0.1888
LOESS order 0   0.9501    0.8298    0.4593    0.1834
Grid diffuse    0.9694    0.9013    0.6026    0.2588
LOESS order 1   0.988     0.9503    0.6905    0.3037
Mesh FEM        0.9875    0.9418    0.6677    0.3204


Close to 1 is good. There is a fairly clear ranking of merit, with full LOESS just pipping mesh. No weighting is never good, and completely falls apart with high nSH, being not even positive definite. All methods are down quite a lot at nSH=12, although a minimum of 0.3 say is not a major problem for inversion. These values are for January 2011, but it is a robust statistic, varying little from year to year. Here is the comparison over years for the nSH=8 level:


                 2011      2012      2013      2014      2015      2016      2017      2018      2019
No weighting    -0.8361   -0.759    -0.2471   -0.2119   -0.0423   -0.0104    0.01      0.0167    0.0199
Grid no infill   0.4354    0.465     0.4481    0.4129    0.4493    0.4237    0.4394    0.4325    0.4436
LOESS order 0    0.4593    0.4972    0.4742    0.4477    0.4791    0.4472    0.4734    0.4697    0.4764
Grid diffuse     0.6026    0.6207    0.6102    0.5741    0.6132    0.5929    0.6127    0.608     0.6095
LOESS order 1    0.6905    0.711     0.6965    0.6762    0.6912    0.6832    0.6903    0.6927    0.6944
Mesh FEM         0.6677    0.6663    0.6703    0.6636    0.6627    0.6737    0.6741    0.6772    0.6626


Test of global average of real anomalies.

For consistency, I have used real temperature data with the station normals calculated by the mesh method. It would not matter which method was used, as long as it is consistent. Here are January results over years with no SH enhancement:


                 2011      2012      2013      2014      2015      2016      2017      2018      2019
No weighting    -0.1565    1.2277    0.5341    0.3475    1.0401    0.7213    0.9881    0.8065    0.8183
Grid no infill   0.3254    0.2995    0.4834    0.5251    0.6729    0.9464    0.8194    0.6127    0.7443
LOESS order 0    0.366     0.2988    0.5224    0.5665    0.6649    1.0036    0.8499    0.619     0.7425
Grid diffuse     0.3674    0.2919    0.5113    0.5443    0.6495    0.9802    0.8283    0.6186    0.7111
LOESS order 1    0.3706    0.2994    0.5057    0.5397    0.6429    0.9932    0.8281    0.6209    0.7112
Mesh FEM         0.3779    0.2862    0.5051    0.517     0.6474    0.9969    0.8088    0.6178    0.7017


The best methods are really very close, generally within a range of about 0.02. Even simple grid (with cubed sphere) is not that different. But no weighting is bad. The effects of SH enhancement are variable. I'll show them for January 2018:


                nSH=0     nSH=2     nSH=4     nSH=8     nSH=12
No weighting    0.8065    0.8842    0.5687    0.5633    1.4847
Grid no infill  0.6127    0.6253    0.6077    0.6165    0.6139
LOESS order 0   0.619     0.6196    0.6138    0.6224    0.6152
Grid diffuse    0.6186    0.6194    0.6191    0.6241    0.6237
LOESS order 1   0.6209    0.621     0.6198    0.6225    0.619
Mesh FEM        0.6178    0.6182    0.618     0.6206    0.6147


The better methods do not change much, but converge a little more, and are joined by the lesser methods. This overall convergence based on the separate principles of discretisation type (mesh, grid etc) and SH enhancement is very encouraging. Even no weighting becomes respectable up to nSH=8, but then falls away again as the high order fitting fails. I'll show in a future post the comparison results of the different methods for the whole time series. There are some earlier comparisons here, which ran very well back to 1957, but were then troubled by lack of Antarctic data. I think LOESS will perform well here.

Monday, April 8, 2019


New methods of integration in TempLS V4 for global temperature.


Background

TempLS is a program that takes the extensive data of surface temperature measurements and derives a global average of temperature anomaly for each month over time. It also produces maps of temperature anomaly distribution. The basic operation that enables this is spatial integration. As with so much in science and life, for Earth's temperature we rely on samples - it's all we have. To get the whole Earth picture, it is necessary to interpolate between samples. Integration is the process for doing that, and then adding up all the results. The average is the integral divided by the area.

The worst way of getting an average is just to add all the station results and divide by the total. It's bad because the stations are unevenly distributed, so the result reflects the regions where stations are dense. This generally means the USA. Some kind of area weighting is needed so that large areas with sparse readings are properly represented. Early versions of TempLS used the common method of gridding based on latitude/longitude. The default method of spatial integration is to form a function which can be integrated, and which conforms in some way to the data. In gridding, that function is constant within each cell, and equal to the average of the cell data. But there is the problem of cells with no data...

Since V2, 2011 at least, TempLS has used unstructured mesh as its favored procedure. It is basically finite element integration. The mesh is the convex hull of the measurement points in space, and the area weight is just the area of triangles contacting each node. For over seven years now I have reported average temperature based on the mesh method (preferred) and grid, for compatibility with Hadcrut and NOAA.

Early in the life of V3, some new methods were added, discussed here. The problem of cells with missing data can be solved in various ways - I used a somewhat ad-hoc but effective method I called diffusion. It works best with grids that are better than lat/lon. I also used a method based on spherical harmonics, with least squares fitting. As described here, I now think this should be seen as an enhancement which can be applied to any method. It is spectacularly effective with the otherwise poor method of simple averaging; with better methods like mesh or diffusion, there is much less room to improve.

So why look for new methods?

We don't have a quantitative test for how good a method is when applied to a temperature field. The best confirmation is that the methods are relatively stable as parameters (eg grid size) are varied, and that they agree with each other. We have two fairly independent methods, or three if you count SH enhancement. They do agree well, but it would be good to have another as well.

V4 changes.

V4 does introduce a new method, which I will describe. But first some more mundane changes:
  • Grid - V4 no longer uses lat/lon grids, but rather grids based on platonic solids. Currently most used is the cubed sphere, with ambitions to use hexagons. All these grids work very well.
  • Spherical Harmonics - is no longer a separate method, but an enhancement available for any method. It's good to have, but adds computer time, and since it doesn't much enhance the better methods, it can be better to use them directly.
  • I have upgraded the diffusion method so that it now solves a diffusion equation (partial differential) for the regions without cell data. The process is very simple - Southwell relaxation, from the pen and paper era, when computer was a job title. You iterate, replacing unknown values by an average of neighbors; a sketch follows this list.
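Here is a minimal R sketch of that relaxation step (illustrative only, not the TempLS code; val holds the cell values with NA for empty cells, and nbrs[[i]] lists the neighbouring cells of cell i):

infill <- function(val, nbrs, tol = 1e-6, maxit = 1000) {
  known <- !is.na(val)
  val[!known] <- mean(val[known])          # initial guess for the empty cells
  for (it in 1:maxit) {
    change <- 0
    for (i in which(!known)) {             # known cells act as the boundary condition
      new <- mean(val[nbrs[[i]]])          # replace by the average of neighbours
      change <- max(change, abs(new - val[i]))
      val[i] <- new
    }
    if (change < tol) break                # stop when nothing moves much
  }
  val
}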

The LOESS method

The new method uses local regression - the basis of LOESS smoothing. Other descriptive words might be meshless methods and radial basis functions. The idea is that instead of integrating the irregular pattern of stations, you find a set of regularly spaced points that can be integrated. In fact, using an icosahedron, you can find points so evenly spaced that the integral is just a simple average. To estimate the temperatures at these points, weighted regression is applied to a set of nearby measurements. The regression is weighted by closeness; I use an exponential decay based on Hansen's 1200 km for loss of correlation. But I also restrict to the 20 closest points, usually well within that limit.

The regression can be relative to a constant (weighted mean) or linear. The downside of constant is that there may be a trend, and the sample points might cluster on one side of the trend, giving a biased result. Linear fitting counters that bias.
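A minimal R sketch of the node estimate (illustrative only; the decay form and distance handling are simplified, not the exact TempLS code). Here statXYZ holds station positions as unit 3D vectors, temp their anomalies, and nodeXYZ the node position:

loessNode <- function(nodeXYZ, statXYZ, temp, n = 20) {
  d <- sqrt(colSums((t(statXYZ) - nodeXYZ)^2)) * 6371    # approximate distance in km
  near <- order(d)[1:n]                                  # the n nearest stations
  w <- exp(-d[near] / 1200)                              # closeness weights, 1200 km scale
  df <- data.frame(anom = temp[near], x = statXYZ[near, 1],
                   y = statXYZ[near, 2], z = statXYZ[near, 3])
  order0 <- sum(w * df$anom) / sum(w)                    # order 0: weighted local mean
  fit <- lm(anom ~ x + y + z, data = df, weights = w)    # order 1: weighted linear fit
  order1 <- predict(fit, data.frame(x = nodeXYZ[1], y = nodeXYZ[2], z = nodeXYZ[3]))
  c(order0 = order0, order1 = unname(order1))
}
# the global average is then just the simple mean of the node estimates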

I'll show test results in the next post. I think the LOESS method is at least as accurate as the mesh method, which is to say, very accurate indeed. And of course, it agrees well. It is flexible, in that where data is sparse, it just accepts data from further afield, which is the best that can be done. You could think of a grid method as similarly estimating the central values, which can then be integrated. The grid method, though, artificially cuts off the data that it will accept at the cell boundary.

The LOESS method also gives a good alternative method of visualisation. My preferred WebGL requires triangles with values supplied at corners, when GL will shade the interior accordingly. I have used that with convex hull mesh (eg here), but when triangles get large, it produces some artefacts. Using the underlying icosahedral mesh of LOESS has uniformly sized triangles. Of course, this is in a way smoothing over the problem of sparse data. But at least it does it in the best possible way.

Here is a WebGL plot of February 2019 temperature anomaly, done the LOESS way. As usual, there are checkboxes you can use to hide the mesh overlay, or the colors, or even the map. More on the facility and its use here.



You can contrast the effect of the LOESS smoothing with the unstructured mesh representation here. Both present unadjusted GHCN V4, which clearly has a lot of noise, especially in the USA, where quantity seems to degrade quality. None of this detracts from global integration, which smooths far more than even LOESS. I think that while it is occasionally of interest to see the detail with the mesh, the LOESS plot is more informative. The detail of the mesh had been useful in GHCN V3 for spotting irregularities, but in the US at least, they are so common that the utility fades. In much of the rest of the world, even Canada, coherence is much better.









Saturday, April 6, 2019

March global surface TempLS (with GHCN V4) up 0.208°C from February.

This is the first month of full use of the new GHCN V4 land temperature data (unadjusted). It's also using the new version V4 of TempLS, which I will post and describe shortly. The usual report is here, showing the map, the breakdown in regions, and the stations reporting. The detailed WebGL plot is here, and it is a good way of capturing the extra detail available in GHCN V4. It also shows the greater patchiness of the stations, which is partly countered by the greater numbers. It is particularly bad in the US, with so many volunteer stations.

I'm still getting a feel for when is the best time to post. I used to wait until the main countries reported, but with V4 it isn't done by country, and even after four or five days there are a lot of stations with apparently good coverage. But there are still many more stations to come, so there is still the possibility of a late drift. We'll see.

The TempLS mesh anomaly (1961-90 base) was 0.964°C in March vs 0.756°C in February. As with the NCEP/NCAR reanalysis index, that makes it a very warm month, especially as February was already warmer than January, and even more so than last November. It was the warmest month since March 2017 (which was just 0.01°C warmer), and the third warmest March in the record.

As with the reanalysis, the main features were big warm areas in Siberia and Alaska/NW Canada, joining up over the adjacent Arctic ice layer. The warmth extended into Europe, especially the East. It was generally cool in the US, and in a belt from Tibet to Egypt.
Here is the temperature map. Southern Africa and Australia were quite warm.



And here is the map of stations reporting:




Thursday, April 4, 2019

New TempLS v4 preamble - a documentation system.


Background

I'm gearing up to release TempLS V4. I have been working on this (and am now using it) to manage the extra demands of GHCN V4. A complication is that I now use a lot of my own library functions in R. For V3 I could replace them, but now they are too heavily embedded. So I'll need to supply the library as well. This works like an R package, but has a different documentation system, which I'll now describe. I think it is better, and I'll use it for the code of TempLS V4 as well. There is an example of the system at the end, and following that, a zipfile with all the entities involved, including the R code.

Motivation for a different system

I write a lot of R code, and I have a set of functions that I currently load in to be part of the working environment. I also have a library of Javascript functions that I import. I need a scheme where these are kept in some organising system with associated documentation. I would like this to be very easily edited, and then generated as HTML for easy reference.

The classic R way of doing this is to create and attach a package. This is a list of parsed functions that are not in the working environment, but sit on the search path at a lower level. That means that they don't overwrite existing names, but those names in the environment might block access to the package functions. R has a system for documenting using an associated .rd file. This has a Latex-like language, and a lot of structure associated with it.

I want to use the attach system, but I find the requirements of the package documentation onerous. They are meant to support packages for general release, but I am the only user here. I want a closer association between documentation and code, and a faster system of updating the HTML.

A system designed for this is Doxygen. This has an implementation as the R package roxygen2, based on embedded comments. This again works through .rd files which it turns into pdf. This is more cumbersome than I would like. I also want to use the system for objects other than functions, which can be included in the package.

So I have created a set of R programs which work through a list of lists, each specifying a function, or perhaps some other object that I would like to see in a package. I call that master list an object. It isn't useful in itself, but can easily be converted into a package that could be attached. But it can also easily generate the HTML documentation with appropriate layout. But a further facility is that it can generate a text file in which the documentation and code are listed together and can be edited, and then mapped back into the object form.

Although I adopted the scheme for its flexibility when I make frequent changes, I think it is now justified by the better active HTML reference page. Not only does it have various linked indexes, but you can choose to display the code next to the documentation. And since the code is placed there at the same time as it is entered into the package, it is guaranteed to be current, even if the manual editing of the description may lag.

The material for the object comes generally from collections of R functions used in a program environment, so there needs to be a scheme to import this into an object. It is desirable to be able to map back, because writing and debugging code is best done in the native form. Since the object contains a lot of text as well as functions, it is important to have a merge facility, so that updated functions can be imported while retaining the documentation.

This all works, and can be extended to other languages, although I'll refer to it as R code here. I use it for Javascript. There isn't the exact equivalent of an attachable package, but code can still be output in a usable form. Importing from code works in much the same way, since it relies on functions being delimited by braces. So it could be used for C and similar languages as well. The main thing is that the html and text forms are useful in the same ways.

Here is a diagram of the various forms and transitions:


[Diagram: the forms Text, Object, Code, Pack and html, and the transitions between them]

And here are more details about the entities. The naming convention will be that there is a stem name, and the files will be called stem.rda, stem.txt, stem.html. The results can be accessed directly in the namespace, as well as returned.

Wednesday, April 3, 2019

March NCEP/NCAR global surface anomaly up 0.19°C from February

The Moyhu NCEP/NCAR index rose from 0.393°C in February to 0.582°C in March, on a 1994-2013 anomaly base. Cumulatively, that is a rise of 0.29°C from January, and is 0.4°C warmer than last November. That makes it the warmest month since the peak of the 2016 El Niño, Feb/March 2016, and so is the second warmest March in the record. It was warm through the month, but most at the beginning and end.

The US was still cool, as was E Canada. But the NW of N America was very warm, as with the Arctic ocean above. There was warmth right through Siberia, into most of Europe and down to Kazakhstan and N China. Below that, a belt of cool from Tibet to the Sahara. Mixed, with nothing very pronounced, in the Southern Hemisphere. However, although Australia looks only moderately warm on the map, the BoM says it was the warmest March on record.

The BoM ENSO Outlook is upgraded to Alert - "This means the chance of El Niño forming from autumn is around 70%; triple the normal likelihood". Remember, that is SH autumn - ie now.





Monday, March 25, 2019

Evenly spaced icosahedral grid for global temperature

I have been working on a new integration method for the next version of TempLS, V4. I call it the LOESS method; it works by local regression onto an evenly spaced set of nodes, where the anomalies can then be averaged. But the question then is, how to get that evenly spaced set of nodes.

I have written a lot about that before. The basic idea is to place points evenly on one of the five platonic solids, and then map them onto the sphere. The platonic solids are at least somewhat similar in shape to the sphere. I wrote here about using the cube (called a cubed sphere), and here with general ideas about gridding, with emphasis on the icosahedron.

A problem is that projection from the regular solid to the sphere does not preserve equal spacing. The central parts of the face are spread out by being projected further, while the areas near the vertices are actually compressed, since the face is inclined to the surface normal to the projection. For the cube, the area ratio is about 3:1; for the icosahedron it is about 2:1. In the cube post I explained a mapping that could be applied to the faces before projection, which would counter the disparity. I wrote here about how this could be applied to get an equal area projection (at the cost of cuts) for earth mapping.

I had some trouble extending this to the icosahedron, because I couldn't see a general mapping scheme with the right properties. But I now do see one based on homogeneous coordinates (also called projective or trilinear), which is simple and effective. Since I make use of homogeneous coordinates, I should digress to say a bit about them. I'll put this digression in blue, so you can skip if you want.

Homogeneous coordinates

Every point x in a triangle is represented by three numbers (u_1, u_2, u_3), which must add to 1. You can get x back, given the vertices (x_1, x_2, x_3):
x = u_1 x_1 + u_2 x_2 + u_3 x_3
and that illustrates an important role. They can interpolate properties known at the vertices by a similar formula. I'll make three points about them:

  1. The coordinates are often explained by areas. If you divided the triangle into three by connecting x to the vertices, then the proportion of area in the triangle opposite node 1 is its coordinate, etc.
  2. The coordinates are the real 3D coordinates of the equilateral triangle joining the unit points on the axes ((1,0,0), (0,1,0),(0,0,1)). The requirement to add to 1 is just the requirement that they lie on the plane through those 3 points. And since you can map those unit vertices onto the general triangle given by nodes (x1,x2,x3), you can map the interior points by multiplying by the same matrix. And it is the inverse of that matrix that takes an arbitrary triangle into the unit triangle. That is how you can calculate the homogeneous coordinates.
  3. The finite element method also has the idea of an interpolation, or basis, function. You multiply the nodal values by the values of the basis function at an interior point to get the interpolate. So homogeneous coords are just a special case of linear basis functions for one particular shape. That enables generalisation of the mapping idea I will explain.

Mapping for uniformity

A mapping before projection has some basic requirements:
  • It must keep points within the face.
  • It must comply with the symmetries of the face. The reason for this is that points on the edges will be mapped within each face that they belong to. Symmetry is needed so they do not get sent to different places.
A way of ensuring this is to apply a simple function of one variable to each projective coordinate, to make new interpolation functions. The function will have to be monotonic, and zero at zero (to keep boundary nodes on the boundary). And to comply with the requirement that the new coords add to 1, the result must be normalised by dividing by the sum.

I chose a function of the form u/(1+a*u), because it is easily inverted (u/(1-a*u)). It is possible to determine a theoretically to make various match-ups, or by trial and error, since there is no perfect outcome. I found a value of 0.28 worked well. It means that instead of the mapped triangles varying in area by a factor of 2, they vary by only about 5%, which I think is good enough.
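Putting the pieces together, a sketch of the mapping for one icosahedron face might look like this in R (illustrative only; x1, x2, x3 are the 3D vertex vectors of the face and p a point on the flat face):

remap <- function(p, x1, x2, x3, a = 0.28) {
  M <- cbind(x1, x2, x3)
  u <- solve(M, p)               # homogeneous coordinates of p (they sum to 1)
  v <- u / (1 + a * u)           # the one-variable function applied to each coordinate
  v <- v / sum(v)                # renormalise so the new coordinates sum to 1
  q <- M %*% v                   # mapped point, still on the face
  q / sqrt(sum(q^2))             # then project onto the unit sphere
}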

Here is a picture of what is done. The icosahedron has 20 equilateral triangle faces. Each edge is divided into N equal parts. Connecting them with lines parallel to the edges makes N² smaller triangles, with corresponding nodes. In this diagram, the purple triangles are the original regular grid, and the lines show the mapped grid. You'll see that the line triangles are larger near the corners, and smaller near the centre.

WebGL picture of the result

I used the WebGL facility to show the result on the sphere. In what follows, the sphere is a trackball that you can drag around. It's actually a LOESS plot of the February temperatures, which I'll say more about in a future post. But in the style of the facility, you can use the checkboxes top right to hide various things. The mesh is clearer if you clear "mesh". Make "mesh_l" go away, and you can see the points better. You can dispense with the map too, if you want.





Tuesday, March 19, 2019

GISS February global up 0.05°C from January, third warmest February.

The GISS V3 land/ocean temperature anomaly rose 0.05°C in February. The anomaly average was 0.92°C, up from January's 0.87°C. It compared with a 0.046°C rise in TempLS V3 mesh. I can't find Jim Hansen's monthly report on the web site, but his email linked to this page.

Like GISS, Moyhu is coming out with a V4 which will use GHCN V4. Actually, V3 could use it too, but it is time for a new version. GISS has marked theirs as beta - the monthly averages are here. V4 said the rise was only 0.01°C; Moyhu's V4 said the rise was higher, at 0.128°C. V4 of course has many more stations, and somewhat to my surprise, has extensive reporting quite early; however, there is also extensive later data, and I'm still getting a feel for when it settles. My reporting is mixed; the main table shows V4, but the report and graphs are V3. This will change soon. I'll show below the anomaly maps for Moyhu and GISS V4 as well.

The overall pattern was similar to that in TempLS. W Canada and NW US were very cold, though SE US was quite warm. Europe was warm, especially NE (Russia). Warmth also in Siberia and Alaska, and nearby Arctic. Australia was still quite warm. I'll note now that March so far is looking a whole lot warmer.

As usual here, I will compare the GISS and previous TempLS plots below the jump.

Sunday, March 10, 2019

February global surface TempLS (with GHCN V4) up 0.033°C from January.

TempLS is in transition from using GHCN V3 (which it has used for the last eight years) to the new GHCN V4. I'll report the V4 mesh results here, but for now, the Moyhu posted data is based on V3. I've been delayed a little because it is timely to also bring out a new version of TempLS (V4). Apart from anything else, the code needs reorganising to handle the bigger datasets smoothly. The Moyhu data page is still reporting values based on V3, as are the various pages with globe maps.

I worried about whether GHCN V4 would report as punctually as V3. But it seems to do better. Their version of 9 March showed 8779 stations reporting, compared with 10653 in January. But unlike V3, where countries report en bloc, here the reporting is spread, so no region is completely missing. I'll show a station map below.

TempLS mesh with V3 GHCN showed a larger rise, from 0.686°C to 0.779°C in February. This was more in line with the rise calculated with NCEP reanalysis. However, the V3 based infilled grid method was in agreement with V4 mesh, as was V4 infilled.

I'll write a post soon with a comparison of the whole V4-based history compared with V3. I'll also post a new version of TempLS which I have been using - I'll call that V4 too. It's mainly a restructuring to deal with the greater memory demands of GHCN V4, and also to include my new approach to handling spherical harmonics enhancement.

The TempLS mesh anomaly (1961-90 base) was 0.746°C in February vs 0.708°C in January. That makes it the third warmest February in the (V4-based) record.

The main feature of the month was rather severe cold over western Canada and adjacent US, but with warmth just a little to the North in the Arctic and Alaska. Europe was also very warm. There were patches of cool in the Sahara and western China.
Here is the temperature map.



And here is the map of land and sea stations reporting:




Monday, March 4, 2019

February NCEP/NCAR global surface anomaly up 0.097°C from January

The Moyhu NCEP/NCAR index rose from 0.296°C in January to 0.393°C in February, on a 1994-2013 anomaly base. That makes it the warmest month since May 2017. The warmth came mainly in a spurt at the end of the month, which is continuing.

It was quite cold in W Canada and the US, although warm in Alaska and the SE USA. Warm in Europe, cold in the Sahara, and mostly warm in Antarctica, with the Arctic mixed.

The BoM ENSO Outlook is still set to Watch.





Saturday, February 16, 2019

GISS January global down 0.01°C from December, 0.11°C higher than November.

Update: With this post I initially misread the GISS data file, taking values from the end of the row, which are period averages, rather than the correct monthly averages for November and December, as ErikGBL noted in comments. It actually made only a small difference, but I have fixed it. I first had a small rise from December; now a small drop.
The GISS land/ocean temperature anomaly fell 0.01°C in January, which followed a 0.11°C rise the month before. The January anomaly average was 0.88°C, down from December 0.89°C. It compared with a 0.034°C fall in TempLS mesh. Jim Hansen's report is here.

The overall pattern was similar to that in TempLS. The US was actually fairly warm overall, despite the cold snap at the end of the month. Canada's Arctic islands were very cold, but a big band of warmth in NW Canada/Alaska, and another from E Siberia (which also had a cold snap) through to the Caspian region. SE Australia was hot.

As usual here, I will compare the GISS and previous TempLS plots below the jump.

Friday, February 15, 2019

Improving averaging of global temperature anomaly with fitted functions - seen as an aspect of a general method.

I have written a lot about two topics - anomalies in general, and spatial averaging of temperature anomalies. Both are headings in the Moyhu interactive index. My main point on anomalies as used in temperature statistics is that they play an essential role in reducing inhomogeneity, thus making the unavoidable reliance on sampling more reliable.

But I think they have a wider use. The general pattern is that a field, temperature T, say, is partitioned:
T=E+A
where E is some kind of expected value, and A is an anomaly. That sounds vague, and there are many ways you could make such a partition. But it becomes clear when associated with the use made of it. The purpose is often to calculate some linear function of T, say L(T). Here L will be some kind of average. But suppose L is an ideal, and we actually have to rely on an approximate version L1 (eg coarse sampling).

So L(T) is approximated by L1(T)=L1(E)+L1(A). Now the objective is to choose E in such a way that
  1. L1(E) can be replaced by L(E), which is more accurate for that part
  2. L1(A) loses some of the error that marred L1(T)

I will explain with three examples here, of which the third will be developed in detail, because it bears on my long-term project of more accurately calculating surface temperatures:
  1. Temperature anomalies for global average (GISS). Here the station temperatures have some kind of mean of past values subtracted. It is usually the mean of some fixed period, eg 1961-90. Here is a typical post in which I explain why absolute temperatures are too inhomogeneous to average with the sample density we have, while the anomalies thus constructed are homogeneous enough, if only because they have approximately the same expected value (0). But in terms of this structure, they don't quite fit. The component E (climatology) that is removed is not calculated with a more accurate estimator, but is discarded, with only the anomaly average reported. But the mode of partition is still important, to be sure that the discarded part does not carry with it variation that should have been attributed to A. Using a consistent base period is one way of ensuring that.
  2. Another example is one I developed in a post at WUWT. This was about the time samples, for a single station, that go into forming a monthly average. An objection sometimes raised is that characterising that by two values per day (max and min) "violates Nyquist". Nyquist doesn't really tell us about this kind of sampling, but I looked at the error introduced in sampling regularly a few times a day (eg every 12 hours). This does create aliasing of some harmonics of the diurnal frequency to zero, which adds spurious amounts to the monthly average. I showed that this is a limited error, but one which can be reduced with a T=E+A partition. Now E is an estimate of the regular diurnal cycle, based on fine (eg hourly) sampling of a few reference years. That takes most of the diurnal variation out of A, and so sparse sampling (L1) aliases very little to zero, and causes little error. The fine operator (L) can be applied to the E component.
  3. The third example, to be developed here, is for the monthly spatial integration of anomalies. I'll describe it first rather generally. Suppose T is a variable to be integrated over any domain, and L1 is an approximate integration operator. Then suppose E is a Fourier fit with the first n orthogonal functions, where the inner product is integration with L1.

    Then E contains the low-frequency terms which contribute most to the integral, leaving A with just high frequency fluctuations. These may be of lower amplitude than the whole, but more importantly, they will make little contribution to the low pass integration operator. This is still true for the approximate L1, since the basic reason for the low-pass behaviour is cancellation.

    In fact, by the Fourier process, L1(A) is exactly 0, since A is orthogonal to the fitting functions, of which the lowest is a constant. So if the improved integration operator L is just exact integration, the varying fitted terms will make zero contribution to the integral, except for the constant term. The improved integral is just the integral of that constant. The improvement comes because the approximate integral of those low frequency components is replaced by the exact.
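To make that third example concrete, here is a minimal one-dimensional sketch (my own illustration in Python, not TempLS code), in which the exact operator L is the mean over one period and the crude L1 is an unweighted mean over irregularly placed sample points. The field, the sample count and the fitting order are all invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented "true" field on the periodic domain [0, 2*pi): smooth part plus fine wiggles
def T(x):
    return 1.3 + np.sin(x) + 0.4*np.cos(2*x) + 0.05*np.sin(9*x)

exact_mean = 1.3                        # L(T): the exact mean over one period

# L1: a crude integration operator - an unweighted mean over irregular samples
x = np.sort(rng.uniform(0, 2*np.pi, 40))
w = np.full(x.size, 1.0/x.size)         # weights of the crude operator
L1 = lambda f: np.sum(w*f)

# E: least squares fit of the first few Fourier functions (H), using the
# scalar product defined by L1; the first column of H is the constant
n = 3
cols = [np.ones_like(x)]
for k in range(1, n+1):
    cols += [np.cos(k*x), np.sin(k*x)]
H = np.column_stack(cols)
HwH = H.T @ (w[:, None]*H)
b = np.linalg.solve(HwH, H.T @ (w*T(x)))

# Enhanced estimate: the exact mean of E is just b[0], since the sinusoids
# integrate to zero over the period; L1(A) is zero by the fitting process
crude    = L1(T(x))
enhanced = b[0] + L1(T(x) - H @ b)
print("crude error:", crude - exact_mean, "  enhanced error:", enhanced - exact_mean)
```

Because the first fitted function is a constant, the residual A is orthogonal to it under the L1 scalar product, so L1(A) vanishes (to rounding) and the enhanced estimate is just b[0]. With a reasonable number of samples it is usually, though not always, closer to the exact mean than the crude average.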

Integration methods for the sphere

I have developed four main methods for averaging temperature anomalies via numerical integration on the sphere, which I have incorporated into my monthly program TempLS, with results regularly reported here (click TempLS button). They are described in greatest detail here, with some further ideas. The methods described are
  • Simple gridding - just assign to each cell of a lat/lon grid the average of the anomalies in the cell, and then use the area-weighted average of the cells with data. This is basically what is done in HADCRUT (a minimal sketch of this appears after the list).
  • Infilled gridding - where cells without data are assigned a value derived from neighbouring cells in some way, and then the area-weighted average of all cells can be used. With the assignment done by kriging, this is the Cowtan and Way improvement of HADCRUT.
  • Finite element integration using an irregular triangular mesh. This is my preferred method, as it assigns to each point on Earth the interpolated value from the three nearest data points for each month.
  • Spherical Harmonics. I'll now talk more about that. In its original form, I did a least squares regression of the temperature with a set of spherical harmonics, and took the lowest coefficient as the integral.

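As a rough sketch of the first option, simple gridding, here is how the cell averaging and cos-latitude area weighting might look in Python. The function name, the 5° cell size and the input arrays (numpy arrays of station latitudes, longitudes and anomalies) are purely illustrative; HADCRUT and TempLS differ in many details:

```python
import numpy as np

def simple_grid_average(lat, lon, anom, cell=5.0):
    """Average anomalies within each lat/lon cell, then take the
    area-weighted (cos-latitude) mean of the cells that have data.
    A sketch only; real products handle many more details."""
    nlat, nlon = int(180/cell), int(360/cell)
    ilat = np.clip(((lat + 90.0)//cell).astype(int), 0, nlat - 1)
    ilon = np.clip(((lon % 360.0)//cell).astype(int), 0, nlon - 1)
    tot = np.zeros((nlat, nlon))
    cnt = np.zeros((nlat, nlon))
    np.add.at(tot, (ilat, ilon), anom)          # sum anomalies per cell
    np.add.at(cnt, (ilat, ilon), 1)             # count stations per cell
    has = cnt > 0
    cellmean = np.where(has, tot/np.maximum(cnt, 1), np.nan)
    latc = -90.0 + cell*(np.arange(nlat) + 0.5)           # cell-centre latitudes
    area = np.cos(np.radians(latc))[:, None]*np.ones((1, nlon))
    return np.nansum(area*cellmean)/np.sum(area[has])     # only cells with data
```

Infilled gridding would differ mainly in replacing the empty (NaN) cells with values estimated from neighbouring cells - for example by kriging, as in Cowtan and Way - before taking the area-weighted average over all cells.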

The final method, using Spherical Harmonics (SHs), is not like the others, in that it doesn't introduce a subdivision of the area. But I now think it is best seen as not an independent method, but as an improvement applicable to any method. To just write out the regression maths, if L1(xy) is the scalar product, and H is the vector of fitting functions (SH), then the regression coefficients of the fit are

b = L1(H⊗H)⁻¹L1(H⊗T), where ⊗ is the outer product from the weighted integration

and so b₀L1(H₀) is the average of T

I generally denote L1(H⊗H) as HwH, since L1 is just a weighted sum with weights w (this is the architecture of TempLS). The functions H are orthogonal wrt exact integration L, so L(H⊗H) would be a unit matrix. The approximate version deviates from this, and a figure of merit is the condition number of HwH. This is kept down if the integration method L1 is good.
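Here is a minimal sketch of that calculation in Python (my own illustration, not the TempLS code). It assumes H is a matrix of basis functions evaluated at the stations, with the constant function as its first column, w is the weight vector of the integration scheme (summing to 1), and T is the vector of anomalies:

```python
import numpy as np

def sh_enhanced_average(H, w, T):
    """Enhanced average of anomalies T, given basis functions H
    (stations x functions, first column the constant) and integration
    weights w (summing to 1). Returns the enhanced average and the
    condition number of HwH. A sketch of the idea, not TempLS code."""
    HwH = H.T @ (w[:, None]*H)            # the approximate Gram matrix L1(H⊗H)
    rhs = H.T @ (w*T)                     # L1(H⊗T)
    b = np.linalg.solve(HwH, rhs)         # regression coefficients
    cond = np.linalg.cond(HwH)            # figure of merit for the scheme
    return b[0]*np.sum(w*H[:, 0]), cond   # b0*L1(H0), as in the text
```

The returned condition number of HwH is the figure of merit discussed below: the better the underlying integration scheme, the closer HwH stays to a unit matrix.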

In addition to the averaging methods listed above, I should add a very primitive one - the simple, unweighted average. This is generally regarded as too unreliable for use, but my original SH method, which actually performed very well, was basically the improvement of this method. You can of course expect better outcomes from improving better methods, and as I shall show, this happens. However, the relative improvement is much less. The reason is that the improvement comes from the improved integration of the SH components, and with, say, the mesh method, the finite element integration was already very good.

More on Spherical Harmonics

I described the functions here, with visualisation here, and a movie here. There are two parameters L and M; it is natural to think of M as determining the number of oscillations, with L varying from 0 to M (for fixed M) going from purely latitude oscillation to purely longitudinal. In the latter case (L=M), there are M periods of the sinusoidal variation around the equator. For SHs of order up to M, there are (M+1)² functions, and these are the groupings that I will test below.
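For readers who want to experiment, here is one way to build such a basis in Python using scipy.special.sph_harm (recent scipy versions also offer sph_harm_y). This is a sketch under my own naming; TempLS constructs its own harmonics rather than calling scipy, and scipy's degree and order play the roles of the post's M and L:

```python
import numpy as np
from scipy.special import sph_harm

def real_sh_basis(lat_deg, lon_deg, M):
    """Real spherical harmonics up to order M at the given points
    (numpy arrays of latitude and longitude in degrees). Returns an
    array of shape (npoints, (M+1)**2); the first column is the
    constant Y(0,0). A sketch; TempLS builds its own harmonics."""
    theta = np.radians(lon_deg % 360.0)       # azimuth, 0..2*pi
    phi = np.radians(90.0 - lat_deg)          # colatitude, 0..pi
    cols = []
    for n in range(M + 1):                    # degree n, up to the post's M
        for m in range(-n, n + 1):            # signed order; |m| is the post's L
            Y = sph_harm(abs(m), n, theta, phi)
            if m == 0:
                cols.append(Y.real)
            elif m > 0:
                cols.append(np.sqrt(2)*(-1)**m*Y.real)
            else:
                cols.append(np.sqrt(2)*(-1)**abs(m)*Y.imag)
    return np.column_stack(cols)
```

A matrix built this way could be passed as H to the sketch above; its first column is the constant Y(0,0), and the number of columns is (M+1)², as stated.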

Tests

I looked at various means of anomalies for each of the months of 2018. I used the anomalies calculated by the mesh method; other integration schemes give slightly different values, but I don't think that changes the issues of integration. For each month and order M of SH, I calculate the mean for each of the four methods
  • No weighting (simple average)
  • Simple grid with no infill
  • Infilled grids
  • Integrating on irregular triangular mesh.
The value for M=0 is the unimproved value. Here is a plot of results. You can see the different months by using the buttons at the bottom:



You can see that the better methods, mesh and infilled grid, usually start out close together, with the simple average often furthest away. But then, as the order M increases, they all converge for a while. Eventually this gets ragged, as the higher order SHs are less well integrated by the integration scheme. The direct cause is ill-conditioning of the HwH matrix. I'll show here a similar plot of log10 of this condition number for each case.

There is an earlier discussion of SH improvements here. It shows (for Jan 2017) the sum of squares improvement, but more relevantly, a detailed map of the residuals. It shows how, after removing up to M=10, there are still spatial patterns in the residuals, some of which look like higher order harmonics.

Discussion

As you look through the months, the blue mesh line is usually fairly stable, at least up to about M=10. This indicates that the removal of SH components has little effect, because the mesh integration was already good, and replacing it with exact integration makes little difference. The other lines start out separated, with the black unweighted average often a long way away. But they converge toward the blue line, which says that the SH separation is giving better results.

The condition numbers also give a good quantification of the merit of the integration schemes. Mesh is markedly better than infilled grid, which is in turn better than simple grid weighting, which in turn is better than no weighting at all. Better conditioning has the practical effect that mesh integration can be used to approximate temperature anomalies with a SH fit (as done monthly here) to higher order, and better resolution, than other methods.

An anomaly in approximation behaviour is February, in which the other methods converged toward the blue line, but the blue line itself showed a gradual increase rather than stability after about M=5. I have looked into that; I don't really know what is different about the February mesh that would cause it. The meshes can be seen here. It is true that the February mesh does do a less accurate job of integrating the harmonics, as can be seen from the top row of HwH. I can't track that down to a visible difference in the mesh. The condition number for February rises earlier than it does for March, although it is not so different from January.

I had thought that, although subtracting fitted SHs did not improve mesh integration here, it might do so if stations were fewer. I tried that in conjunction with my culling process. However, no improvement was found there either. The reason seems to be that when stations are too sparse for SHs to be well integrated, so there is room for improvement, they are also too sparse to allow the fit to be identified. There just aren't enough degrees of freedom.

Conclusion

The idea of separating out an "expected" component prior to integration, which can then be integrated with a more accurate operator, has general application. In the case of spatial integration of monthly anomalies on the sphere, it gives a radical improvement of inferior methods. It does not do much for the better mesh integration, but the different behaviour quantifies the merit order of the various schemes.

Although the mesh method seems to be at least as good as SH upgrades of grid methods, the latter are faster if meshes would otherwise have to be calculated. Simple gridding is fast, as is the SH fitting. However, on my PC even full meshing takes only a few minutes. So I have upgraded the "SH method" in TempLS to be the enhancement of the mesh method, since I can generally re-use meshes anyway.



Thursday, February 7, 2019

January global surface TempLS down 0.033°C from December.

The TempLS mesh anomaly (1961-90 base) was 0.704°C in January vs 0.737°C in December. That makes it the fourth warmest January in the record.

Land temperatures were down, but SST rose a little after two months of decline. Despite the late cold snap in the mid-west, the US was not particularly cold, although the Hudson's Bay region of Canada was. The Sahara was cool too, but there were big warm patches in Siberia, the Caspian region, Alaska, and, notably, Australia.
Here is the temperature map. As always, there is a more detailed active sphere map here.




Tuesday, February 5, 2019

January NCEP/NCAR global surface anomaly same as December, which was up 0.11°C from November

Update: In comments, Bryan Oz4caster pointed out that the results I posted were wrong, following the shutdown and recovery. So I checked, and found the problem - I was calculating correct numbers, but posting numbers for corresponding days in 2015.

Oddly, the corresponding month figures for 2015 showed almost exactly the same changes. 
Nov to Jan 2015  0.106  0.212  0.209
Nov to Jan 2019  0.176  0.286  0.296
So I haven't had to change the headline, but I've fixed the figures in the text. The map and discussion based on it were correct.

The Moyhu NCEP/NCAR index was delayed by the US government shutdown, and then by some confusion after an apparent delay in finalising 2018 (which my system did not handle well - code being revised). So I'm reporting both December and January. There were ups and downs in late 2018, leaving November a cold month at 0.176°C. So the rise in December to 0.286°C was more of a return to normal, and that continued with the 0.296°C in January.

One remarkable feature in January was a band of warmth starting from the Kimberley region of W Australia and extending through the continent to New Zealand and beyond. This was indeed a very warm month in Australia's inland. It also shows the cold plume from Canada's Arctic islands through the US mid-west, though there is another band of warmth from Alaska to California. There was also a broad band of warmth from China through to Arabia and central Africa, although the Sahara and around the Mediterranean were cold. Polar regions tended to cool.

The BoM ENSO Outlook was downgraded to Watch, although some models predict a reappearance.





Wednesday, January 30, 2019

December global surface TempLS up 0.12°C from November.

The TempLS mesh anomaly (1961-90 base) was 0.748°C in December vs 0.628°C in November. That continues the recent up-down alternation, and makes it the third warmest December in the record (a near tie with 2017 for second). The GHCN results were delayed by the US government hiatus.

Significantly, for the second month in a row, SST was down, after a few months of rise. But land temperatures were generally higher. The big contributors to this were N America and the Arctic. Almost everywhere was warm, including Europe and Australia. A cool patch in mid-Siberia/Mongolia.
Here (from here) is the plot of relative contributions to the rise (ie components weighted by area):

Here is the temperature map. As always, there is a more detailed active sphere map here.




Saturday, January 12, 2019

December global surface anomaly up 0.038°C from November - ECMWF reanalysis

Usually I would have my analyses out by now, but as explained, the data I need is not yet available due to the US government shutdown. I'll post, with 2018 discussion, when it is available. However, the ECMWF, via Copernicus, is not subject to that shutdown, and their reanalysis data is out. A CSV listing of monthly global and European surface averages is here. Relative to 1981-2010, November was 0.430°C; December was 0.468°C.

A description of relevant ECMWF data on average monthly temperature is here.

I won't try to summarise the features of the month; ECMWF has a very detailed description in their December report. I'll just show their map, which includes Europe too:




The ECMWF also reports on the year's result; 2018 was the fourth warmest year in the record, after 2016, 2017 and 2015. Details and links here. Most surface records, including TempLS, will report the same ranking.