Monday, May 20, 2019

Changing anomaly base in spatial plotting.

This post tries to give a more exact treatment to comparing global temperature graphs with different anomaly bases. It is preparatory to one where I try out a new style of graphics which I think has better resolution than the spherical harmonics (SH) based graphs that I use now, and compare with GISS. I want to compare with high resolution graphics from reanalysis, but first there is something I need to improve.

Temperature anomalies are created by subtracting from each temperature reading an estimate of the normal, or expected value, for that time and location. Various estimates can be used, as long as they are consistent. A common choice is the mean temperature over a three-decade interval. I use a least squares method, but normalise it to zero average over the 1961-90 period. That last step, though, is an adjustment at global level.
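As a minimal sketch of the three-decade version of this (not the least squares method that TempLS uses), the following computes anomalies for one station and one calendar month. The function names and the temperature values are made up for illustration.

```python
def base_period_normal(years, temps, start, end):
    """Mean of the readings whose year lies in [start, end] inclusive."""
    in_base = [t for y, t in zip(years, temps) if start <= y <= end]
    if not in_base:
        # This is the practical problem: a station with no data in the
        # base period has no normal and cannot be used.
        raise ValueError("no data in base period")
    return sum(in_base) / len(in_base)


def anomalies(years, temps, start, end):
    """Subtract the base-period normal from every reading."""
    normal = base_period_normal(years, temps, start, end)
    return [t - normal for t in temps]


# Hypothetical April means (deg C) for a single station.
years = [1961, 1975, 1990, 2019]
temps = [10.0, 10.5, 11.0, 12.5]

anoms = anomalies(years, temps, 1961, 1990)
```

By construction the anomalies average to zero over the base period at that station, which is the property that a later global normalisation does not guarantee locally.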

Comparing anomalies with different base periods is normally done by aligning the global averages over those periods. This works well at that level. But it does not align the anomalies in different spatial regions, where the local averages may have changed differently in those periods. In comparing GISS, which uses a 1951-80 base, with TempLS, which actually uses a least squares base, I have just added an adjustment constant for each month. I set out the process here, giving a table of changes to make for all the popular conversions. But more is needed for spatial alignment, although the actual effect on, say, a monthly anomaly plot is small.
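The global-level adjustment can be sketched as follows, assuming the global average is held as a mapping from (year, month) to anomaly. The function names and the toy series are hypothetical; the point is only that the adjustment is one constant per calendar month, with no spatial variation.

```python
def monthly_offsets(series, start, end):
    """For each calendar month, the mean of the global anomaly over
    the years [start, end] inclusive. Returns a list of 12 constants."""
    offsets = []
    for month in range(1, 13):
        vals = [v for (y, m), v in series.items()
                if m == month and start <= y <= end]
        offsets.append(sum(vals) / len(vals))
    return offsets


def rebase(series, start, end):
    """Shift the series so each calendar month averages zero over the
    new base period."""
    offsets = monthly_offsets(series, start, end)
    return {(y, m): v - offsets[m - 1] for (y, m), v in series.items()}


# Toy global series: a linear trend plus a made-up seasonal term.
series = {(y, m): 0.01 * (y - 1975) + 0.02 * m
          for y in range(1951, 2020) for m in range(1, 13)}

on_1951_80 = rebase(series, 1951, 1980)
```

The same twelve constants are applied everywhere on the globe, which is why this step cannot account for regions that warmed at different rates between the two base periods.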

Practicalities - spherical harmonics again.

However, it isn't obvious how to do that. The problem with three decade bases is that not all stations have data in those times. That is why I use least squares. To make a comparison between two periods, you would, in principle, have to limit to stations where you can calculate averages in both periods. Even if that were possible, it would be a pity to have to in effect redo two temperature anomaly constructions just to change base.

But the change that matters is what shows on a spatial plot, and rather than do every station, it is sufficient to approximate the difference with spherical harmonics. This is the method I normally use to show TempLS monthly maps. Since the issue is what the regional temperature difference actually is, rather than the individual measurement methods, it is sufficient to work out coefficients by just one method. I naturally use TempLS.

Method and numbers

I first calculate a spherical harmonics fit for the TempLS anomalies for each month since 1900. It is like a Fourier fit, but using regression. The integration method is the mesh method. I specify order 10, which actually gives 121 (that is, 11 squared) functions in the SH basis. For each order n, one of the functions is just cos latitude raised to the power n, multiplied by sine of n times the longitude, so this gives a measure of the spatial frequency. More details of spherical harmonics are here.
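To make the count of 121 concrete, here is a sketch that evaluates an unnormalised real spherical harmonic basis at one point. It is an illustration only: the normalisation and ordering used in TempLS may differ, and the regression fit itself is not shown. The function names are my own.

```python
import math


def assoc_legendre(n, m, x):
    """Unnormalised associated Legendre function P_n^m(x), 0 <= m <= n,
    by the standard upward recurrence (sign factor omitted)."""
    s = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):
        pmm *= (2 * k - 1) * s          # builds (2m-1)!! * (1-x^2)^(m/2)
    if n == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm          # P_{m+1}^m
    for l in range(m + 2, n + 1):
        pl = ((2 * l - 1) * x * pm1 - (l + m - 1) * pmm) / (l - m)
        pmm, pm1 = pm1, pl
    return pm1


def real_sh_basis(lat_deg, lon_deg, order):
    """Values of all real harmonics up to `order` at one lat/lon point.
    Each n contributes 2n+1 functions, so the total is (order+1)^2."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    x = math.sin(lat)
    values = []
    for n in range(order + 1):
        for m in range(n + 1):
            p = assoc_legendre(n, m, x)
            if m == 0:
                values.append(p)
            else:
                values.append(p * math.cos(m * lon))
                values.append(p * math.sin(m * lon))
    return values


basis_values = real_sh_basis(30.0, 45.0, 10)
n_functions = len(basis_values)
```

With this ordering, the last function appended for each n is the one that varies as sine of n times the longitude, so it carries the highest east-west frequency available at that order.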

Once the SH approximation for each month is made, there is no further reference to which stations are reporting. Since the operations are linear, for each tri-decade we can just average the 121 coefficients, and then difference to give the change between bases. Then to regenerate the temperature map, it is just necessary to compute the values of the harmonics at whatever points are being used to make the map, for example a lat/lon grid, and then combine these using the difference coefficients.
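The linearity argument can be sketched with a toy basis of four functions standing in for the 121. The coefficient values, the base periods chosen and all function names here are hypothetical; only the structure (average the coefficients per period, difference, then evaluate on the map points) follows the description above.

```python
import math


def basis(lat_deg, lon_deg):
    """Toy order-1 basis: constant, sin(lat), and two cos(lat) terms."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [1.0, math.sin(lat),
            math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon)]


def mean_coeffs(coeffs_by_year, start, end):
    """Average each coefficient over the years [start, end] inclusive."""
    rows = [c for y, c in coeffs_by_year.items() if start <= y <= end]
    return [sum(col) / len(rows) for col in zip(*rows)]


def base_change_coeffs(coeffs_by_year, old_base, new_base):
    """Difference of the period means, coefficient by coefficient."""
    old = mean_coeffs(coeffs_by_year, *old_base)
    new = mean_coeffs(coeffs_by_year, *new_base)
    return [n - o for n, o in zip(new, old)]


def evaluate(coeffs, lat_deg, lon_deg):
    """Regenerate the field at one map point from its coefficients."""
    return sum(c * b for c, b in zip(coeffs, basis(lat_deg, lon_deg)))


# Made-up April coefficients that drift linearly with year.
coeffs_by_year = {y: [0.01 * (y - 1975), 0.02 * (y - 1975),
                      -0.005 * (y - 1975), 0.0]
                  for y in range(1951, 1991)}

diff = base_change_coeffs(coeffs_by_year, (1951, 1980), (1961, 1990))

# Difference map on a coarse lat/lon grid; an anomaly map on the old
# base minus this map gives the anomaly map on the new base.
grid = [(lat, lon) for lat in range(-90, 91, 30) for lon in range(0, 360, 60)]
diff_map = {pt: evaluate(diff, *pt) for pt in grid}
```

Because evaluation is a weighted sum, averaging the coefficients and then evaluating gives the same map as evaluating every month and then averaging, so no station data is needed at this stage.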

Results

First, here is a map of the TempLS values for mean April between 1961 and 1990, plotted with negative sign. It would be similar for other months, since it mainly represents changes over a thirty year period.


The mean should be approximately zero, since TempLS was set to have mean zero in this range. It isn't quite zero, because the mean represents the SH-enhanced mesh integral. It may not be obvious from the plot (in GISS intervals and colors) that the mean is zero, but the area of the very warm part is small relative to the large areas of S Hemisphere cool.

This plot gives an idea of the magnitude of differences to expect. They can be large. Now I'll show the practical effect, for the most recent month, April 2019.



The top two maps are those I displayed. Left is the SH representation of normal least squares TempLS, and right is the GISS plot, which is based on actual means of months in 1951-80. So below it I have put the plot of TempLS calculated on this basis, as described above. Bottom left is the map of expected corrections in going from TempLS to a true 1951-80 base, less the spatially constant offset that was used.

There isn't that much difference between top left and bottom right, or GISS for that matter. But the bottom right is closer to GISS in some respects, and that can be explained by the bottom left. The cool spot above NW Canada is much reduced at bottom right, in better agreement with GISS, and that is the effect of the warming correction shown bottom left. The cold of the W Sahara almost disappears, as it does in GISS. Bottom left shows the warm correction there. The same holds in Saudi Arabia. Finally, the hot spot in NE Siberia is made even hotter, in broad agreement with GISS, although the shape there is slightly different. Generally the ocean corrections are too small to notice.

I'll use the corrected versions in new monthly reports. But mainly I am setting this out because I am planning to use what I think will be a better resolution map. I will show that, in the next post, in comparison with reanalysis, and for that I need the extra accuracy.

4 comments:

  1. Hi Nick,

    I'd like to talk about these parts of your post:

    "The problem with three decade bases is that not all stations have data in those times. That is why I use least squares. To make a comparison between two periods, you would, in principle, have to limit to stations where you can calculate averages in both periods. Even if that were possible, it would be a pity to have to in effect redo two temperature anomaly constructions just to change base."

    Why do you think that to compare different baselines you would have to limit the sample only to stations that have full data in both periods? Is it because you don't believe the differences are distributed normally? The meaning of the standard error in the mean is that if you do the experiment over with a completely different set of measurements, your new mean should fall within that range of the original mean.

    "But the change that matters is what shows on a spatial plot, and rather than do every station, it is sufficient to approximate the difference with spherical harmonics."

    The TempLS results have what looks to be 7280 stations in its sample. Steven's BEST had 5905 stations, and mine had 5036. How long does it take TempLS to crunch through all of the stations in your sample? It seems odd to me that you would use an approximation of the results of all the stations, rather than doing the work and getting the real numbers.

    Thanks!

    James

    Replies
    1. James,
      "Why do you think that to compare different baselines you would have to limit the sample only to stations that have full data in both periods?"
      I'm describing what you would have to do for literal rebaselining. I should first emphasise that I'm really only talking about comparison between maps. Averages, and plots of averages, are not affected. But if you want to compare a plot that someone made averaging stations 1951-80, and someone else made averaging 1981-2010, then the simplistic way to do it is to change, say, the newer to the older. Then, for each station, you would subtract the average for 1951-1980, so you'd need numbers for that. And it wouldn't be in the average if you didn't have numbers for 1981-2010, so that means you'd have to have both.

      That's hypothetical, though, because the SH way I use avoids all that. There isn't any assumption about distributions in it.

      "How long does it take TempLS to crunch through all of the stations in your sample?"
      It depends on the averaging method, but can be quite quick. A minute or so. And I'm not using GHCN V3 but V4, about 27000 land stations altogether, plus SST. However, less than half of those land stations report in any month.

      "It seems odd to me that you would use an approximation of the results of all the stations"
      Well, I'll say it again - this is for comparing maps. It isn't for numerical output, such as averages. And it isn't for getting the maps right. They weren't wrong in the first place. People computed the anomalies using the actual station averages for the period.

      I'll be posting in an hour or so my high resolution graphs for 2016. That is where this stuff really could matter. But they are fine differences. Basically the crude maps are pretty good. I'm just trying to get them a bit better, because I think I can. I don't think it will make a difference to anything.

  2. Thanks for the reply, Nick. One thing I forgot to ask: what are your criteria for using a station in the baseline? I use stations that have at least 345 of the 360 measurements for the full period, have at least 26 of the 30 months of an individual month's series, and that can generate all 12 months of the baseline. Steven uses a 75% figure for the 360-month series. Don't know about the rest. Where do you fit in there?

    Thanks!

    James

    Replies
    1. James,
      "Where do you fit in there?"
      I don't. The point of the least squares approach, which BEST and I use, is that you don't have to restrict to a time period. Instead the global time variation is coupled, so that factor is removed and the overall mean can then be safely used. When I normalise to an interval (1961-90) it is after averaging. That was one point of my previous post; if you do then want the regional means to be zero over a period, there is some work to be done.

      The LS approach means you can use all stations.
