Wednesday, June 29, 2011

More proxy temperature reconstruction plots

This is a sequel to the previous post, where the list of data sets taken from the NOAA file is given. I have extended the time scale to 2000 years, and this time I have used 41-year triangle smoothing. I have also experimented with including and omitting instrumental temperatures.
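For anyone wanting to reproduce the smoothing, here is a minimal Python sketch of a 41-year triangle filter. It assumes a centred window whose weights fall off linearly towards the ends. It also leaves a year blank where the window runs off the series or hits a gap. The post doesn't say how the ends were handled, so that part is an assumption, and the function names are illustrative.

```python
def triangle_weights(width):
    """Triangular weights for an odd window width, normalised to sum to 1."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    # weights rise linearly 1, 2, ..., half+1 and fall back to 1
    raw = [half + 1 - abs(k) for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def triangle_smooth(values, width=41):
    """Centred triangular moving average.

    Returns None where the window is incomplete (series ends) or contains a gap (None).
    """
    w = triangle_weights(width)
    half = width // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        if any(v is None for v in window):
            continue
        out[i] = sum(wk * vk for wk, vk in zip(w, window))
    return out
```

Because the weights are symmetric, a straight-line trend passes through unchanged; only curvature and wiggles shorter than a few decades are damped.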

This is the plot over two millennia, with HADCRUT3NH (11-year smoothing) in light grey in the background.


More below the jump...

Monday, June 27, 2011

Northern Hemisphere proxy plots - last millennium

This is the next in what may be a series of posts showing multiple time series as animated GIFs, in the hope of greater visual clarity.

I found at NOAA a large table of proxy-based temperature reconstruction data. The background is described in this Wiki article. For the moment, I've plotted just the Northern Hemisphere reconstructions for the last millennium. For those who care about that sort of thing, I've currently omitted the instrumental curve at the end. More plots may appear here later.
Update - I misspoke here. The Mann2008g and 2008h curves are a composite including instrumental temperatures. Crowley2003b also includes instrumental, but only up to 1993.
Unrelated - I have replaced the plot with one including a few more series from the same resource. I have used a consistent anomaly base period of 1936-1965 (the most recent 30-year period common to all). The Oerlemans set is based on glaciers and is described as global rather than NH.
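Here is a minimal Python sketch of putting every series on the same 1936-1965 base: subtract each series' own mean over those years. The helper that finds the most recent 30-year window common to all series assumes each series is continuous between its first and last years. Both function names are hypothetical.

```python
def to_anomalies(years, values, base_start=1936, base_end=1965):
    """Subtract the mean over [base_start, base_end] (inclusive); None marks missing data."""
    base = [v for y, v in zip(years, values)
            if base_start <= y <= base_end and v is not None]
    if not base:
        raise ValueError("series has no data in the base period")
    offset = sum(base) / len(base)
    return [None if v is None else v - offset for v in values]


def latest_common_base(coverage, length=30):
    """Most recent `length`-year window lying inside every (first_year, last_year) span."""
    last = min(end for _, end in coverage)
    first = max(start for start, _ in coverage)
    if last - first + 1 < length:
        raise ValueError("no common window of that length")
    return last - length + 1, last
```

A series ending in 1965 would, for example, force the common window back to 1936-1965 even if every other series ran to 1995.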

Here is the plot, unsmoothed. Below the jump is a summary of the data sets.


Thursday, June 23, 2011

Steven Mosher's GHCN V3 R Package

It's now up at CRAN. You can find it here.

The Mac and Windows binaries are not there yet, but are coming soon. The reference manual, which is the only thing I have really looked at so far, is there.

I'll write more when I've been able to try it out (waiting for binaries). More information at Steven's blog.


Wednesday, June 22, 2011

Time series plots - using animation

In my posts so far on finding more readable ways of presenting plots of multiple series, I've tried multiple colors and varying spectral maps. With multiple colors, the lines may be easier to distinguish but harder to follow.

Eli, in a comment on the first post, suggested making curves respond to the pointer rolling over them. This needs Java programming, which I can't do. But it occurred to me that an animated GIF would achieve something of the same effect.

If each curve becomes, for a period, a continuous black line, then its path can be traced easily. In between, the dots separate out the local features. I'll switch to a more topical example - JAXA ice extent:



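The highlighting scheme described above amounts to a frame schedule. The following is a minimal Python sketch that assumes one series is drawn solid black per highlight frame while the rest stay dotted, with optional all-dotted frames in between. The names and frame counts are hypothetical. Each frame would then be drawn with whatever plotting tool is in use and the images joined into a GIF.

```python
def frame_styles(n_series, frames_per_series=1, dotted_frames=1):
    """Per-frame line styles for an animated spaghetti plot.

    Each frame is a list with one entry per series: 'solid' for the series
    currently traced in black, 'dotted' for all the others. `dotted_frames`
    all-dotted frames follow each highlight.
    """
    frames = []
    for k in range(n_series):
        highlight = ['dotted'] * n_series
        highlight[k] = 'solid'
        frames.extend(list(highlight) for _ in range(frames_per_series))
        frames.extend(['dotted'] * n_series for _ in range(dotted_frames))
    return frames
```

For example, ImageMagick's `convert -delay 100 -loop 0 frame*.png anim.gif` is one common way to assemble saved frames into a looping GIF.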
More visible time series plots

In my previous post I described the use of alternating colors to improve the readability of "spaghetti" plots of time series, especially for readers who have trouble distinguishing fine shades of color. I updated it several times, so if you read it a while ago, you might like to check it again.

There was feedback, here and at Lucia's, from readers concerned about color-blindness, especially red-green. That got me thinking more about appropriate color schemes.

The benefit of alternating three colors, as I did, is that one can hope that most people can distinguish at least two of them, since they come from different parts of the rainbow. But maybe that can be reinforced.

The downside to all this is that lines of alternating color are harder to follow by eye than single-colored lines.

Anyway, I've looked more into the R function rainbow(). It just scans the hue spectrum of the hsv() function. I'll say more about HSV and RGB color numbering below. For the moment, this makes possible a more flexible approach to the spectrum, which may help with color difficulties.
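To make the rainbow()/hsv() relationship concrete, here is a Python sketch using the standard colorsys module. It follows the defaults documented for R's rainbow(): hues evenly spaced from start = 0 to end = max(1, n - 1)/n, at full saturation and value. It ignores R's alpha option, so treat it as an approximation rather than R's own code.

```python
import colorsys


def rainbow(n, s=1.0, v=1.0, start=0.0, end=None):
    """Approximate R's rainbow(): n evenly spaced hues converted via HSV to hex RGB."""
    if end is None:
        end = max(1, n - 1) / n  # R's documented default for `end`
    if n == 1:
        hues = [start]
    else:
        step = (end - start) / (n - 1)
        hues = [start + i * step for i in range(n)]
    colors = []
    for h in hues:
        r, g, b = colorsys.hsv_to_rgb(h % 1.0, s, v)
        colors.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

Changing `start` and `end`, or replacing the even spacing of `hues`, is all it takes to favour one part of the spectrum over another.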

I've plotted here the same TSI example using different spectral ranges. You can click on any plot to see it enlarged. The top blue mini-graph shows the part of the spectrum that is enhanced: more colors are chosen from the region where that function is higher. Below it is a bar with the uniform spectrum, and below that, the spectrum actually chosen. The rest is as before. Below the jump come some thick-line versions.
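One plausible way to get "more colors where the function is higher" is inverse-CDF (quantile) sampling, which places the n hues at equally spaced quantiles of the weighting function. The post doesn't state the rule actually used, so this Python sketch is an assumption. So is the default hue range, which stops at 5/6 (magenta) so the spectrum doesn't wrap back to red.

```python
def weighted_hues(n, density, grid=1000, hue_range=(0.0, 5 / 6)):
    """Choose n hues so that more fall where density(x) is larger.

    density: non-negative function on [0, 1], the fraction along hue_range.
    Hues sit at the midpoint quantiles (k + 0.5)/n of the normalised density,
    found by numerically inverting its cumulative integral.
    """
    lo, hi = hue_range
    xs = [i / grid for i in range(grid + 1)]
    # cumulative integral of the density by the trapezoidal rule
    cum = [0.0]
    for a, b in zip(xs, xs[1:]):
        cum.append(cum[-1] + 0.5 * (density(a) + density(b)) * (b - a))
    total = cum[-1]
    if total <= 0:
        raise ValueError("density must be positive somewhere")
    hues = []
    j = 0
    for k in range(n):
        target = (k + 0.5) / n * total
        while cum[j + 1] < target:
            j += 1
        # linear interpolation inside the grid cell [xs[j], xs[j+1]]
        width = cum[j + 1] - cum[j]
        frac = (target - cum[j]) / width if width > 0 else 0.0
        x = xs[j] + frac * (xs[j + 1] - xs[j])
        hues.append(lo + x * (hi - lo))
    return hues
```

With a constant density this reduces to evenly spaced hues; a density peaked in, say, the blue-green region crowds more of the chosen colors there. The hues can then go through the same HSV-to-RGB conversion that rainbow() uses.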


Friday, June 17, 2011

Cheerful colors for time series

Over at the Blackboard, Lucia was looking at how to get a good color scheme in R for showing multiple time series. It's quite hard to get a set with good contrast.

I've been wondering about that too. It's a personal problem - my ability to distinguish color shades has decreased.

I've been dabbling with an alternative idea - stripy lines. Or at least alternating color segments. Then you don't have to rely on shades to make the distinction.

Lucia illustrated with some solar data from Leif Svalgaard. She used different dot-dash line styles to help make contrasts. I thought it would be really good to draw these in alternating colors. You can do this by overwriting.

So here's what I came up with. Some may like it, some not. The lines are in principle more distinctive, but it's harder to see where they are going. Single contrasting colors are certainly better, if you can get enough of them.

Anyway, here's my plot. The R code is below the jump, and I'll put a zip file (TSIcolors.zip) with data on the doc repository. As Lucia noted, Leif's file just has blanks for missing data, so I edited in NA entries. The colors are chosen automatically and at random.
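The overwriting trick for alternating colors can be sketched as a segment schedule. The whole series is drawn once in the first color, and then alternate runs of points are redrawn in the other colors. This Python sketch only computes the runs. The run length and function names are hypothetical, and the actual drawing would be done with lines() or similar in R.

```python
def alternating_runs(n_points, run_len, colors):
    """Split a polyline of n_points into runs of run_len steps, cycling through colors.

    Each run is (start_index, end_index, color) with 0-based inclusive indices.
    Consecutive runs share an endpoint, so redrawing them leaves no gaps.
    """
    if run_len < 1:
        raise ValueError("run_len must be at least 1")
    runs = []
    for k, start in enumerate(range(0, n_points - 1, run_len)):
        end = min(start + run_len, n_points - 1)
        runs.append((start, end, colors[k % len(colors)]))
    return runs


def overwrite_runs(n_points, run_len, colors):
    """Runs still to draw after the whole line has been drawn once in colors[0]."""
    return [r for r in alternating_runs(n_points, run_len, colors)
            if r[2] != colors[0]]
```

In R, a run (start, end, color) would correspond to something like `lines(x[(start+1):(end+1)], y[(start+1):(end+1)], col = color)`, allowing for R's 1-based indexing.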

Update:
Peter O'Neill (oneillp) suggested in comments using R-supplied palettes. I think this is better, specifically rainbow(). He also suggested a way to fix the line segments in the legend, using seg.len. I found my legend() function would not take that as an argument. I also found that the problem with lines only arose in jpeg or png mode. I couldn't find the bug, so I wrote my own legend routine, using a subset of the regular arguments.
Update. Replacing the above update: I've redone it in the spirit of Peter's second comment. Instead of a new legend function, I use the values returned by the standard one to overwrite the line segments. Then I don't need seg.len.

Revised pictures and code below. 


Wednesday, June 8, 2011

Effect of selection in the Wegman Report

The Wegman Report was a report to Congress, invited by Rep. Barton, Chair of the House Energy and Commerce Committee. The report has recently been revealed as heavily plagiarised. It was the centerpiece of hearings directed at Michael Mann's "hockey-stick" papers (MBH98, Nature 1998; MBH99).

However, this post is about the science. The thrust of the WR's scientific criticism of MBH is that they used an inappropriate mean to normalize the proxy data: the mean for the calibration period, rather than for the full period. This would tend to produce hockey-stick results.
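A toy illustration in Python of why the choice of mean matters. PCA works with sums of squares about the chosen centre. Centering on a sub-period mean m_c instead of the full mean m adds n(m - m_c)^2 to a series' sum of squares. So a series whose calibration-period level differs from its long-term level gets extra weight. The numbers and names here are made up for illustration, and any rescaling step is ignored.

```python
def centred(xs, ref_start=0, ref_end=None):
    """Subtract the mean of xs[ref_start:ref_end]; the default is the full-period mean."""
    ref = xs[ref_start:ref_end]
    m = sum(ref) / len(ref)
    return [x - m for x in xs]


def sum_of_squares(xs):
    return sum(x * x for x in xs)


# Toy "proxy": flat for 8 steps, then a step up during a 2-step "calibration period".
proxy = [0.0] * 8 + [2.0] * 2
full = sum_of_squares(centred(proxy))                  # about the full mean, 0.4
short = sum_of_squares(centred(proxy, ref_start=-2))   # about the calibration mean, 2.0
# short - full equals n * (0.4 - 2.0)**2 = 10 * 2.56 = 25.6, up to rounding
```

A series with no calibration-period offset is unaffected, since m = m_c. That is why decentering favours hockey-stick-shaped series in the leading PC.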

The WR was based on papers by McIntyre and McKitrick, particularly MM05b (GRL). Wegman used their code, archived here. An important and frequently cited claim is that the MBH algorithm would generate results of hockey-stick appearance even if the data consisted of red noise with no such tendency. To support this, they showed three figures based on red noise simulations:
  • Fig 4.1 compared the first PC generated from such a simulation with the MBH reconstruction.
  • Fig 4.2 showed a histogram of the "hockey-stick index" (a difference of means, as a measure of HS shape) for 10,000 simulations, using the limited and the full mean. It showed a normal unimodal distribution for the full mean ("centered"), and a bimodal distribution for the partial mean ("decentered").
  • Fig 4.4 came with this caption:
    One of the most compelling illustrations that McIntyre and McKitrick have produced is created by feeding red noise [AR(1) with parameter = 0.2] into the MBH algorithm. The AR(1) process is a stationary process meaning that it should not exhibit any long-term trend. The MBH98 algorithm found ‘hockey stick’ trend in each of the independent replications.
    It showed twelve HS-like PC1s generated by the MBH algorithm.
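For concreteness, here is a Python sketch of the two ingredients named above. The first is AR(1) red noise with parameter 0.2, as in the Fig 4.4 caption. The second is a hockey-stick index. The index form used here is the calibration-period mean minus the full-period mean, divided by the full-period standard deviation. That form is an assumption modelled on the "difference of means" description, not a copy of MM05's code.

```python
import random
from statistics import mean, pstdev


def ar1(n, phi, rng, burn_in=100):
    """AR(1) red noise x[t] = phi * x[t-1] + e[t], with e ~ N(0, 1).

    The first burn_in values are discarded so the start-up from x = 0 is forgotten.
    """
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def hockey_stick_index(series, calib_len):
    """(mean of the last calib_len values - overall mean) / overall SD (assumed form)."""
    return (mean(series[-calib_len:]) - mean(series)) / pstdev(series)


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation, as a check that the noise has the intended phi."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den
```

For example, `hockey_stick_index(ar1(581, 0.2, random.Random(42)), 79)` scores one 581-value noise series against an illustrative 79-value calibration window. With phi = 0.2 the noise has very short memory, so large index values should be rare.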

Deep Climate did a thorough investigation of these graphs and their provenance, to complement the work he and John Mashey did on the plagiarism. Regarding these plots he found:
  • The HS PCs shown were anything but random samples. In fact, the 10,000 simulations had been pre-sorted by HS index, and the top 100 selected. A choice was then made from this top 100.
  • Although Wegman had said that "We have been able to reproduce the results of McIntyre and McKitrick (2005b)", the PC in Fig 4.1 was identical to one in MM05b. Since the noise is randomly generated, this could not have happened from a proper re-run of the code. Somehow, the graph was produced from MM05 computed results.
  • The red noise used in the program was very different to that described in the caption of Fig 4.4.
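The effect of the pre-sorting in the first point is easy to illustrate: picking the top 100 of 10,000 by any score gives a sample far out in the tail. Here is a minimal Python sketch with hypothetical names. Ranking by absolute value is offered only as an option because PC sign is arbitrary, not because the archived code is known to do so.

```python
import random


def top_k(scores, k=100, by_magnitude=False):
    """Indices of the k highest scores (optionally ranked by absolute value)."""
    if by_magnitude:
        key = lambda i: abs(scores[i])
    else:
        key = lambda i: scores[i]
    return sorted(range(len(scores)), key=key, reverse=True)[:k]


def selection_shift(scores, k=100):
    """Mean score of the top-k subset versus the mean of all scores."""
    chosen = [scores[i] for i in top_k(scores, k)]
    return sum(chosen) / len(chosen), sum(scores) / len(scores)


# Stand-in scores: 10,000 standard normal draws in place of real HS indices.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(10000)]
top_mean, all_mean = selection_shift(scores, 100)
```

Whatever distribution the indices come from, the top-1% mean sits well above the overall mean. So figures drawn from that subset show what selection produces, not what a typical run produces.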

In this post, I mainly want to concentrate on the first issue. How much of the HS shape of the PCs they showed was due to the MBH selection process (and there is some), and how much to the artificial selection from the top 1% of sorted HS shapes? To this end, I tried running the same algorithm with the same red noise, but using correct centering.
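Here is a toy version of that kind of experiment in Python; it is not MM05's code. It generates AR(1) pseudo-proxies and centres them either on the full mean or on the last calib_len values. It then takes PC1 by power iteration and compares hockey-stick indices. The proxy count, series length, 79-value calibration window, AR(1) noise and index form are all assumptions, and only the mean is varied. Absolute index values are compared because PC sign is arbitrary.

```python
import random


def ar1(n, phi, rng):
    """AR(1) red noise: x[t] = phi * x[t-1] + e[t], e ~ N(0, 1)."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def centre_columns(cols, calib_len=None):
    """Centre each proxy on its full mean, or on the mean of its last calib_len values."""
    out = []
    for c in cols:
        ref = c if calib_len is None else c[-calib_len:]
        m = sum(ref) / len(ref)
        out.append([v - m for v in c])
    return out


def pc1(cols, iters=200):
    """Leading PC time series via power iteration on the p x p cross-product matrix.

    cols must already be centred; no further centring is done here.
    """
    p, n = len(cols), len(cols[0])
    g = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)]
         for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(g[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return [sum(w[j] * cols[j][t] for j in range(p)) for t in range(n)]


def hsi(series, calib_len):
    """(calibration mean - full mean) / full-period SD (assumed index form)."""
    n = len(series)
    m_all = sum(series) / n
    m_cal = sum(series[-calib_len:]) / calib_len
    sd = (sum((v - m_all) ** 2 for v in series) / n) ** 0.5
    return (m_cal - m_all) / sd


def compare(n_sims=20, n_proxies=20, n_years=300, calib_len=79, phi=0.2, seed=0):
    """|HSI| of PC1 for centred vs decentred versions of the same noise."""
    rng = random.Random(seed)
    results = {"centred": [], "decentred": []}
    for _ in range(n_sims):
        cols = [ar1(n_years, phi, rng) for _ in range(n_proxies)]
        for label, cl in (("centred", None), ("decentred", calib_len)):
            pc = pc1(centre_columns(cols, cl))
            results[label].append(abs(hsi(pc, calib_len)))
    return results
```

Because both versions use identical noise, any difference between the two lists of results comes from the centering alone. No top-1% selection is involved.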

It's a fairly long post, but you can peek at the conclusion.