Sunday, March 17, 2013

Next stage of Marcott et al study



In my last post, I described a rather primitive emulation of a long-period proxy temperature reconstruction by Marcott et al, going back to 11300 BP. I tried first using the published dates, and got a fairly unsurprising result, similar to the paper, except that it lacked the recent spike visible there. Then I tried using the modified dates of Marcott et al, and got a generally similar result, except for a very large initial spike.

I have found a rather trivial reason for this, but I'm not sure what to make of it. At least one proxy sheet in the spreadsheet, proxy 65 (RAPID-12-1K), had some junk in a block of numbers well below the main data block. These are in the data columns E:G, which are, respectively, published date, temperature, and Marcott date. Update: proxies 64 and 68 have similar junk.

My program read those as data, and for some proxies this considerably affected the interpolation. The reason for the difference is that for published dates, the extra numbers seemed to be dates beyond 11300 BP, which did no harm. But in col G, the Marcott dates, they were small numbers, and were interpreted as early years.

When I fixed this, and read only the numbers in the consecutive data block, the results were much more reasonable, although a spike remained.
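As a minimal sketch (not the posted code; the function name is mine), the fix amounts to trimming each column to its leading contiguous numeric block, so junk and blanks further down the sheet are ignored:

  # Trim a spreadsheet column to its leading contiguous numeric block.
  contiguous <- function(x) {
    num <- suppressWarnings(as.numeric(x))   # junk/blanks become NA
    bad <- which(!is.finite(num))            # positions of breaks
    if (length(bad) == 0) num else num[seq_len(bad[1] - 1)]
  }
  # e.g. mdates <- contiguous(sheet$marcott_date)   # column G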

I don't know if a corresponding issue may have affected the paper. I would doubt it, although it would explain the discrepancy between the thesis version and the Science paper.

Update: I have posted an active viewer for the individual proxies.

Update - I have made some minor changes to the plots:
1. I changed the anomaly base from 5800-6200 BP to 4500-5500 BP. The period I initially used came from Fig S26, but the latter is the right one. I don't think it made a perceptible difference. (The anomaly step is sketched just below this list.)
2. Romanm at CA noted some spurious zeroes in proxy 62. These caused the earlier dip at about 9000 BP.
3. I improved the y-axis alignment of the combined plots.
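The anomaly step itself is just a subtraction of the base-period mean; a minimal sketch, with variable names that are mine:

  # yrs: the 20-year grid in years BP; temp: one interpolated proxy series.
  base <- yrs >= 4500 & yrs <= 5500
  anom <- temp - mean(temp[base], na.rm = TRUE)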

Details below.

Here are the plots. Firstly the new plot with published dates, which is not much changed:



Then the plot with Marcott revised dates. This still has a considerable spike, but less than before:



And here are the two orange curves solo:


Discussion


The new spike is smaller than before, and smaller than Marcott et al's. But it still represents a change caused by dating, and I'm looking into the reasons. I think it may come down to particular proxies; there are two main contributions.

One is from proxy 44, Mount Honey on Campbell Island, where the early part of the date range is rather compressed - intervals of about 30 years shrink to about 10 years. The temperatures themselves are unremarkable, but the trends increase.

The other comes from two proxies which have large negative anomalies, but which disappear from the early years on redating. These are 54: OCE326-GGC30 and 58: Flarken Lake. Re-dating moves their dates back by 100 and 150 years respectively, and in both cases there had been substantial cooling over many centuries.

Code

I won't post a new version yet. There are no method changes; just messing around to get a contiguous block of numbers, and some useful convenience code connected with the two date cases.

29 comments:

  1. h[,11] (Moose Lake) is all NA, presumably because there isn't data to cover your 5800-6200 BP range for normalization. Also, why did you choose that range, when the paper uses 4500-5500?

    Replies
    1. Oops, no, it's probably because there is an extra column in the sheet - I thought I had caught all those. I'll fix it. It may be that I used an unedited sheet.

      The range I used came from Fig S26. It's the first thing I found when I was looking for the range. But I've changed it.

    2. Sorry, you're right. I'll update the zip file. The figs should be OK, since I've used the new range.

  2. Has anyone looked at the best linear unbiased estimator of the mean using the covariance matrix of the sites? (It's identical to the calculation of the 'trend' term in ordinary kriging)

    That would deal with the area weighting in an optimal way. I saw a good article on this the other day, but can't find it at the moment.

    Kevin C

    Replies
    1. The paper was Cressie, The origins of Kriging, doi:10.1007/BF00889887
      The BLUE for mean is:
      \mu = (1' C^{-1} 1)^{-1} 1' C^{-1} Z

      C is the covariance matrix of the field at the locations of the observations Z. 1 is an Nx1 vector of 1s. ' is transpose.
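      In R, a minimal sketch of that estimator (illustration only; in practice C would come from a fitted spatial covariance model):

        # BLUE of the mean: mu = (1' C^-1 1)^-1 (1' C^-1 Z), C symmetric.
        blue_mean <- function(Z, C) {
          w <- solve(C, rep(1, length(Z)))   # w = C^-1 1
          sum(w * Z) / sum(w)                # (1' C^-1 Z) / (1' C^-1 1)
        }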

      Kevin C

    2. Thanks, Kevin,
      Yes, the area weighting could be done in many better ways. I would use a tri mesh.

      I'll look up that Cressie paper. It sounds like a good explanation.

  3. Google Scholar has a pdf of the paper (but not of the SI) as of right now. Enter the title of the paper, then click on the [PDF] link on the right of the first result.

    Replies
    1. You can find the supplemental information on the Science web site. It's not pay-walled.

      link.

  4. With this kind of data I tend to think a good check on the mean is to try a simple median instead. Using the median reduces the influence of outliers and preserves common patterns across proxies rather than smoothing over them, which matters particularly with high-noise data, as proxies tend to be.

    Comparing the median to the mean, there is a reduced multi-millennial trend in the last half of the series, because a larger trend is seen in only a handful of proxies. In the last two thousand years there is less of a clear distinction between the warm MWP and cold LIA - instead there is a continuation of a fairly steady downward trend punctuated by large blips, which seem to coincide with known volcanic events. Using all the proxies seems to produce a "hump" at around 1500 BP, but this appears to be caused by proxy drop-out, and is lost if we remove any datasets ending prior to 200 BP.
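    A minimal sketch of that comparison (assuming recon is a years x proxies matrix of anomalies on the common 20-year grid; the name is mine):

      stack_mean   <- rowMeans(recon, na.rm = TRUE)
      stack_median <- apply(recon, 1, median, na.rm = TRUE)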

  5. Nick,

    Your interactive plot is a great device. I often study temperature distribution plots for work and I will employ this technique in the future. Thanks.

    One thing I noticed is that your MD01-2421 is plotted before BP 0. As you know, Steve Mc has criticized Marcott for "Hiding the Decline" by inserting an NaN in the most recent three records for that combined series. Presumably you're using the Published Age as opposed to the Marine09 dates.

    Isono reports NC for the calendar date of the 3 cm depth, which corresponds to the ones that Marcott NaNed.
    ftp://rock.geosociety.org/pub/reposit/2009/2009139.pdf

    Why would Marcott be wrong in reporting NaN for the same depth/time?

    Thanks in advance for indulging the dumb questions of a newbie,

    DGH

    Replies
    1. DGH,
      Sorry I missed this earlier. Thanks for the comment; I hope the code I've posted may help.

      For some reason Steve Mc refuses to acknowledge the consequence of the fact that the carbon dating program just can't assign dates after about 1954. So if Marcott et al are going to be consistent, and the program doesn't offer an alternative date, they can't use it. In this case the program brought 25 BP forward to 11 BP, and the rest would have been in the post-1954 region.

    2. Thanks.

      I asked Steve the same question and his reply didn't seem to get to my point. In fairness my question wasn't well worded over there.

      But it seems to me that if Isono couldn't date the 0.03 m sample, how can Marcott be expected to? On the one hand Marcott is wrong for redating a series without consulting the original author, and on the other hand he's criticized for using the same data? Call me crazy, but Steve's entire post "Hiding the Decline: MD01-2421" hinges on that issue.

      On the other hand, you're all too willing to give these authors a pass IMO.

      Robert Rohde of BEST emailed Revkin, "Previous work had already pointed towards a period of early Holocene warmth somewhat higher than recent centuries." Indeed none of the comments from the authors' peers that I've read suggest that this Holocene reconstruction was anything more than interesting. So why all the attention?

      Because they included the spurious spike. And they also paid a great deal of attention to the temperature record and IPCC projections. That gave the media the distinct impression that the spike was meaningful. Revkin titled his post on the matter, "Scientists Find an Abrupt Warm Jog After a Very Long Cooling." According to Revkin, they didn't compare their data to the temperature record, they FOUND an abrupt warm jog. His colleague Justin Gillis made the same error, as did Seth Borenstein at Huff Post.

      My point is that Marcott et al are basking in the limelight. It seems fair that they should also deal with the heat. At the least shouldn't they do an official correction of the plot? You've pointed out its flaws. And it has clearly, demonstrably contributed to significant confusion about the research.

      Their FAQ and further response should be interesting.

    3. Anon,
      I agree that the paper has had more attention than it deserves, and it may be because of the spike. But it's still a useful account of Holocene temperatures.

      It's true that the press doesn't distinguish well between the known heat spike of the 20thC and the spurious one from the proxies. But I think Marcott did, generally. And in a way the thrust is correct - there was a spike, even if it is mis-attributed.

      Borenstein's report shows both sides of the confusion:
      "Marcott's data indicates that it took 4,000 years for the world to warm about 1.25 degrees from the end of the ice age to about 7,000 years ago. The same fossil-based data suggest a similar level of warming occurring in just one generation: from the 1920s to the 1940s. Actual thermometer records don't show the rise from the 1920s to the 1940s was quite that big and Marcott said for such recent time periods it is better to use actual thermometer readings than his proxies."

  6. I'd post this on the other page, but the comment is blocked by the plotting graphics.

    How hard would it be to add a second table at the bottom -- same data, but with the recalculated dates?

    I really like the blink comparisons; it would be interesting to use them to compare original versus recalculated versions, to see how that's affected the reconstructions.

    Nice work, by the way.

    Replies
    1. Carrick,
      Thanks for noting the comment issue - I've fixed it.

      The scheme requires a spaghetti plot plus a black-line png for each proxy. So it's not hard, but not trivial. My thought is to do a comparison on a two-millennium time scale. I don't think much could be seen on the whole Holocene plot.

      I'm currently working on a post-able version of the code for the viewer. That will be easier for me to use as well.

  7. Nick,

    I rewrote everything in Perl so that I could use the same Hadley processing software used on CRUTEM4. I also wanted to avoid the 20-year interpolation step. In the process I discovered several problems with the Excel spreadsheet.

    Proxies 46, 56 and 39 have a column shift for the Published Year/Temperature.

    Several also have NaN for the Year and/or Temperature, e.g. proxies 33 and 54.

    Having fixed all this, I also get a similar result to Marcott's. The statistically small uptick in the last bin seems to originate from the Southern Hemisphere. http://clivebest.com/blog/?p=4761
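    As a sketch, problems like these can be flagged programmatically (assuming the sheets are read into a list of data frames with columns pub_year and pub_temp; the names are mine):

      # Flag sheets with a column shift (non-numeric year) or NaN entries.
      bad <- sapply(sheets, function(s)
        !is.numeric(s$pub_year) || any(!is.finite(s$pub_year)) ||
          any(!is.finite(s$pub_temp)))
      names(sheets)[bad]   # sheets needing repair by hand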



    Replies
    1. Clive,
      I checked - the sheets without revised dates are 25, 27, 50, 61, 67

  8. Clive,
    Yes, I think those sheets just have an extra column near the start. Another snag is that EPICA at least has an empty revised date column.

    Did you check the spurious zeroes in proxy 62? I think they may be the cause of your 10000 yr event.

    Your post is very interesting.

    Replies
    1. I also just spotted that the problem 10,000 years ago is with proxy 62! There are indeed two spurious zeroes! Thanks.

      I think that by avoiding any interpolation I should be able to give a statistical error on the last uptick. It should simply scale as 1/sqrt(N); with N = 25, that's 1/sqrt(25) = 0.2, so the random error on the measured temperature is about 20%.

  9. I have now also discovered that all the ice core data have their "Published Temperature" set exactly the same as "Published Proxy Temperature Anomalies". This is very sloppy!

    Sorry - but I am now convinced that the temperature uptick is simply an artefact! It is based on wishful thinking!

    see http://clivebest.com/blog/?p=4790

    Replies
    1. Clive,
      I agree that it's an artefact, though like Tamino and Steve Mc I think it is mainly due to the way proxies drop out, which is related to re-dating.

      Rationally, it shouldn't have invited wishful thinking: it implies a spike in 1940 that we know didn't happen. That should have reduced its prospects of publication. But maybe it didn't.

    2. I have done the same analysis for the re-dated data. I still think the core problem is the interpolation to a 20-year timebase, which is then accentuated by the re-dating. The graphs can be found here.

    3. Clive,
      I think the basic problem is trying to get aggregate modern estimates from few data. Using interpolation shifts the blame to that step.

      Tamino's difference result is interesting; it seems to behave more reasonably. It's equivalent to padding the curves beyond the ends with the most recent value available, so dropouts don't make a discontinuity. Of course, padding for decades may be reasonable, but maybe not for millennia.
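      A sketch of that padding, using approx()'s rule = 2, which holds the end values constant beyond the first and last observations:

        # Interpolate one proxy onto the common grid `yrs`, carrying the
        # end values outward (Tamino-style padding) instead of dropping out.
        padded <- approx(obs_yrs, obs_temp, xout = yrs, rule = 2)$y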

    4. Yes, I agree. There is simply not enough data in the modern era to draw any conclusions, and Tamino's analysis shows this rather nicely.

      However, my main point is that it is simply wrong to generate pseudo-data by interpolating measured data to a 20-year time-base, especially since the resolution of the actual measurements is often 100-300 years! The re-dating then just compounds that basic error.

      I think your Javascript graph viewer is great!
      Looks like I may have to learn R, since I can't afford to buy IDL or Matlab!

    5. Thanks, Clive,
      I've found R really useful, not only for what it is best known for, but also because people have been writing packages to bring all sorts of other things into the environment (like webGL).

  10. Clive Best: However, my main point is that it is simply wrong to generate pseudo-data by interpolating measured data to a 20-year time-base, especially since the resolution of the actual measurements is often 100-300 years! The re-dating then just compounds that basic error.


    My suspicion is that the re-dating, as it has been implemented, is a disaster. I really need to see their code, though. There's not enough detail to know.

  11. I think the spike can be understood by interpolating just one proxy, TNO5-17. We have only 2 measured points:
    Date  Anomaly (deg C)
    1904  2.3
    1950  4.5

    Interpolation then gives us:
    1900 2.0
    1920 3.1
    1940 4.0
    1960 5.0

    A linearly increasing spike!
    These values are meaningless, because the random variation on this one proxy is +-1 degree over the last 10,000 years. Interpolation invents a trend where there isn't one.
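    For the record, a quick R check of those numbers (a straight line through the two points, extrapolated at both ends):

      fit <- lm(anom ~ yr, data.frame(yr = c(1904, 1950), anom = c(2.3, 4.5)))
      round(predict(fit, data.frame(yr = c(1900, 1920, 1940, 1960))), 1)
      #  2.1  3.1  4.0  5.0  (cf. the table above)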

    Replies
    1. Clive,
      Interpolation is basic to reconstruction. You have a lot of readings at different places and times, and there is no way you can usefully put them together if you don't assume that temp is continuously varying between sampling points. The recon needs to attribute some temp to TNO5-17 in 1904-50. What better values could it use?

      What should really be done is a kriging type reweighting (eg BEST) so that the assumed values, where info is poor as here, are not allowed to dominate.
