Saturday, May 1, 2010

Just 60 stations?

Eric Steig at Jeff Id's site said that you should be able to capture global trends with just 60 well-chosen sites. Discussion ensued, and Steve Hempell suggested that this should be tried with some of the other codes that are around. So I've given it a try, using V1.4 of TempLS.

I looked at stations from the GHCN set that were rural, had data in 2009/2010, and had more than 90 years of data in total. The selection command in TempLS was:
"LongRur" = tv$endyr>2009 & tv$length>90 & tv$urban == "A",
That yielded 61 stations.
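
For concreteness, here is a minimal R sketch of how that mask behaves. The only assumption is that tv is the TempLS station inventory with columns endyr, length and urban; the toy rows below are made up. (In GHCN v2 metadata the urban flag is "A" = rural, "B" = small town, "C" = urban.)

    # Toy station inventory in the TempLS layout (values invented)
    tv <- data.frame(endyr  = c(2010, 2009, 2010),
                     length = c(95, 120, 80),
                     urban  = c("A", "C", "A"))

    # The same logical mask as the selection command above
    LongRur <- tv$endyr > 2009 & tv$length > 90 & tv$urban == "A"
    sum(LongRur)     # station count passing the criteria (61 for the real GHCN)
    tv[LongRur, ]    # the selected stations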

Update: This topic was revisited here.

Results and comparisons below the jump.

A summary of this series of posts is here.



First a map of the 61 stations. I've called the station set "LongRur":
The first comparison is with the Hadley land-only CRUTEM3, annual and smoothed with the IPCC AR4 Chapter 3 13-point filter. Note that the dates in the titles are wrong - the plot is actually 1902 to 2009.


Much more annual fluctuation, but the smoothed curves track well.

And here is GISS Land only:

Now a comparison of the inferred temperature with GISS Land/ocean. This and the next are less direct analogues, because of the inclusion of ocean temperatures.

Here is a comparison with HadCrut3 (land/ocean):
Clearly the smaller set has greater excursions. But the trend looks much the same, although the 61-station set rises more rapidly towards the end. That is expected when comparing a land station set (with several oceanic islands) with a land/ocean trend. I'll post comparative numbers later. (Trends are below.)


Here are the plots of the TempLS outputs with trends and a smooth, over two time periods:

Trend Comparison

Here are the linear trends, 1902 to 2009, in deg C/decade:

LongRur   CRUTEM3   GISS      HadCRUT
 0.086     0.087     0.069     0.076

The match to the land-only trend (CRUTEM3) is very good.
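
The trends themselves are just OLS slopes. As a minimal R sketch, with placeholder data standing in for the real annual anomaly series:

    yr   <- 1902:2009
    anom <- rnorm(length(yr))   # placeholder for an annual anomaly series
    fit  <- lm(anom ~ yr)       # ordinary least squares fit
    10 * coef(fit)[["yr"]]      # slope in deg C/yr, scaled to deg C/decade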

77 comments:

  1. Can you post the time series for the 60 versus the full stations? I'd like to compare the spectral properties of these.

    I'd predict the biggest difference is that the full station average reduces "high frequency" (short-period) climate fluctuations. If so, it wouldn't be surprising if you could recover long-duration temperature trend estimations with a small subset of all the stations.

  2. Carrick
    It's now on the <a href="http://drop.io/erfitad">repository</a>. I've included all the (unsmoothed) plot data. The file is "Run61StationsCompare.txt".

    I agree with your prediction.

  3. I'd love to see an animation that shows the number of stations narrowing in on fullness. That is: starting with a black line that represents the full data set (like you have), include a slim red line that shows 30 stations. Then 60. Then 90. And so on until the red line and black line are completely the same. It would be a nice way to illustrate how a full data set is best (of course) but not exactly necessary.

  4. I blame the rural heat island effect :P

  5. On a more serious note, I found something quite similar when I used a really-rural station set (e.g. dark, low pop density, non-airport, rural), though it wasn't nearly as small a set as 60 stations.

  6. Keep in mind that these aren't even 60 well-chosen locations. They're just the actual sites that happened to fit your search criteria.

    If you gave me 60 dots to put on the map at will, the map would look quite different. That's an exercise you can do with the results of a climate model.

    I must say that excluding the US is a bit strange. Seems like most bloggers, when looking globally, only use GHCN instead of GHCN+USHCN. It takes a bit more programming, but at some point, it makes sense to incorporate the USHCN.

  7. CE,
    Yes, I wanted to see how a rather arbitrary (non-subjective) choice would work - so I asked for rural and stretched the length of record requirement until the number came down to about 61.

    The absence of US stations is a surprise, which would bear further investigation. There are plenty of US stations in the GHCN, but, it seems, none that are both rated rural and with more than 90 years of data. It's been remarked recently that US stations seem to fail a rural test when it seems intuitively that they should be rural. The US was not deliberately excluded.

  8. Nick, I compared the outputs of this to your land only output, all stations.

    I think the biggest difference is that your rural 60 stations have a much higher sampling of oceanic stations. As we suspected, short-period fluctuations are enhanced. But what is really interesting is by how much. The biggest change is that the 3.7-year amplitude is about 30 dB higher.

    I suspect if you picked a subset of stations with a similar geographical distribution to your original series, you wouldn't see this.

    Maybe we can get Eric Steig to reveal the list of stations he used?

  9. Nick,
    I think the "had data in 2009" requirement is what's knocking out the US. I think the US stations in GHCN that continue past 2006 tend to be airports and/or urban(somebody can check on that).

    For all the other US stations post 2006, you need USHCN.

    Now and then you see somebody claim that the evil conspirators at NOAA or NASA or wherever have eliminated all rural US stations; you get this from people who don't know that the USHCN is a parallel data set with the stations they're looking for.

  10. Carrick,
    "Maybe we can get Eric Steig to reveal the list of stations he used?"

    In the RC post, they give the dataset and selection criteria they used to get to 318 stations. For their point to stand, any (or at least many) random subset of those 318 could be tested and show about the same thing. Of course, some random subsets would be better spatially distributed than others, so you'll get more or less of a match.

    All that said, if the point is merely about spatial correlation and not the issues with real station data (station moves, instrument changes, non-climate site influence), then working with the results of a climate model is a logical way to go here.

  11. carrot eater: I wouldn't trust climate models to get the spatial correlations on these spatial/temporal scales right.

    For purpose of comparing algorithms, you need the same set of stations of course. I think it is a reasonable expectation that Steig should release the actual set of stations used in his analysis.

  12. Model spatial variability seems reasonable to me. Reasonable enough that it tells you something, and can guide somebody who's looking into what parts of the Earth are undersampled, and how badly.

    As for Steig, you can ask him for what those 65 stations were, but it's a much stronger test to randomly pick 65 of your own, after using the criteria he stated. Half the point is that the stations were random, so somebody looking to repeat should also go random.

    All that said, going random won't tell you the theoretical minimum number of stations you need. For that, you'd need perfect knowledge of the field (if you like a model result, you can use that), and strategically pick locations, regardless of where actual stations are.

  13. For what it's worth, that post at RC also doesn't quite complete the loop and compare the subsamples to the whole set. It's really just comparing a raw subsample to the same stations, CRU-adjusted, to show that the CRU homogenisations and whatever else aren't having a big effect from the raw.

  14. Carrot Eater: Model spatial variability seems reasonable to me. Reasonable enough that it tells you something, and can guide somebody who's looking into what parts of the Earth are undersampled, and how badly.

    Have you actually looked at this? I've done some admittedly cursory testing, and found that the model output didn't appear to do a good job of describing real-world climate fluctuations. (As somebody who does numerical modeling himself, I know of the dangers in using code outputs for problems they weren't specifically tuned to work for.)

    At the least I would want to compare experimental measurements against code output before settling on the code for more detailed studies.

    As for Steig, you can ask him for what those 65 stations were, but it's a much stronger test to randomly pick 65 of your own, after using the criteria he stated. Half the point is that the stations were random, so somebody looking to repeat should also go random.

    The issue here is that it does appear Nick has not replicated Steig's results. I know of no explanation for why Steig gets as little variability as he does with just 60 stations; it could be that he hand-selected 60 stations to do this. But it would be helpful to understand what is being done differently.

  15. I haven't looked at model spatial correlation formally, no, but I thought I'd seen some side-by-side maps about that. It's really a question of whether the model's spatial correlation drops off with distance in the same way as shown in the Hansen 1987 paper.

    As for replicating Steig: I fail to see what isn't being replicated. Is Steig's variability really that much less than Nick's? Hard to eyeball, with things on different scales. And if anything, Steig's trends are lower. What's Nick's standard error - that's about the only other statistic we have to compare there.

    You do realise that the red and blue plots of Steig are both with a small number of stations? Unlike Nick, Steig did not plot a data series for the entire set of stations.

  16. Steig actually did something different from what you describe, to show something different. Reread his methodology.

    He went to UCAR and selected 100-year stations. That netted him 318 stations.

    He then did a random selection, geographically uniform.

    He then selected those having 26 years in the 1960-91 window.

    That netted 65 stations.

    Then he selected the corresponding stations from CRU. What's that mean?

    Anyway, then he compares them in two separate bins.

    Steig: selects 65 from UCAR
    CRU selects 65 from GHCN

    The comparison shows that CRU has not corrupted the data.

    That was my read on the issue. The comparison of means had SDs on the order of 0.1-0.15°C, so a -0.3°C to 0.3°C spread.

    My sense is that the thrust of the analysis was that CRU had not corrupted data, a charge some less informed people had made.

  17. err.. CE and I ...looks like we agree. Steig was really answering a different question.

  18. Yes, Steig was answering a different question, about what CRU had done to the raw data.

    But at the same time, he would have also been able to address this question; he just didn't show it explicitly there as Nick does above. He mentions it in passing when talking about the standard error in the trends.

    At some point, you have to set a standard for how good a match is good enough, for the purpose of this question. Are we comparing trends over 1900-2010, 1960-2010, and so on? Are we computing the correlation between the entire set and the subset? How good a correlation do we want?

  19. steven,
    I don't think your description is perfectly accurate (though it's close), so I'll paste the original here.

    In any case, the red series at RC is from raw data for 32-33 stations, and roughly what you might compare to Nick's red series of 61 stations. Steig didn't make a graph of ~60 stations at once, and again, didn't graphically compare to the total set.

    from RC
    "As an example, we extracted a sample of raw land-surface station data and corresponding CRU data. These were arbitrarily selected based on the following criteria: the length of record should be ~100 years or longer, and the standard reference period 1961–1990 (used to calculate SAT anomalies) must contain no more than 4 missing values. We also selected stations spread as widely as possible over the globe. We randomly chose 94 out of a possible 318 long records. Of these, 65 were sufficiently complete during the reference period to include in the analysis. These were split into two groups of 33 and 32 stations (Set A and Set B), which were then analyzed separately."

  20. I agree that Steig's analysis was for a different purpose, which was basically to try to get a dataset as different as possible from CRU's, to check robustness against data source and selection. But I believe both he and Gavin have suggested as an independent proposition that 60 stations should give a reasonable estimate of a global signal.

  21. My guess is that this idea is coming from models, at least to some extent. It's handled that way in some of the literature.

  22. Stephen, thanks for the comment, interpretation and explanation. It makes a lot more sense as a test of goofy nonsense about manipulating or faking data.

    I guess my interest is a bit different. I'd like to know how few stations one needs in practice to replicate CRU (in strictly a statistical sense).

    I know in practice that the global climate models fail nominal statistical tests; I see no reason to start with something that you know has defects.

    In terms of the variance question I was asking, Carrot Eater, what I am referring to is the standard practice of detrending the data (e.g., the output from the rural 60 stations), then computing the variance or standard deviation of the residuals.

    E.g.,

    Rural61 -- 0.22°C
    LandGISS -- 0.11°C
    CRUTEMP -- 0.14°C
    TEMPLS (land) -- 0.15°C

    The question I'm interested in (which is different than researcher malfeasance) is how few stations do you need to use with a particular algorithm before you recover e.g. 0.15°C in the land-only record?

    It would also be interesting to compare real data to model output. I'd predict you need fewer "stations" with the model output.
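
    For reference, the detrend-then-SD calculation above as a minimal R sketch (the data frame and its column names are hypothetical):

        # SD of residuals about an OLS trend line
        detrended_sd <- function(year, temp) sd(residuals(lm(temp ~ year)))

        # e.g. detrended_sd(d$Year, d$Rural61) should give roughly the
        # 0.22°C figure quoted above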

  23. CE, the earliest argument I know of for correlation scale lengths comes from Hansen and Lebedeff, 1987

    This was definitely an experimental assay, rather than a model-based one.

    Stephen has commented on other threads about the 1200-km correlation length. Perhaps he'll repeat them here.

  24. Carrick, I cited Hansen 1987 myself up above.. and the relevant part was not purely experimental. The spatial correlation test was experimental, but later in the paper, the error due to undersampling is model-based. We've been through that before. They took somebody's model output, found the global mean anomalies, and then computed what you'd find if you only had model output at the spots where you have stations.
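
    (A minimal R sketch of that subsampling test, with every object hypothetical: field is a lat x lon x time array of model anomalies, w the matching matrix of area weights, has_station a logical mask of cells containing real stations.)

        global_mean <- function(field, w)
          apply(field, 3, function(f) sum(w * f) / sum(w))

        subsampled_mean <- function(field, w, has_station)
          apply(field, 3, function(f)
            sum(w[has_station] * f[has_station]) / sum(w[has_station]))

        # The undersampling error is then something like
        # sd(global_mean(field, w) - subsampled_mean(field, w, has_station))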

    There have been papers since then that use that same method to figure out the minimum number of observations. Menne (2010) mentioned this procedure in passing, when discussing whether Watts had published a big enough list of CRN12 stations to do the analysis.


    Anyway, you're after a somewhat different question, which is whether the models have enough temporal noise. Even if individual model runs don't have as much wiggle (ENSO and the like - and I didn't think that was so bad; some of them are pretty wiggly), that doesn't mean the model output's spatial persistence length will be too long.

    It's the spatial correlation you should be comparing here, model vs reality, not the temporal wiggles.

    As for Stephen's complaint about the 1200 km, I haven't seen it backed up with anything. You can see the graphs in H&L for yourself. They use a weighting factor, so anything at 1200 km is given relatively very little weight. Only if all stations are that far away from the grid point do the ones that distant really come into play.

  25. Carrot Eater: As for Stephen's complaint about the 1200 km, I haven't seen it backed up with anything. You can see the graphs in H&L for yourself. They use a weighting factor, so anything at 1200 km is given relatively very little weight. Only if all stations are that far away from the grid point do the ones that distant really come into play

    It's easy enough to test.

    Simply plot the standard deviation of the detrended temperature residuals against average spacing between met stations.

    Also, to be clear, if you look at Figure 3 of H&L 87, the plot of correlation length, that is derived purely from experimental data, not model results. I agree he later reports an error analysis of the effect of incomplete coverage using his computer models, but that's separate from the question of the correlation length.

    In terms of models having too little temporal amplitude, what I do know is they fail to give the observed temperature fluctuation spectrum, and that there is a relationship between this and the correlation length (in this case not necessarily a simple one, though under certain assumptions it is simple). Failing on one makes the other suspect, so I certainly wouldn't start with model results to inform on reality.

    Again, perhaps Stephen (ahem) could amplify on his comments re H&L and the 1200-km number. At the least, it gives something to test.

  26. I meant to say "the observed short-period temperature fluctuation spectrum".

  27. "and that there is a relationship between this and the correlation length"

    This need not be true. It's possible to get temperature fluctuations over time completely wrong, and still get the spatial correlation about right. So the latter is what you should be looking for, for this question.

    An interconnection would exist if the spatial correlation patterns change over time: Meaning, if a certain density of stations is sufficient in, say, the Western US during El Nino, but a different density is required during La Nina, because the spatial gradients of temperature anomaly change with ENSO, AO, etc. But these are the sorts of things that can be examined, if somebody just rolled up their sleeves and looked at model output and land record, and examined whether spatial persistence length in any given broad region changed over time. I'm not interested enough to do it.

  28. carrot eater: This need not be true

    There is in general a mathematical relationship between the temperature fluctuation spectrum and the resulting temporal correlation (they are related by a Fourier transform).

    The trick then is connecting temporal correlation to spatial ones, which depends only on there being a mean (nonzero and stable over time) advection velocity (this certainly is true for large scale motions on the Earth). So in general temporal correlations are connected to the spatial ones.

    It's completely non-obvious to me how a model that doesn't get the temporal characteristics correct is going to get the spatial ones right. One possible opportunity for this to happen is if the global average of temperature anomaly were to somehow exactly cancel out all of the spectral components that the model isn't correctly capturing.

    I know that this isn't happening (ENSO for example shows up in the global temperature fluctuation spectrum), so I don't hold out much hope the models are getting the spatial correlation right either. Still worth testing, but as I said, I personally wouldn't start with the assumption the models "get this right".
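
    As a toy R illustration of that Fourier-transform link (the Wiener-Khinchin relation), using a synthetic series - nothing to do with the station data itself:

        set.seed(1)
        x <- as.numeric(arima.sim(list(ar = 0.7), n = 1024))  # synthetic AR(1)
        x <- x - mean(x)

        pwr  <- Mod(fft(x))^2                               # raw periodogram
        acov <- Re(fft(pwr, inverse = TRUE)) / length(x)^2  # circular autocovariance

        direct <- drop(acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf)
        round(cbind(fft_based = acov[1:6], direct = direct), 4)  # near-identical at small lags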

  29. Carrick, you're asking for much more than you need, and for some reason, you don't seem to realise it. The model output need not be nearly as good as you think.

    What you need to know is: if New York is at +1, averaged over May, then how likely is it that Philadelphia and Baltimore are in the same ballpark? Does the model output roughly agree?

    If you care about time, then you ask does the answer to that question somehow change with ENSO, AO, NAO, etc? Yes, all those cycles change spatial patterns, but how much do they change spatial gradients? That's the question.

    Basically, to the extent that time matters at all here, it is in d(dT/dx)/dt.

  30. Here - these URLs aren't stable, I don't think - but looking at Feb 2010 with the longer smoothing, it looks like there are some pretty impressively steep spatial gradients around the North American Great Lakes and the Caucasus and southern Russia. The AO, perhaps combined with ENSO, gives us these patterns.

    For the question at hand here, you'd only underestimate the number of required stations if the model output never showed spatial gradients that steep. Though even these steep gradients came from a finite number of observations.

    http://data.giss.nasa.gov/cgi-bin/gistemp/do_nmap.py?year_last=2010&month_last=3&sat=4&sst=0&type=anoms&mean_gen=02&year1=2010&year2=2010&base1=1951&base2=1980&radius=1200&pol=reg

  31. RE Hansen and the 1200km.

    The fundamental problem I have with Hansen on this is that the selection of 1200 km is unmotivated by any substantial analysis. First, as I recall, the NH shows a correlation of 0.5 at 1200 km (with a spread of roughly 0.2 to 0.8 - recalling, please check) while the SH shows a correlation of 0.4 at this distance. So is 0.4 "good enough"? This gets to the "motivation" or justification for selecting a figure for the minimum correlation required.

    What one would like to see is the effect of different correlation selections - that is, the sensitivity of selecting 0.8 as opposed to 0.3: how does changing this figure change the final answer? What are we trying to optimize? Since the bias due to spatial sampling is a function of the inter-station correlation (see Brohan 06), what I would expect to see in the analysis is a sensitivity curve showing how the metrics change as a function of selecting this parameter.

    Now, of course, weighting by distance does attenuate the influence. Here it strikes me the same way that Hansen's UHI calculation works: throwing in 100 stations at great distance with little weight strikes me as either wrong or inconsequential bit twiddling. Just an impression.

    WRT the sensitivity: we have this

    http://clearclimatecode.org/trendy/#comment-429

    but that only compares 250 km to 1200.

    As I note Karl used a cutoff of 750km for the Tobs adjustment

    And this paper does a finer treatment of the problem than Hansen

    http://hadobs.metoffice.com/hadghcnd/HadGHCND_paper.pdf

  32. Note that at high and low latitudes the figure they use does approach Hansen's figure of 1200 km.

  33. Carrot Eater, if all one wanted to do was answer the question "if you had a correlation relationship, C(r) = exp(-rho0 |r|), what is the minimum spacing necessary to recapture all of the information of the field (in the Shannon information sense)", you don't even need a GCM, you can do this calculation analytically.
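
    As a toy numerical version of that setup - stations equally spaced on a line, with the exponential correlation above; the inputs fed in are arbitrary:

        # var(station average) / var(single station) under C(r) = exp(-r/L)
        avg_var_ratio <- function(n, s_km, L_km) {
          r <- abs(outer(seq_len(n), seq_len(n), "-")) * s_km  # pairwise distances
          mean(exp(-r / L_km))   # mean of the correlation matrix
        }
        avg_var_ratio(60, 500, 1200)  # 60 stations, 500 km apart, 1200 km scale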

    If you want to go beyond that, you need to start considering the actual properties of the real temperature field, but in turn how you analyze that depends on what you are trying to achieve. For example, you might need one spacing to get a global mean temperature that accurately reproduces the temporal properties down to e.g. a one-year period, a second (wider) spacing for 5-year reconstruction accuracy, and an even wider spacing if all you want is e.g. the 30-year trend.

    In general the correlation length depends on the time-scale you are interested in.

    Regarding your "d (dT/dx)/dt", remember that we are describing a system in which the fluid is in general in motion with respect to the frame of reference, so you need to use material derivatives. So even in this case, the mean flow of the medium shows up.

    Beyond that, the effect is pretty straightforward. If you have a nonzero straight-line (1-d) flow with velocity v0, the temporal correlation follows the spatial one: C(t) = C(r = v0 t). It's a bit more complicated if you have a nonzero |v^2| but v0 -> 0, but you still get a time-space relationship even there.

    The presence of steep temperature gradients is an interesting observation. I suspect they can be related to the same basic observation, though. If you have high-frequency spatial noise that you aren't resolving, it gets aliased back into your sampled data as lower-spatial-frequency "noise". But if the magnitude of that "noise" can be neglected relative to the other spatially varying components of the temperature field, it can still be neglected.

    Sorry that this is rushed--I have to run off to a meeting.

  34. Stephen, my big worry about H&L is that it appears they assume the correlation function is axially symmetric. In reality it should be written as C(x,y), where x is along the direction of the mean flow for a given measurement location and y is transverse to it.

  35. Carrick, as I conceptualize it, your correlation in space has of course a "temporal" dimension - I get the weather you had yesterday, to put it in words. Is that what you are angling at with the discussion of flow? Because if so, I've been scratching my head about that one as well.

    So to your point: along the direction of flow, of course, you have one correlation figure; orthogonal to the flow you have a different figure. Fluids.

    Have a look at the paper and their "angular" weighting.

  36. First off, can anybody recommend a freeware program that's easy to use, and digitises data from old plots in PDFs? I've heard of people using these, and it's high time I got on that wagon.

    Steven: Yes, around 0.5 at 1200 km in NH, with better performance at high latitudes. Yes, it's a little arbitrary, and if somebody were really motivated, you could take an area with very dense stations (the US), and do some tests, and come up with something more sophisticated. But we already know that it pretty much doesn't matter, so long as there is a decent station density. Hence the match between CRU and GISS, outside the Arctic. Remember, the 1200 km instances have such little weight, they are irrelevant, if there are nearer stations.

    It only really comes into play in the Arctic, and the lack of stations in the Arctic makes it hard to test the method in the Arctic.

  37. Good lord Carrick, I didn't mean for you to take the derivative literally.

    All I'm saying is that the temp anomaly contour lines in the model output have to be reasonably spaced, in order to use model output for this purpose. Yes, for an annual average, the spatial gradients will be less steep than for the monthly average.

    The point of using the model results is in filling in what you don't know, or at least, trying to. In a part of the world that's undersampled, it's hard to know what you could be missing, because you don't have the real data. You could of course learn some things in an oversampled area and hope the lessons apply elsewhere, or you use a model.

  38. Steven,
    "the selection of 1200km is unmotivated by any substantial analysis"
    There's a lot of geostats about this sort of thing. Kriging is the method to get the right scale. I'm hoping to write a post on this soon.

  39. Carrot Eater: Good lord Carrick, I didn't mean for you to take the derivative literally.

    That is possibly a first: the use of a derivative in an interpretive rather than literal sense! LOL.

    I got your meaning; I was just pointing out that even that innocuous expression you wrote down ended up with transport terms.

  40. Nick, looking forward to the kriging...

  41. But we already know that it pretty much doesn't matter, so long as there is a decent station density. Hence the match between CRU and GISS, outside the Arctic. Remember, the 1200 km instances have such little weight, they are irrelevant, if there are nearer stations.

    Yes, like I said..

    Either there are closer stations and the 1200 km is irrelevant... err, useless.

    OR...

    untestable? hmm seems too strong..

    Same sort of thing with the UHI adjustment..

  42. steven: "useless"

    If you're going to use a simple linearly decaying weighting factor, then there's always going to be some distance at which stations are mathematically included, but unimportant if there are nearer stations.
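
    A one-line R sketch of such a weighting, in the Hansen & Lebedeff style, where the weight falls linearly to zero at 1200 km:

        hl_weight <- function(d_km, cutoff = 1200) pmax(0, 1 - d_km / cutoff)
        hl_weight(c(0, 300, 1100, 1500))
        # 1.000 0.750 0.083 0.000 - near the cutoff the weight is negligible
        # whenever closer stations exist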

    As for testing the concept, you can do it to your heart's content in the US or Europe or other dense station areas.

  43. CE.

    I think you made the point: where you can test it (dense stations), it's not needed; where you need it, it's not tested.

    Something like that.

  44. Right. So there will be some continuing uncertainty over the Arctic, though if you feel you can use models to help you, then model reanalysis helps.

    I suppose satellites can be of some use. Meaning, if there were some hidden magical little regional patch that didn't correlate at all with the distant surrounding stations, maybe the satellites would clue you into that.

    If only there were a satellite measurement for altitudes closer to the ground.

  45. There is 10 years worth of MODIS/Terra land surface temperature data. It's got problems, but it might be sufficient for a correlation analysis.

    https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/land_surface_temperature_emissivity/monthly_l3_global_0_05deg_cmg/mod11c3

  46. CE.

    I think the biggest issue with the Arctic is the seasonal effect.

    I haven't spent much time looking into this, but it's on the list:

    http://wattsupwiththat.com/2010/01/29/diverging-views/

  47. CE..

    The issue being this: for the stations on the coast in the north, when the ice clears up near shore these stations see a jump in warming that is most likely highly localized. So a correlation that may hold true for a large part of the year probably degrades as the water opens up.

    If I had to guess I would say the truth lies somewhere between a CRU answer and a GISS answer.

  48. In case Zeke pops up: I've just finished coding up the extraction of the Antarctic metadata from the relevant web sites.

    Ugly. It's done in the style of GISS but using R. Found and reported my first R bug, which caused a couple days of grief.

    Then, when time permits, the scripts to download the data... and then reconciliation with GISS.

  49. Not sure if something Reber dreamed up in the last few months can be called the biggest issue yet.

    I noticed that when it came out. Couldn't quite work out what he was really getting at. Then forgot about it.

  50. OK, enough nonsense.

    To the relevant literature on this topic:

    Wang, Shen. "Estimation of Spatial Degrees of Freedom of a Climate Field" Journal of Climate 12:1280-1291 (1999)

    It's an AMS journal, so you can get it your own darn self.

    Anyway, they use both observational data and model output. Honestly, I didn't bother to follow all the math, but:

    For global annual mean, you want 112, model-based, or only 45, observation based.

    You need more for monthly means, and it varies with month, peaking in June (206, model; 95 observation).

    So where did Steig get 60?

    Maybe from this earlier work referenced in the introduction:

    "Shen et al. (1994) showed that the global average annual mean surface temperature can be accurately estimated by using around 60 stations, well distributed on the globe, with an optimal weight for each station."

  51. CE, we've discussed that paper way back when. Gavin referred us to it long ago.

  52. Problem is, NOAA disagrees for capturing the US trend. Go figure.

    http://surfacestations.googlecode.com/files/Stationdensity.pdf

  53. CE,

    I wouldn't take Tilo's work at face value. The issue is the difference between CRU and GISS. Jones didn't think much of GISS on this and other issues, so take it up with him. Hehe. But seriously, the way GISS handles the Arctic bears SOME looking at.

  54. Steven,
    The googlecode link didn't work for me, but I got to the paper from http://surfacestations.googlecode.com

    They are talking about resolving spatial distribution, which is a more sensitive requirement. In fact, it is highly relevant to my previous post, where I found the 122 stations in GHCN quite inadequate for resolving regional US temp in 2008.

  55. Yes Nick -

    Steven, that's addressing a more difficult problem.

    "“temperature change for any location within an area can be represented by a single station with an average mean absolute-error less than 0.1°C per decade.”" They also do error less than 0.05 C/decade, 0.075 and 0.125.

    Getting the trend at any one specific location right within some error is a much more difficult task than getting the spatially averaged global/regional trend right within some error.

    So if this paper is going to be the basis of Pielke/Watts reason to not do any macro-quantitative analysis (reading the tea leaves based on the URL), they might not get too far with that.

    In any case, if you'd already read the Wang/Shen paper, then that should take out some of the mystery of where numbers like 60 come from.

  56. CE: For global annual mean, you want 112, model-based, or only 45, observation based.


    You need a lot more than 45 stations to recover global mean temperature. Nick's results demonstrate that (and Shen does not provide a counterexample of this).

    Nick raises the fair question of how many stations you need for the long-period global temperature trend.

    I could believe 45 is possible for that, or even less if you make some assumptions about the temperature-trend teleconnection function (or use models to provide those). But there's not a chance in the world you can recover global mean temperature from just 45 stations.

  57. CE.

    That paper doesn't have anything to do with Watts and Pielke AFAIK. I had one 2-minute conversation with Anthony about that paper. I told him I thought the effect was going to be small, confined to Tmin, and probably won't show up in an area average unless the N is large. I don't expect he will do an area average, because he would have to pick some area-averaging code. What would he use? GISS? Anyway, I simply refuse to look at any paper before it's published unless that paper is open to everyone to see before it's published. What comments I would make would be public comments. No holds barred.


    Personally, since day 1 I have argued that the BEST approach for folks to take is to select the best stations and remove the suspect ones. Why? Because you CAN get away with fewer stations. There was no need to keep the CRN45 in any analysis, or to try to adjust urban with a flaky meatgrinder algorithm.

    Pick the best data. It's a pretty consistent approach. I take the same view of Yamal, bristlecones and Tiljander. Remove the suspect data, compute your result, live with the uncertainty. If you also want to present the case with the 'suspect' data, that's fine too. In journal papers this is hard, but we have the web; no word restrictions there.

  58. "I dont expect he will do an area average "

    That's what I'm saying. He should do an area average, because the claim is that the area average is affected by whatever effects he's highlighting. And yet nobody thinks that Watts will. Pielke will need an excuse for not doing so. Excuse #1 could be that there aren't enough CRN12 stations to do a spatial average, and they'll need to back that up somehow. Excuse #2 would be to further reduce the number of CRN12s because something about airports. Excuse #3, if it gets to it, would be the silly cop-out of saying that spatial averages are meaningless, because the global average said February was warm, yet it was snowy in my hometown this winter.

    "because he would have to pick some area averaging code. what would he use?"

    Watts was never going to do the coding, but for pete's sake, that is what Pielke's grad students are for. Surely one of them can do math. It took Zeke what, all of a day and a half to write a gridding routine, once he put his mind to it? But there's a history here; there was a similar exchange between Pielke and NOAA where Pielke came with some qualitative stuff, and NOAA came back quantitative.

    "Personally, since day 1 I have argued that the BEST approach for folks to take is to select the best stations, remove the suspect ones. why? because you CAN get away with fewer stations. there was no need to keep the CRN45 in any analysis, or try to adjust urban with a flaky meatgringer algorithm."

    Meet the US CRN. As it gets older, I think you'll see it used more and more. As for screening USHCN for quality stations: the paranoids will always complain about removing anything, no matter what. But you can do multiple products, if you want. Also keep in mind that the CRN45s weren't always 45; if we go with the idea that CRN45s tend to be MMTS, then presumably they were located a bit differently before.

    As it is, not every last thing is typically used; you can also draw on all the coop stations. Though I think Menne does use those for homogenisation now.

    I thought Mann did present results with and without Tiljander in the first place. And others have done bristlecone-less - Moberg was essentially tree-less, and I think Mann has done one as well. There's value to having people present the reconstructions in different ways; you see how much different things matter.

  59. CE: He should do an area average, because the claim is that the area average is affected by whatever effects he's highlighting. And yet nobody thinks that Watts will. Pielke will need an excuse for not doing so

    I don't think this assumption of poor faith and bad motives is very helpful.

  60. Carrick:

    I don't think this is very helpful:

    http://wattsupwiththat.files.wordpress.com/2009/05/surfacestationsreport_spring09.pdf

    Note the conclusion: "The conclusion is inescapable: The U.S. temperature record is unreliable. And since the U.S. record is thought to be “the best in the world,” it follows that the global database is likely similarly compromised and unreliable."

    Where is the analysis? Yet to appear.

    I don't think this is very helpful, as discussed at length elsewhere:
    http://wattsupwiththat.com/2010/01/26/new-paper-on-surface-temperature-records/

    Where is the analysis? Yet to appear.

    See a pattern?

    You want to talk about helpful? I wish I could.

    As it is, Watts pretty much said they weren't going to do this sort of analysis. Let's see if the reviewers make them do it.

  61. CE, if Watts is behaving in poor faith or even just doing science very poorly, he will be hoisted by his own petard. In the meantime there is no benefit incurred in terms of moving the arguments forward by this level of personal animus.

    I agree about the abysmal quality of Watts' surface papers compendium. (I notice that this was one thread where dissenting views appear to have been scrubbed, which is itself shameful behavior.)

    But you mentioned Pielke. Perhaps you can link where Pielke advocates against using area weighting? That's the part that I thought was over the top... Watts deserves a bit of a thumping for that dreck he published, IMO.

  62. "he will be hoisted by his own petard"

    This has been an ongoing event.

    "In the meantime there is no benefit incurred in terms of moving the arguments forward by this level of personal animus."

    Unless forced to, I highly doubt Watts et al were going to do the spatial average that is sorely needed here. Do you disagree? He's basically said as much - he mentioned that his analysis was going to be done in a different way.

    "Perhaps you can link where Pielke advocates against using area weighting? "

    I'll see if I can find it. It's mostly Goddard who says this, but I'm pretty sure Pielke did at one point as well.

  63. Speaking of Goddard and WUWT, that discussion on Venus is just painful. Though to their credit, a lot of the readers are being critical.

  64. Apologies for not being around much of late. Between finishing revisions and reviewer responses to a paper I'm publishing on energy modeling, traveling all around the country for business meetings (4 meetings in 4 cities this week... ugh), and a new girlfriend, I've been a tad swamped. Hopefully will be able to spend a bit more time blogging next week.

    Carrot,

    The Venus thing was indeed rather painful to read. I tried to direct Goddard to some of the early (1960s-1970s) literature on modeling Venus' atmosphere, but apparently "read the literature" isn't a good argument even if it would really help Goddard actually understand the atmospheric dynamics and forcings involved...

    The whole affair did produce this gem when Richard Steckis tried to parrot Goddard's argument over at RC:

    Richard Steckis says:
    7 May 2010 at 12:19 PM

    The essential argument is that the heating of the Venusian atmosphere occurs through adiabatic processes and not through absorbance of IR by GHGs.

    [Response: Since 'adiabatic' means without input of energy it seems a little unlikely that it is a source of Venusian heating. - gavin]

    Mosh,

    Shoot me an email with the Antarctic metadata when you have a chance, or just add it to your DB. I'm still trying to decide what project to undertake next, though Nick already seems to be doing most of the interesting stuff :-p

    I am in a conversation with Matt Menne about potentially partnering up on a UHI paper, and I'd be happy to tap the expertise of various folks here if that ends up going forward (as well as the great work done by Ron Broberg, of course).

  65. Thanks for posting all that lovely data!

    Your (MS Excel 97) correlation matrix is:

    Y = Year
    L = LongRur
    La = LandGISS
    C = Crutem3
    A = AllGISS
    H = Hadcrut3

          Y      L      La     C      A      H
    Y     1
    L     0.707  1
    La    0.812  0.881  1
    C     0.830  0.881  0.984  1
    A     0.871  0.866  0.976  0.979  1
    H     0.873  0.834  0.951  0.967  0.987  1

    So much for needing thousands of stations.
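
    (For reference, the same matrix drops out of R in a couple of lines, assuming the posted Run61StationsCompare.txt has columns named as in the key above - the names are a guess:)

        d <- read.table("Run61StationsCompare.txt", header = TRUE)
        vars <- c("Year", "LongRur", "LandGISS", "Crutem3", "AllGISS", "Hadcrut3")
        round(cor(d[, vars], use = "pairwise.complete.obs"), 3)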

  66. You are right, BPL. Out of several thousand noisy stations there must be at least one that is very, very close to the global average. Pick that one. No need for 60. So much for needing all these measurements when the answer is already abundantly clear - more CO2 creates more backradiation, and we all must be warming up rapidly under the extra 3.7 W/m2.

  67. Carrick,

    You need a lot more than 45 stations to recover global mean temperature. Nick's results demonstrate that (and Shen does not provide a counterexample of this).

    Wang & Shen (1999) estimated 45 degrees of freedom for global annual average temperature, so...

    Assuming that station locations are randomly distributed and that record length is not an issue, is it possible to estimate the minimum number of stations selected at random that can reliably construct a global average (presumably that would be more than 45)?

  68. ... presumably Wang & Shen imply 45 statistically independent records, whereas the ones selected may not be? (Just askin')

  69. ... so 45 is a lower bound?

    ... any general thoughts on their method?

    Regarding the higher degrees of freedom they found in the GCM (ECHAM)... what does this imply wrt spatial variability and correlation? ECHAM has greater power than observations at the annual frequency for global average temperatures. OTOH GISS ModelE has lower. I could easily imagine similar happening with spatial variance, the observed power spectrum sitting somewhere in the model range...

  70. 45? Are you kidding?

    You are trying to measure/estimate the behavior of a spatio-temporal field of atmospheric temperature at 2 m above ground. This is an object that varies on a typical spatial scale of, say, 50 km. Yes or no? Therefore, in accord with scientifically established and mathematically proven methods (the Shannon-Nyquist-Kotelnikov sampling theorem), the spatial sampling MUST have at least two samples per characteristic scale, or one every 25 km. Now, the Earth's surface has an area of about 5*10^8 km2, which means that covering the whole surface would need about 800,000 surface stations. No fewer than that, and likely more.
    The other "challenge" to determine how frequently one has to take measurements is left for readers :-)

  71. Thanks, Ron.
    They seem, at least in the second link, to be focussing more on temporal rather than spatial inhomogeneity. I guess they need to, since radiosonde measurements are much less frequent.

  72. When you say 90 years, do you mean 90 years that have _any_ data in that year, or 90 years all of which are complete years?

  73. drj
    The former - years in which at least one month was reported.
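
    As a sketch (assuming a years-by-months matrix of monthly means, with NA for missing):

        # count years with at least one reported month
        years_reporting <- function(tmon) sum(rowSums(!is.na(tmon)) > 0)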

  74. I can't see the map of station locations very well, but it seems most if not all are near the coast.

    Frank Lansner has done some work also, and found that the coastal areas follow sea surface temperature while inland locations do not.

    We already know that oceans have temperature oscillations, and that is a lot of what you are picking up.

    Frank's site: http://hidethedecline.eu/pages/posts/ruti-global-land-temperatures-1880-2010-part-1-244.php

    A more readable version: http://joannenova.com.au/2011/10/messages-from-the-global-raw-rural-data-warnings-gotchas-and-tree-ring-divergence-explained/#comment-625436

  75. Anon,
    There's an update post here. I don't think it's entirely true that in this earlier version the stations were overly near the sea - there are clusters in Russia and Central Europe, for example. The second post used a scheme to try to get more representative coverage of the global area - that strongly selected for seaside.

    I've always thought land temp did not make as much physical sense as global. Temperature is really a proxy for total heat, which has fluctuating spatial variation but tends to be conserved globally.
