Sunday, December 4, 2016

Where do GHCN monthly numbers come from? A demo.

I've been arguing at WUWT, e.g. here. More and more I find people saying that all surface measures are totally corrupt. Of course, they give no evidence or rational argument. And when sceptics do mount an effort to actually investigate, e.g. here, it falls in a heap. BEST was actually one such effort that was followed through, but it ended up confirming the main indices. So of course that is corrupt too.

As linked, I do sometimes point out that I have been tracking for six years with an index, TempLS, which uses unadjusted GHCN and gets very similar results to GISS and others. I have posted the code, which is only about 200 lines, and I have posted monthly predictions (ahead of GISS and all) for about six years. But no, they say, GHCN unadjusted is corrupted too. All rigged by Hansen or someone before you see it.

The proper way to deal with this is for some such sceptic to actually follow through the quite transparent recording process, and try to find some error. But I see no inclination there to do that. Just shout louder.

So here I'll track through the process whereby readings in my country, from BoM, go through the WMO collection in CLIMAT forms, and so into the GHCN repository. That's partly to show how it can be done, if a sceptic ever was inclined to stop ranting and start investigating.

I'll illustrate from my home town, Melbourne. For my state, Victoria (and likewise the others), BoM posts half-hourly AWS readings, within a few minutes of measurement. The statewide site is here. Here is the line (as of now) which shows current temps, and the ringed min/max.



Actually, this isn't quite so useful for my demo, because while the max is OK for today, the min shown is just the minimum so far; the proper overnight min will appear in the morning. But in a day or so, you'll be able to check. If you drill down to the Melbourne page, you'll see the last few days of half-hour readings (and also daily max/min). These are the numbers that are quoted in news reports, etc. If someone says it was 35° yesterday, that is what they are quoting. So, on the "corrupted" issue, that doesn't work here. First, it would go against other people's experience and measurements if it were fiddled. And second, there is just no way that the process could have human intervention. There are thousands of figures posted every half hour.

Those numbers are entered into the current month file. Here is a brief extract:



Today's numbers aren't yet entered, but will appear in a few hours. When the month is done, a page for that month is posted (the last 13 months are available). Here is the page for October for Melbourne Airport. The summary numbers are here; I have red-ringed the relevant numbers:



Now those are the numbers that are sent off on the CLIMAT form to WMO. If you follow through there, you'll see 100 entries for Australia. Here is an extract for Melbourne Airport for October 2016. I have red-ringed the min/max, which you can see corresponds to the BoM posted file. It adds a calculation of the average (13.2), which I have brown-ringed.
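
To make the arithmetic explicit, here is a minimal R sketch of how such a monthly average is conventionally formed from the daily figures, as (mean max + mean min)/2. The file and column names are placeholders, not BoM's.

```r
# Minimal sketch (not BoM's code): the conventional monthly mean from daily max/min.
# 'daily' is assumed to be a data frame with columns tmax and tmin in deg C,
# e.g. saved by hand from the BoM monthly page; the file name is hypothetical.
daily <- read.csv("melb_airport_oct2016.csv")
tavg  <- (mean(daily$tmax, na.rm = TRUE) + mean(daily$tmin, na.rm = TRUE)) / 2
round(tavg, 1)   # with the October 2016 figures this should reproduce the ringed 13.2
```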



If you are really hankering for authenticity, you can scroll down to see the actual code they send. And finally, you can find the unadjusted GHCN data here. It's a big file, and you have to gunzip and untar it. You can also get files for max and min. Then you have a text file; if you search for 501948660002016TAVG, you see this line:



It has the monthly numbers for 2016 (as integers, in hundredths of a degree, i.e. °C × 100) for Melbourne Airport, with some letter flags. I have ringed the corresponding number, 1320. These are the data I use in TempLS unadjusted.
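
If you want to decode such a line yourself, here is a short R sketch following the fixed-width layout described in the GHCN v3 README (ID, year, element, then twelve value/flag groups). The data file name carries a date stamp, so treat the one below as a placeholder.

```r
# Decode one GHCN v3 monthly line: values are integers in hundredths of a degree C,
# -9999 means missing. Column positions follow the v3 README.
parse_ghcn_line <- function(line) {
  id      <- substr(line, 1, 11)
  year    <- as.integer(substr(line, 12, 15))
  element <- substr(line, 16, 19)
  vals <- sapply(0:11, function(m) as.integer(substr(line, 20 + 8*m, 24 + 8*m)))
  vals[vals == -9999] <- NA
  list(id = id, year = year, element = element, temps = vals / 100)
}

# Usage: find the Melbourne Airport 2016 record in the unzipped unadjusted file
# (the exact file name carries a date stamp; this one is a placeholder).
qcu  <- readLines("ghcnm.tavg.v3.qcu.dat")
melb <- grep("^501948660002016TAVG", qcu, value = TRUE)
parse_ghcn_line(melb[1])$temps    # the 10th (October) value should be 13.2, i.e. 1320/100
```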

So you can see that, for Australia at least, the numbers can be followed through from the reports posted every half hour to the GHCN unadjusted monthly file. Of course, it is just one month. But the pattern is there, and anyone who wants to say the process is corrupted should follow through other months to see if they can find a discrepancy. I bet they won't.







25 comments:

  1. How public is the GTS data? BAS gets it (via the UKMO I think) and the web interface I wrote years ago still seems to be going (https://legacy.bas.ac.uk/met/metlog/).

    ReplyDelete
    Replies
    1. I haven't come across any way of accessing it. It looks as if it would take a lot of work to make it easy for home use.

      Delete
  2. Small caveat, there can be a difference between the monthly average you could compute from daily data from the near real-time GTS and monthly CLIMAT values. The meteorological GTS data has to go fast to be used for weather prediction. The CLIMAT messages for climatology undergo a better quality control, also often manual QC, which may flag values as faulty that could not be detected as such in the near real-time GTS data.

    ReplyDelete
  3. Nice work Nick. Very interesting. But not all countries are so well automated. I suspect quite a bit of data are still hand-coded in many areas. And if you want to check old data, you would have to look for paper copies or scans of paper copies to check the data. VERY TEDIOUS indeed.

    It's my understanding that only monthly averages are subjected to homogenization. I seriously doubt that the raw monthly data you use for TempLS is likely to have been tampered with, although I'm sure there is bound to be a very small percentage of coding errors in the older data (but that's a different issue).

    It was interesting to see the BoM data reports. It appears that the min/max temperatures are not temporally resolved at more than half hour intervals, which is probably not ideal, but should not likely lead to any substantial bias over longer averaging times. Similarly, with automated data systems, we can now easily calculate monthly averages directly from all of the raw 1-minute, 5-minute, half hour, and/or hourly data instead of computing the mean of the minimum and maximum each day and using that to calculate the monthly average. However, my guess is that the differences are small and probably random, so that little or no substantial bias is introduced by doing it the old fashioned way for monthly and annual averages.

    In my mind, the factors that contribute most to uncertainty in trying to estimate a global average surface temperature are consistency and representativeness of the constituent measurements over time as well as lack of spatial coverage over large areas. My guess is that how these issues are handled may have the greatest effect on the final estimates and derived trends. Oceans, deserts, remote mountains, and polar regions are probably the most problematic areas in this regard.

    Conceptually, in the future I suspect that using the initializations for global weather models will be the best way to estimate and track global and large regional surface temperatures and temperature anomalies, as well as other modeled met parameters. In practice, there is probably still plenty of room for improvement, but what we have now may be an adequate start with the CFSV2 and ERAI.

    Ideally, maybe one of these days there will be global weather models that work well enough to be used to forecast out to months and years, and maybe even decades with reasonable accuracy. It would also be interesting to look backward in time to peak glacial periods to use the same weather forecast models to look at hypothetical daily weather patterns as might have existed say 20,000 years ago or to model very warm periods like 50 million years ago. I'd really like to see a climate model that can predict the next glacial period. If humanity survives for a few more hundreds of thousands of years, maybe this will come to pass.

    ReplyDelete
  4. Nick wrote: "Of course, they give no evidence or rational argument. And when sceptics do mount an effort to actually investigate, eg here, it falls in a heap. BEST was actually one such effort that was followed through, but ended up confirming the main indices. So of course that is corrupt too."

    I'll agree with you that most of the complaints about temperature data are poorly informed. I'll suggest some that may have some validity.

    You've demonstrated that Australia sends raw data to the unadjusted GHCN database. That doesn't mean that other countries do the same. Victor notes that even Australia may do some QC, which is good.

    No one ever sees raw GHCN data. They see processed output without transparency or explanation. Complete transparency would show the global average for all stations and say that this number is worthless for monitoring long-term climate change, because temperature also changes with the season, seasonal changes are much bigger than year-to-year changes, and there are far more thermometers in the NH than the SH. Then we could see the average temperature for November for Station X and the anomaly for that month. And the average global temperature anomaly for that month - with a clear warning that this number is also not useful for monitoring climate change because stations are not spread equally around the globe and the number and location of stations is constantly changing. Then you could show the average temperature in each grid cell (or whatever method one uses to deal with station location inhomogeneity) and the global average grid cell anomaly.

    Then we get to the most difficult problem, homogenization of apparent breakpoints - a process I think is scientifically dubious. Without metadata, one doesn't know if a breakpoint has been caused by a sudden change to new measuring conditions (TOB, equipment change or station move) OR by a gradual deterioration in measuring conditions that is corrected by maintenance (screen albedo, ventilation, changes near the station). If you have metadata - especially for a change in TOB (which can be corrected by a validated method in the US) - correction is necessary. However, even a documented station move - say from a gradually urbanizing site to a large nearby park - may restore conditions similar to earlier measurements. Anytime you homogenize a breakpoint caused by correcting gradually deteriorating conditions (maintenance), you introduce bias into the record. Since we have no idea why there are so many apparent undocumented breakpoints in our records, I suspect the best answer is to show both the unhomogenized and homogenized results; the truth probably lies between these two values. That won't change the conclusion that the planet is warming.

    Because BEST splits records at breakpoints, they keep any bias in the trend that may arise from deteriorating conditions and discard the correction. The net result is probably the same as homogenization. I'd prefer to see them report a record with and without splitting records at undocumented breakpoints. By creating two records from one by splitting, they are discarding useful information.

    Transparency would show the following for a select group of stations, grid cells, countries, and the world:

    Daily temps
    Raw monthly averages
    Monthly anomalies
    Grid cell anomalies and temperatures before homogenization
    Grid cell anomalies and temperatures after homogenization

    Is this transparency worth the effort? I don't know. Given all of the revelations about fake news during the last election and how it was spread among Trump supporters via social media, I'm skeptical.

    Frank

    ReplyDelete
    Replies
    1. Frank,
      "No one ever sees raw GHCN data."
      Well, they could. It's there. But of course they want to see a calculated average. It's true that the published ones are homogenised. That's why I do one unhomogenised. Anyone else could do that too. I do it just for diversity - I think homogenisation is right in principle, but in this case makes little difference in practice.

      "Then we could see average temperature for November for Station X and the anomaly for that month."
      Well, you can. I have a lookup map facility here. NOAA doesn't give easily accessible numbers, but they do give a graph page for each station, with and without adjustment, and the difference (example here). You need to know the station number to access it directly, but I have given a names portal here. Just click the center button ("GHCN Stations").

      "I suspect the best answer is to show both the unhomogenized and homogenized results"
      GHCN does. It's true that the indices are all homogenised, but again, I do both. The thing about homogenising - yes, you may make adjustments when there was no problem. But you can test whether that is introducing a bias, with synthetic data. In effect you replace some data which may have been biased with bias-removed data plus noise from spurious adjustments. But many readings go into the average. Unbiased noise is greatly attenuated there - bias is not.
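
      Here is a toy R illustration of that point (made-up numbers, nothing to do with the actual GHCN algorithm): zero-mean adjustment noise washes out in an average over many stations, while a one-signed error does not.

      ```r
      set.seed(1)
      n_st   <- 1000                         # stations contributing to the average
      signal <- 0.5                          # the "true" regional anomaly, deg C
      noise  <- rnorm(n_st, 0, 0.5)          # ordinary station-level scatter

      # spurious but zero-mean adjustments hitting 20% of stations at random
      hit      <- runif(n_st) < 0.2
      spurious <- ifelse(hit, rnorm(n_st, 0, 0.3), 0)
      mean(signal + noise + spurious)        # stays close to 0.5: the noise attenuates

      # a one-signed error of the same size in the same 20% of stations
      biased <- ifelse(hit, 0.3, 0)
      mean(signal + noise + biased)          # shifted by about 0.2*0.3 = 0.06: bias survives
      ```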

      "By creating two records from one by splitting, that are discarding useful information. "

      I agree with that.

      I should also mention that NOAA lets you select stations on a form and see all sorts of data. I have to say that the NOAA site is clunky, and I currently don't have a link. But clunkiness does not mean lacking transparency. You just have to find where to look.

      Delete
    2. Anonymous alias Frank on December 8, 2016 at 3:42 PM

      Nick has already formulated a very good answer to Frank's comment. But there are some remarks I nevertheless miss in this answer, which is so typically smooth for him.

      I'll start at the same place as Nick did:

      1. No one ever sees raw GHCN data.

      When I look at such sentences I really ask myself: how is it possible for somebody in 2016 to make such a claim, instead of simply searching for what (s)he thinks is not visible, let alone available - even to homo sapiens illiteratus.

      Googling for "raw GHCN data" immediately gives you the most important link, to sources with information at any depth required:
      - GHCN - National Climatic Data Center:
      https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-ghcn

      So here you are at the heart of the matter. And even if you have no knowledge about what you see, you intuitively understand that these are the corners best suited to learn from.

      It doesn't take you very much time to land here, by clicking on no more than three links:
      ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/

      You don't need to be a specialist in anything to understand the GHCNM-v3.2.0-FAQ.pdf and the README file, and to have a more and more specific look at the metadata and the data files it accurately describes, once you have managed to unzip the data:
      - ghcnm.tavg.latest.qcu.tar.gz (unadjusted)
      - ghcnm.tavg.latest.qca.tar.gz (adjusted)

      And now you can see the differences existing between unadjusted and adjusted GHCN data, those differences about which most WUWT and other sites' commenters produce nonsense inversely proportional to their real knowledge.

      2. Complete transparency would show the global average for all stations...

      This is exactly what you can see, ad nauseam, by searching for country identifiers or station identifiers, or by isolating data of interest according to various criteria, e.g. latitude, longitude, name, environmental characteristics, etc., or any mix of them.

      It is evident that having some accurate tools to search for information is important. Best is of course the UNIX/Linux Swiss-knife toolkit, helping you for example to isolate in a few simple steps all stations above 80° N (there are three), or all "very rural" US stations within the CONUS landscape.

      Exactly such a toolkit helps you in obtaining, for any subset of the GHCN dataset, the differences between unadjusted and adjusted data.
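
      For readers without the UNIX toolkit, the same kind of subsetting can be sketched in a few lines of R. The column positions follow the v3 README's description of the .inv metadata file (check them against your copy), and the file name is a placeholder.

      ```r
      # Read the GHCN v3 station metadata (.inv) file and pick out stations above 80N.
      # Column positions (ID 1-11, latitude 13-20, longitude 22-30, name 39-68) are
      # taken from the v3 README; verify them against your copy before relying on this.
      inv  <- readLines("ghcnm.tavg.v3.qcu.inv")     # placeholder file name
      id   <- substr(inv, 1, 11)
      lat  <- as.numeric(substr(inv, 13, 20))
      name <- trimws(substr(inv, 39, 68))

      subset(data.frame(id, name, lat), lat > 80)    # should list the three stations above 80N
      ```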

      But sometimes you first discover what "unadjusted" really means by processing the data you selected, using e.g. Excel or a similar tool. You suddenly see, here and there, tremendous anomalies resulting from outliers you otherwise certainly wouldn't have managed to discover. (Such tools moreover allow a perfect graphical display of the extracted data, and even help you in computing trend estimates of any kind.)

      And BTW you learn to trust a system which does not correct invalid readings by overwriting the values, but solely marks them as invalid.

      Unfortunately, the "adjusted" data also contains corrections due to bias not originating from reading errors. But it is not homogenized in the sense of complex outlier reductions, like adapting readings to the altitude or to the environmental characteristics of the stations' neighborhood.

      The difference between "adjusted" and "homogenized" is best seen by comparing the linear trend estimates for the two time series between 1880 and today:
      - GHCN V3 adjusted: 2.29 °C / century (unadjusted: 2.14)
      - GISS land only: 0.71 °C / century

      And this is what the GISTEMP result looks like:
      http://fs5.directupload.net/images/161210/a2fz4aw4.jpg

      3. I conclude with the hope that Frank will soon be heavily busy requesting from e.g. UAH the same transparency he expects from surface measurements :-)


      Delete
    3. Thanks for taking the time to reply, Nick.

      You wrote: "I think homogenisation is right in principle, but in this case makes little difference in practice."

      This is exactly why scientists should publish both: If there is little difference, show both answers. If there is a modest difference (an increase of 0.2 K in 20th-century warming is my understanding), show that answer too. It is still warming. Recent disclosures about social media suggest reliable information is being overwhelmed by trash, but relying on experts saying "trust me" hasn't been working for a while. Transparency couldn't hurt.

      Which of your records is homogenized and which isn't?

      Homogenizing data is never appropriate when you don't know the cause of the inhomogeneity. You are making a hypothesis about the cause of the inhomogeneity - a sudden shift in measurement conditions at a station - and ignoring an equally likely hypothesis - that a gradual bias crept into observing conditions and was CORRECTED by maintenance. Without evidence, one never modifies data*. We have evidence about TOB, but most breakpoints aren't caused by TOB nor corrected by the method that is validated for US TOB. (TOB is a big deal in the US.)

      * Suppose I were running a clinical trial for a new blood pressure lowering drug, and on day 10 all of the readings at one trial site averaged 2 psi higher than on day 9, while the other sites showed no major change between day 9 and day 10. Someone obviously must have started using a new instrument to measure blood pressure at that site on day 10, and all the readings after day 9 at that site should be lowered by 2 psi. Right? Try submitting that corrected data to statisticians at the FDA! At best, they might let you show the data analyzed with and without correction. However, if your conclusion about efficacy depends on a correction that you can't PROVE is justified, your drug probably won't get approved. If you want to publish, the results abstract needs to include both possible analyses.

      Frank

      Delete
    4. "Homogenizing data is never appropriate when you don't know the cause of the inhomogeneity. "
      I think GHCN do confuse the issue by releasing a file showing altered records by station. It's a convenient way of recording the changes. But in fact homogenisation is a step on the way to compiling an index, which is a spatial integral. In that average, a station is used as a representative data point for a region. You assign its value to the region, multiply by area and add.

      In homogenising, you say that the station value has behaviour that you think makes it not a good representative of the region. Basically, the arithmetic says that for the duration of that period, you'll prefer data from other nearby stations to estimate that region. People think that NOAA is asserting that the value at that location was really something else. But really, it is about the region. That is why blood-pressure analogies, for example, aren't right.
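
      To make that arithmetic concrete, here is a minimal sketch of such a spatial average (not TempLS itself, which weights more carefully): each 5°x5° cell gets the mean anomaly of its stations, and cells are combined with weights proportional to cos(latitude). The station values are placeholders so the sketch runs on its own.

      ```r
      # Toy spatial integral: cell means, then an area-weighted average of the cells.
      stations <- data.frame(                  # placeholder station anomalies, deg C
        lat  = c(-37.7, -37.8, 48.2, 51.5),
        lon  = c(144.8, 145.0, 16.4, -0.1),
        anom = c(0.6, 0.7, 1.1, 0.9)
      )
      cell_lat <- (floor(stations$lat / 5) + 0.5) * 5       # cell centre latitude
      cell_id  <- paste(floor(stations$lat / 5), floor(stations$lon / 5))
      cells <- aggregate(data.frame(anom = stations$anom, clat = cell_lat),
                         by = list(cell = cell_id), FUN = mean)
      w <- cos(cells$clat * pi / 180)                       # area weight per cell
      sum(w * cells$anom) / sum(w)                          # area-weighted mean anomaly
      ```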

      It's true that there is a possibility that the process will have a bias in recognising sudden drops while missing a gradual rise. I think that is what happened in the ballyhooed Reykjavik case. They can do some tests for that, provided they have a systematic algorithm, as they do. The effect can go both ways.

      In this context, the obligation on scientists is to provide their best estimate. Legalistic rules like those you propound conflict with that. I do sometimes wonder why they don't provide both adjusted and unadjusted indices, as I do. But I think they are right. The unadjusted wouldn't be their best estimate.

      Delete
    5. Homogenizing data is never appropriate when you don't know the cause of the inhomogeneity.

      Frank, you should let the astronomy and the electron microscopy and X-ray crystallography communities know about this. We've clearly been doing it wrong. In the case of crystallography some Nobel prizes will need to be returned, including the recent one for the structure of the Ribosome.

      Delete
    6. Kevin: I found what you wrote here:

      http://www-users.york.ac.uk/~kdc3/papers/homogenization2015/review1.pdf

      I do know a little bit about X-ray crystallography and worked with antibiotic binding to the ribosome. The final product from an X-ray crystallography study is a structure - a set of coordinates for atoms in a molecule. That structure can be used to predict the diffraction pattern that should have been observed. In other words, the guesses that were made during homogenization can be validated - unlike temperature homogenization. As I understand it, the process of solving a protein crystal structure is partly a matter of trial and error in building a structure that fills the electron density map extracted from the diffraction data. The agreement between the predicted and observed diffraction patterns confirms that the guesses that were made along the way were correct. (And X-ray crystal structures of proteins occasionally do contain some mistakes, because it is impossible to test all alternative bonding arrangements to see which one fits the data best.)

      Therefore I'm not sure there is a good analogy between the use of data homogenization in protein structure determination and climate science.

      Frank

      Delete
      Kevin and Nick: Let's take a set of stations whose data has been homogenized so that they contain typical noise but no breakpoints, and then detrend each station so it has no long-term trend. Let's call this "the true trend". Now let's assume that the albedo of all stations gradually drops with time due to the accumulation of dirt. This adds a gradual upward bias to all station readings of 1-2 K/century (normally distributed with an average of 1.5 K/century). Washing the exterior of the station (maintenance) removes the bias and restores "normal" measurement conditions (the truth). About half of the stations are randomly washed an average of once every 5 years, and the other half are randomly washed once every 3 decades. Now what happens if you homogenize the data again to remove the breakpoints caused by maintenance? Is the trend still zero?

      I've read that algorithms are finding as many as one breakpoint per decade in many station records. This suggests to me that station moves and equipment changes may not be the cause of many breakpoints. Perhaps maintenance can be. I can imagine a variety of maintenance tasks that might restore original measuring conditions and cause breakpoints that shouldn't be corrected: deteriorating screen albedo; deteriorating screen ventilation caused by accumulating debris; encroaching shadows from growing trees; encroaching urbanization followed by a move to a site very similar to the original site. Unfortunately, it is probably difficult, if not impossible, to detect the difference between a breakpoint caused by a sudden and permanent shift and a breakpoint caused by a gradual bias followed by correction of that bias.

      Frank

      Delete
    8. Just to jump in here, I'm one of the pioneers of inferring nanostructure from stochastic and/or deterministic patterns in diffraction data, e.g. my algorithms are being used at the Argonne labs x-ray server http://x-server.gmca.aps.anl.gov/TRDS_sl.html

      Since I have been doing this kind of forensic work my entire research career, I started working on picking out the patterns behind QBO and ENSO and presented those results yesterday at the AGU meeting.

      http://contextearth.com/2016/11/21/presentation-at-agu-2016-on-december-12/

      Went pretty well and it's really only a matter of time before these ideas get picked up by the larger community.

      All this discussion of homogenizing and infilling data seems pretty milquetoast compared to the real progress in climate science that you guys can be making. And that goes to Frank especially -- it's really pathetic that he thinks his punching-down criticisms matter at all.

      IMO, the key is to compensate for the fluctuations in the temperature data due to ENSO etc, and then go from there. I just don't understand why you don't agree that this is the path to follow.

      Yet I also agree with Kevin that these crystallographers are worthy. The reconstruction work done by the Cornell group is amazing http://uuuuuu.lassp.cornell.edu/

      Crystallographers don't think like other people. They can reason in reciprocal space which is a huge advantage in studying periodic and quasi-periodic data.

      Delete
    9. Web - congratulations on your presentation.

      Delete
    10. Yes, congratulations from me too. I'm reading the doc you linked to. Is there a poster?

      Delete
    11. One question - Is it possible that ENSO is impacting the earth's wobble through ocean and atmosphere motions and not vice-versa? Googling I see an old NASA pub suggesting this.

      Chubbs

      Delete
    12. "One question - Is it possible that ENSO is impacting the earth's wobble through ocean and atmosphere motions and not vice-versa? Googling I see an old NASA pub suggesting this."

      Chubbs, You are absolutely right. Richard Gross at NASA JPL suggested this several years ago.
      http://www.jpl.nasa.gov/releases/2000/chandlerwobble.html

      The thing we have to remember is that Newton's First Law is ultimately at work here. The wobble may be a cooperative phenomenon, in that the (mainly) solid earth responds in one way while the fluidic oceans can compensate in another way. It doesn't have to be entirely (1) ocean sloshing causing the wobble or (2) the wobble causing the sloshing, but rather some balanced mixture. The sole requirement is that the overall angular momentum of the system has to be conserved.

      Now, what hasn't been acknowledged in the literature too much is the fact that the Chandler wobble frequency happens to be very close to a seasonally aliased frequency of the draconic (nodal) lunar tide. If that is truly the case, the argument is much simpler because the moon becomes the pacemaker for both the Chandler wobble and the ENSO sloshing (and of course QBO).

      Read this discussion here on my blog

      The Chandler wobble then no longer becomes a resonant phenomenon associated with a free nutation of a non-solid and non-spherical earth but a response function to a periodic driver described by the combined lunisolar orbit. Needless to say, this is a significant change of thinking from the consensus. You can find scores of papers trying to deduce the 433 day Chandler wobble from estimates of the dynamics of the molten state within the earth's interior, but virtually nothing based on the moon and sun, except for a recent unpublished piece by NASA's Robert Grumbine (who replaces the lunar influence with a planetary influence). I noticed that there are also some AGW deniers who have noticed this connection, which is quite quaint, imho.












      Delete
    13. Thanks

      " Is there a poster?"

      Didn't get a poster, only a 15 minute PowerPoint presentation. Here is a condensed 1 minute YouTube animation of the presentation

      Delete
  5. Hi Nick,

    My question seems to have disappeared. Glitch? Should I try again?

    ReplyDelete
    Replies
    1. Peter,
      Yes. I get email notification of comments, even spam, but nothing has arrived. Do try again. Obviously one comment got through.

      Delete
    2. Hello Peter Green,

      I have a similar problem when trying to send comments to Nick in Firefox: you type everything in, select something under "Reply as", click on "Publish" and... nothing happens.

      Thus to communicate with moyhu I use Chrome and everything goes well.

      The same happens with Nick's Globe viewer:
      https://s3-us-west-1.amazonaws.com/www.moyhu.org/maps/webgl/grid.html
      Firefox displays nothing, Chrome does.

      I guess it has to do with one or more of my ad and spam blockers integrated as add-ons in Firefox.

      Delete
    3. "I use Chrome and everything goes well"
      Probably a safe choice, since Blogger is Google software (one could dream up a conspiracy theory).

      Delete
  6. OK, I am using Chrome, but most likely I did something silly (like inattentively clicking the button on the bottom right which looks like a submit button but is actually a sign out button).

    Anyway, the question related to processing the files from BOM specifically: in relation to possible missing records, or duplicated or additional records, what mechanisms or formulas do you use for infilling or otherwise dealing with any missing records, and how do you deal with additional or duplicated records (if you do)? I have looked at your R code (thanks) but do not know enough R to be able to answer that question (I am much more conversant with Perl).

    The typical AWS files look to have one record every half hour.

    Peter

    ReplyDelete
  7. Nick: When discussing Sheldon's trend viewer (with you and Sheldon) at WUWT, I spent a lot of time using your trend viewer. I've always been concerned that different colors for trends don't mean that a statistically significant difference between trends exists, leading users to over-interpret differences in trend. And it is a great tool for cherry-picking. Nevertheless, I found it partially useful for what I thought was an extremely important question: Has the rate of warming slowed down since 1998 or 2001, or between 2001 and 2013? So I tried to use it for this.

    The starting point that produces the greatest long-term warming is about 1975, and the warming rate since then is currently 0.18 K/decade (0.16-0.20) for HadCRUT global. So, if I am looking for a slowdown or speedup in warming within this period, I can use the trend viewer to find the lowest warming rate (locally) and its confidence interval. The appropriate next step would be to see if these differences in trend were meaningful, using the standard formula for the statistical significance of the difference of two means (given their standard deviations). Then I'd like to select or highlight or code the regions of the triangle where the trend is significantly different from the overall trend for the triangle. That would tell us for what periods the Pause was and was not statistically significantly different from the past six decades. That doesn't appear to be very often.
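
    For reference, the standard test being alluded to can be sketched in R as below. It treats the two trends as independent and ignores autocorrelation, so for overlapping, serially correlated temperature series it is only a rough first approximation.

    ```r
    # Compare two OLS trends via z = (b1 - b2) / sqrt(se1^2 + se2^2).
    trend_with_se <- function(y, t = seq_along(y)) {
      fit <- summary(lm(y ~ t))$coefficients
      c(b = fit["t", "Estimate"], se = fit["t", "Std. Error"])
    }
    trend_diff_p <- function(y1, y2) {
      a <- trend_with_se(y1)
      b <- trend_with_se(y2)
      z <- (a["b"] - b["b"]) / sqrt(a["se"]^2 + b["se"]^2)
      2 * pnorm(-abs(unname(z)))             # two-sided p-value
    }
    # usage: y_full and y_pause would be monthly anomaly vectors for, say,
    # 1975-2016 and 2001-2013 (hypothetical objects, not supplied here)
    # trend_diff_p(y_full, y_pause)
    ```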

    However, I didn't want to stop there. If you are using a 95% confidence interval, one expects to see about 5% of the trends be significantly different from the overall trend by chance. So I'd like to know what fraction of the triangle of trends is statistically significantly different from the overall trend. (It might be interesting to be able to choose your confidence level. The Pause may be significant at 0.01 or even lower.)

    In any case, this might be an interesting way to address the significance of The Pause. My level of skepticism about it was enhanced by my amateur efforts with your trend viewer. The 2015/16 El Nino raised the trend since 2001 from 0.02 K/decade to about 0.12 K/decade. There are some periods beginning in 2001 with negative trends, but I focus on the upper confidence interval for low trends, which gets down to 0.06 K/decade if you cherry-pick, and below 0.10 K/decade over a reasonable area. However, that is exactly what one expects for normally distributed data - about 5% of the area significantly different at 0.05.

    I hope this makes some sense. Frank

    ReplyDelete
  8. I think GHCN do confuse the issue by releasing a file showing altered records by station. It's a convenient way of recording the changes. But in fact homogenisation is a step on the way to compiling an index, which is a spatial integral. In that average, a station is used as a representative data point for a region. You assign its value to the region, multiply by area and add.

    In homogenising, you say that the station value has behaviour that you think makes it not a good representative of the region.


    Nick, this is correct.

    On the page
    https://www.ncdc.noaa.gov/ghcnm/v3.php?section=homogeneity_adjustment
    we can clearly see a twofold, really confusing use of the word 'homogenization'.

    But we should not forget how small the difference nevertheless is between GHCN's unadjusted and adjusted data.

    You explained that years ago (in 2012!), and I recently tried to do a similar job, by computing, from the two datasets, the linear trend for each of the 7,280 stations having contributed to the data, and the trend differences.

    The mean of these trend differences (adjusted minus unadjusted) is no more than 0.04 °C / decade.

    And it would be far lower if we eliminated all the nonsense data produced by stations like Tocumen (Panama) or Elliott (Australia) during their roughly ten years of activity, most of which is dropped in the adjusted record.

    And the average trends computed over all stations for the period 1880-2016 are as follows:
    - unadjusted: 0.214 °C / decade;
    - adjusted: 0.229 °C / decade.
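
    The per-station computation described above can be sketched in R roughly as follows. The fixed-width layout follows the v3 README, the file names are placeholders, and the 30-year minimum record length is an arbitrary choice for the sketch.

    ```r
    # Crude per-station trend comparison between the unadjusted (qcu) and adjusted
    # (qca) GHCN v3 TAVG files. Annual means here simply ignore missing months,
    # which is rough but enough for a sketch.
    read_ghcn <- function(path) {
      x    <- readLines(path)
      vals <- sapply(0:11, function(m) as.integer(substr(x, 20 + 8*m, 24 + 8*m)))
      vals[vals == -9999] <- NA
      data.frame(id   = substr(x, 1, 11),
                 year = as.integer(substr(x, 12, 15)),
                 tavg = rowMeans(vals / 100, na.rm = TRUE))
    }
    station_trends <- function(d) {          # OLS trend per station, deg C per year
      sapply(split(d, d$id), function(s)
        if (nrow(s) > 30) unname(coef(lm(tavg ~ year, s))["year"]) else NA_real_)
    }
    # tr_u <- station_trends(read_ghcn("ghcnm.tavg.v3.qcu.dat"))   # unadjusted
    # tr_a <- station_trends(read_ghcn("ghcnm.tavg.v3.qca.dat"))   # adjusted
    # common <- intersect(names(tr_u), names(tr_a))
    # mean(tr_a[common] - tr_u[common], na.rm = TRUE) * 10         # deg C / decade
    ```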

    Thus, despite the legitimate critique of performing spatial homogenization within a set of single stations, the difference between the unadjusted and adjusted records remains incredibly small when compared with further homogenization steps, e.g. at GISS, which for the same 1880-2016 period gives a trend of:
    - GISTEMP: 0.071 °C / decade.

    And this latter trend shows how meaningless some criticisms of homogenization are anyway, as it is in fact dramatically lower than the GHCN station-average trends.

    ReplyDelete