For a while now, I have been maintaining updated data tables and graphics. Many are collected in the latest data page, but there are also the trend viewer, GHCN stations monthly map, and the hi-res NOAA SST (with updated movies). These mostly just grew, with a patchwork of daily and weekly updates.
I'm now trying to be more systematic, and to test for new data every hour. My scheme downloads only if there is new data, and consumes few resources otherwise. My hope is that all data will be processed and visible within an hour of its appearance.
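The "download only if there is new data" idea can be sketched roughly as follows. This is a minimal illustration, not the code actually running here: it assumes the data sits on an HTTP server that returns a `Last-Modified` header, and the names `STATE_FILE`, `remote_last_modified`, `is_new` and `fetch_if_new` are made up for the example.

```python
import json
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from pathlib import Path

STATE_FILE = Path("last_seen.json")  # hypothetical local record of origin dates


def remote_last_modified(url):
    """Request headers only (HEAD), so the hourly check transfers no data."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        stamp = resp.headers.get("Last-Modified")
    return parsedate_to_datetime(stamp) if stamp else None


def is_new(remote_time, seen_time):
    """New if never seen before, or if the date at origin has advanced."""
    if remote_time is None:
        return False  # no date at origin; leave it to some fallback check
    return seen_time is None or remote_time > seen_time


def fetch_if_new(name, url, dest):
    """Download url to dest only when its origin date is newer than recorded."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    seen = state.get(name)
    seen_time = datetime.fromisoformat(seen) if seen else None
    remote_time = remote_last_modified(url)
    if not is_new(remote_time, seen_time):
        return False  # nothing to do this hour
    urllib.request.urlretrieve(url, dest)
    state[name] = remote_time.isoformat()
    STATE_FILE.write_text(json.dumps(state))
    return True
```

The hourly cost when nothing has changed is one header request per dataset, which is the point of the design.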
I have upgraded the log at the bottom of the latest data page. It is supposed to record new data on arrival, with some diagnostics. Column 1 is the date, which is actually the date listed at the origin, translated to Melbourne time. This date isn't totally reliable; it is the outcome of various systems, and may predate the actual availability of the data on the web. The second column is the dataset name, with a link to the actual data. For the bigger files (size is in column 3), a dialog box will ask whether to download. The column headed "Delay" is the difference between the origin date and the time when processing is finished and the result should be on the website. I'm using this to find bugs. "Time taken" is the time used by my computer in processing (again, for my diagnostics). Where several datasets are processed in the same hourly batch, the same time is written against each of them. Currently, only the few most recent lines at the top of the log are useful, but new data should be correctly recorded in future.
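To make the date and "Delay" columns concrete, here is a small sketch of how one log row could be built. It is only an illustration under assumptions: the function name `log_row`, the dictionary keys and the date format are invented, and the origin date is assumed to arrive in UTC.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MELBOURNE = ZoneInfo("Australia/Melbourne")


def log_row(name, origin_utc, finished_utc, size_bytes, seconds_taken):
    """Build one log entry; field names here are illustrative only."""
    # Assumption: a date with no timezone attached is treated as UTC.
    if origin_utc.tzinfo is None:
        origin_utc = origin_utc.replace(tzinfo=timezone.utc)
    if finished_utc.tzinfo is None:
        finished_utc = finished_utc.replace(tzinfo=timezone.utc)
    local_date = origin_utc.astimezone(MELBOURNE)  # date column, Melbourne time
    delay = finished_utc - origin_utc              # origin date -> result on site
    return {
        "date": local_date.strftime("%Y-%m-%d %H:%M"),
        "name": name,
        "size": size_bytes,
        "delay": str(delay),
        "time_taken": seconds_taken,
    }
```

Because the delay is measured from the date listed at origin, any unreliability in that date shows up directly in the "Delay" column.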
NOAA temperature is a special case. NOAA doesn't keep the files I use in an ftp directory, but serves them with the current time attached. So I have to use roundabout methods to decide whether they are new and need to be downloaded (I use their RSS file). By default they show as new every hour - I have measures to correct this, but they may not be perfect. Anyway, the times in the log for NOAA are not meaningful.
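One way an RSS file can stand in for a file timestamp is sketched below. This is a guess at the general approach, not a description of the NOAA feed or of my actual code: it assumes a standard RSS 2.0 layout with `item` elements carrying `pubDate`, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def latest_pubdate(rss_text):
    """Return the most recent item pubDate in an RSS 2.0 document, or None."""
    root = ET.fromstring(rss_text)
    dates = []
    for item in root.iter("item"):
        stamp = item.findtext("pubDate")
        if stamp:
            dates.append(parsedate_to_datetime(stamp.strip()))
    return max(dates) if dates else None


def feed_has_news(rss_text, last_seen):
    """True if the feed announces something newer than the last item handled."""
    latest = latest_pubdate(rss_text)
    if latest is None:
        return False  # nothing datable in the feed; treat as no news
    return last_seen is None or latest > last_seen
```

The file itself always carries the current time, so the comparison is made on the feed's dates rather than on the file's.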
I have a scheme for doing the hourly listening only when an update is likely (assuming approximate periodicity). If data arrives unexpectedly, it will be caught by a nightly processing run.
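The "listen only when an update is likely" rule might look something like this. It is a sketch under assumptions, not the scheme itself: the period is estimated as the median gap between past arrivals, and the three-day `window` and the name `should_listen` are arbitrary choices for the example.

```python
from datetime import datetime, timedelta
from statistics import median


def should_listen(arrivals, now, window=timedelta(days=3)):
    """Decide whether hourly checks are worthwhile for one dataset.

    arrivals: datetimes of past updates; now: the current time.
    """
    if len(arrivals) < 2:
        return True  # no history to estimate a period from; keep listening
    arrivals = sorted(arrivals)
    gaps = [(b - a).total_seconds() for a, b in zip(arrivals, arrivals[1:])]
    period = timedelta(seconds=median(gaps))  # approximate periodicity
    expected = arrivals[-1] + period
    # Start listening a little before the expected date, and keep going
    # until the update actually arrives and moves the expectation forward.
    return now >= expected - window
```

Outside the window nothing is checked hourly, which is why an unexpected early arrival has to wait for the nightly run.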
It is still a bit experimental - some aspects I can't conveniently test except by waiting for new (monthly) data to appear and be processed. But I think the basics are working.