Wednesday, April 1, 2020

Covid19 - graphs of daily data - turning the corner?

Everyone seems to want to write about Covid-19 lately. Unlike most of the world, I am not an expert on epidemiology. But I have been anxiously looking at graphs of recent data to see if the social distancing measures are turning the tide. Like most people, I look at the Worldometer site. But I'd sometimes like to drill down a bit, and also to get all the graphs in one place. The main source of collected information seems to be the Johns Hopkins GitHub repository. So I looked into it.
My interest is in the point of inflection of the growth curves. So I have plotted here not the cumulative totals but the daily increments. They are noisier but give an earlier marker of change.
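
As a minimal sketch of the differencing described above (not the code behind the graphs; the function name and the numbers are made up for illustration), the daily increments are just the day-over-day differences of the cumulative series, and the day on which they peak marks the inflection of the cumulative curve:

```python
def daily_increments(cumulative):
    """Return the day-over-day differences of a cumulative series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Invented cumulative case counts, for illustration only.
cumulative_cases = [10, 25, 55, 100, 150, 190, 215]
new_cases = daily_increments(cumulative_cases)

# The largest increment marks where the cumulative curve stops steepening.
peak_day = new_cases.index(max(new_cases))
print(new_cases, peak_day)
```

The result has one entry fewer than the input, since the first day has no previous total to subtract.
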
So here are the graphs. I hope to keep them updated daily. You can choose to see daily new cases or deaths. Just click on the radio button next to a country name. The buttons on the yellow-backed line let you choose states or provinces of the named country. The entries in the bottom table (nations) are arranged in decreasing order of total cases as of the most recent day. Be aware that the y scale changes to fit each data set displayed.

Some details:
Hong Kong is currently included with China, which is how the source does it. I'll probably separate it in the future. HK is mainly responsible for the recent rise in China cases - you can see it listed as a province of China.
Johns Hopkins separated US data from their global time series table, saying that they would post a corresponding US table. But AFAICS, they haven't yet done that. So I had to add up the US data from the daily reports (by county!), which may lead to some minor discrepancies. One is that I have omitted the numbers from the Princess cruise ships which were listed separately.
I have omitted some data that Johns Hopkins recorded for the Diamond Princess and Grand Princess cruises. They handled it in a messy way, splitting it up among countries and states. This will cause some minor discrepancies with Worldometer data. I have also not included in the US total some minor regions like Northern Marianas.
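
For anyone wanting to reproduce the county-to-state sums, here is a rough sketch under stated assumptions. The column names (Admin2, Province_State, Country_Region, Confirmed, Deaths) are my assumption about the daily report layout, the sample rows are invented, and the exclusion list is only an illustration of the omissions described above:

```python
import csv
import io
from collections import defaultdict

# Invented rows in an assumed daily-report layout, for illustration only.
SAMPLE = """Admin2,Province_State,Country_Region,Confirmed,Deaths
Harris,Texas,US,100,2
Dallas,Texas,US,80,1
Pinellas,Florida,US,40,1
,Diamond Princess,US,49,0
,Northern Mariana Islands,US,2,0
,Ontario,Canada,500,5
"""

# Hypothetical exclusion list: cruise ships and minor regions.
EXCLUDED = {"Diamond Princess", "Grand Princess", "Northern Mariana Islands"}


def state_totals(csv_text, country="US", excluded=EXCLUDED):
    """Sum county rows into per-state totals, skipping excluded entries."""
    totals = defaultdict(lambda: {"Confirmed": 0, "Deaths": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country_Region"] != country or row["Province_State"] in excluded:
            continue
        for field in ("Confirmed", "Deaths"):
            totals[row["Province_State"]][field] += int(row[field])
    return dict(totals)


totals = state_totals(SAMPLE)
us_confirmed = sum(state["Confirmed"] for state in totals.values())
print(totals, us_confirmed)
```

A national figure built this way will differ from a separately posted national total by whatever was excluded.
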

22 comments:

  1. Some of these are weirdly inverted. For new cases, hitting "US > Texas" gives me Czechia data while "World > Czechia" gives me Texas. On Firefox if that matters.

    1. Thanks, Josh.
      I had misalignment of the data on deaths and cases (one was more up to date). Should be fixed now.

  2. Interesting graphs, but some of the buttons don't match the data shown. If I click China, I get Spain, if I click Spain I get China. There are a number of others.

    1. Yes, see above. Should be fixed now.

      Sorry to all about the moderation wait. Things should be better now.

  3. only see blank image (image link broken)

    1. Phil,
      You need to click a button to get started.

  4. Don't appease the AGW deniers anymore. Roy Spencer is blogging garbage that the deaths in Spain are caused by panicky people flooding the hospitals and causing chaos in the health care. Willis Eschenbach is marginalizing the positive impact of lockdown, etc, etc, etc.

  5. Hi Nick,
    Thank you for doing this. I've discovered that Florida does not necessarily report deaths due to Covid-19 by day of death. There can be a variable delay, and it can be more than a week. I suspect this is the time it takes for the results of a post-mortem test to be returned and reported. Gauging the viscosity in the testing program is difficult, but there continue to be anecdotal reports of medical staff having symptoms yet being denied tests because they cannot point to a contact with a "known to the government" infected person. It also appears that reports are being batched, such that the March 31 reports for deaths and new infections show low numbers and April 1 shows high numbers.

    I have no idea whether reports from other states are as troubling in the sense of reliability as are ours in Florida, but one wonders how anyone can be comfortable with accuracy when the graphs are so suspect.

    Please keep posting this.

    john

    1. Thanks, John,
      Yes, I think everywhere the actual timing of reports is flaky, and sometimes a whole batch of deaths are reported on one day. Sometimes testing is weekend dependent etc.

  6. Day 44, New Deaths - US: this total reads about 2300 right now. But if you look at each individual state's day 44 totals, they don't add up to 2300. What am I missing?

    1. " But if you look at each individual state's day 44 totals, they don't add up to 2300."
      The Johns Hopkins dataset gives a national total for the US, along with other countries. Then separately, it gives the county data, which I add to give the state totals. It's true that they don't then add to the posted national total. I think this is a question of the time of day cut off. The county numbers come in at varying times, and at some time they rule a line and say that is the national total, even though more counties may be posted later in the day (or evening), which presumably go into the next day's total.

      I did at one stage graph the sum of states total too. You could see the discrepancies, but the overall picture was similar.

      There is also a discrepancy arising from marine data, such as cases on cruise ships and Navy. These are in the national total, but not in states.

    2. Thanks, Nick. I very much appreciate the work you do here.

  7. Hi Nick,

    Here in Florida our data are even less reliable than I suspected. A person whose death is reported could have died more than a week earlier, the delay in posting owing to the delay in getting the test run and the results reported. In addition, there appear to be tens of thousands of samples either not yet tested or, if tested, with results not yet reported. This sorry situation was gone over in some detail in the Tampa Times a few days ago.

    I suspect the mess is because the folks who are doing the work are overwhelmed in some part due to a significant reduction in force imposed by an earlier governor.

    This leaves us pretty much flying blind. We could be having a runaway here in St Petersburg and would be two weeks from discovering it.

    I thought a more useful metric might be number of tests in the queue by day and by county and the break in results by day and by county. Since one cannot now get tested without a doctor thinking you are likely infected, there might not be deafening noise in these numbers. There would also be the check of positive results as a function of total tests submitted. County results could be compared and unlikely percentages revealed.

    Tests are being done both privately and by the state. There seemed to be some resistance at the state level for revealing these numbers. I don't think this is because they are shy, but maybe because they know we'd discover that testing is a bigger disaster than we realized.

    At the same time, we've had 12 known cases in our postal zone for the last two weeks. I hope it's true.

    1. John,
      I hope that, with testing, even though they report late, they still record the right date. That should then show correctly in the graphs, eventually. I have a problem in that, for non-US, I plot differences in the cumulative totals, and sometimes they change the basis of counting which makes a sudden jump, which then shows up as a huge daily difference. France is such a case.

      For US I currently use the daily reports. They have now brought out a time series, like non-US, and I should switch to using that.

      In terms of metrics, I'm pretty limited here to what JH posts. They seem to be the most punctual.

      Yes, the diversity of testing authorities in the US makes it hard to count properly.

    2. Hi Nick,
      My suggestion of an alternative metric was meant as a universal one, rather than as a comment on any shortcoming in what you are showing. Many thanks for continuing to provide this useful service.

  8. Hi Nick. Have you looked at James’ model?
    http://julesandjames.blogspot.com/2020/04/blueskiesresearchorguk-model.html

    1. cce,
      Prompted by you, I re-read the paper. It's a fairly standard model (SEIR). I haven't had much interest in this modelling, because the SEIR aspect is meant to take account of the onset of herd immunity (non-linear with S), and I don't think that has any influence here. The main effect is of social distancing (infectivity R), and there is not much that can be done rigorously to quantify that. J&J model that with a step change in R following policy introduction, and fit the parameters. That is pretty crude, but I don't see how to do better.
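
      For readers who haven't met SEIR, here is a minimal sketch of the general idea: a textbook SEIR model stepped forward with simple Euler steps, with R switched from one value to another on a given day. It is not J&J's code, no fitting is done, and every name and parameter value (latent and infectious periods, population, seed, the R values) is an assumption chosen only for illustration.

```python
def seir(days, r_before, r_after, change_day, n=1_000_000.0,
         latent_days=3.0, infectious_days=5.0, seed=10.0, dt=0.1):
    """Euler-stepped SEIR with a step change in R on change_day.

    Returns one (S, E, I, R) tuple per day, recorded at the start of the day.
    All default parameter values are illustrative assumptions.
    """
    sigma = 1.0 / latent_days      # rate of leaving the exposed state
    gamma = 1.0 / infectious_days  # rate of leaving the infectious state
    s, e, i, r = n - seed, 0.0, seed, 0.0
    steps_per_day = round(1.0 / dt)
    history = []
    for day in range(days):
        history.append((s, e, i, r))
        # Transmission rate implied by the reproduction number in force today.
        beta = (r_before if day < change_day else r_after) * gamma
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt  # the term that is non-linear in S
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
    return history


with_step = seir(days=120, r_before=3.0, r_after=0.7, change_day=30)
no_step = seir(days=120, r_before=3.0, r_after=3.0, change_day=30)
peak_with_step = max(state[2] for state in with_step)
peak_no_step = max(state[2] for state in no_step)
print(round(peak_with_step), round(peak_no_step))
```

      While S stays close to the whole population, the s * i / n term is effectively linear in I, so the herd immunity part of the model does little and the outcome is governed almost entirely by the assumed step in R.
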

  9. Climate scientists are way behind in compartmental modeling which is what the pandemic growth models use for SER, SEIR, etc. Same with fossil fuel resource models, which are perfectly suited for compartmental modeling yet rarely used. So it's surprising to see someone like James suddenly declare himself an expert on it. In any case, this is how it is applied to various disciplines based on published research:

    http://peakoilbarrel.com/the-oil-shock-model-and-compartmental-models/

    The issue with James is that he is getting overly worked up over something that is very hard to model predictively -- that of human decision-making. It's easy to show that quarantining will immediately flatten the curve but not so easy to estimate how well the citizens will follow instructions.

  10. Hi Nick,
    Thank you for continuing to support and enhance this system.

    It appears that no one has yet done this, but a very informative method for conveying the spread might be a 3D perspective of an area of a country with the number of new infections shown as vertical columns located on the counties perhaps with color moving from pink to black as a function of death rate. These could then be animated with a frame per day. A benefit of this sort of display would be the greater clarity in representing spread across a region.

    best regards,

    john

    1. John,
      Thanks. I haven't used 3D representation much, but it would probably work well with WebGL. I'll look into it.

      I'm about to put up a new post describing enhancements. You'll see there is now a blue line showing a 7-day smooth (there is a lot of weekend effect) and a green line which is a measure of slope. It is worked out in terms of doubling (or halving) time - basically the slope of the log, scaled. So a value of 1 means that it would double once a week at that rate; 2 means twice a week; -1 means halving once a week. I'm trying to identify down stages.
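
      Here is a rough sketch of how such a slope measure could be computed. It assumes a trailing 7-day mean and a slope taken on the smoothed series; the graphs may do it differently (for example with a centred window), and the function names are made up.

```python
import math


def smooth7(values):
    """Trailing 7-day mean; the first 6 days have no smoothed value."""
    return [sum(values[k - 6:k + 1]) / 7 for k in range(6, len(values))]


def doublings_per_week(smoothed):
    """Day-to-day slope of log2 of the series, scaled to a week.

    1 means doubling once a week, -1 means halving once a week.
    Assumes every smoothed value is positive.
    """
    return [7 * (math.log2(b) - math.log2(a))
            for a, b in zip(smoothed, smoothed[1:])]


# Synthetic series that doubles (or halves) exactly once a week.
growing = [100 * 2 ** (day / 7) for day in range(21)]
shrinking = [100 * 2 ** (-day / 7) for day in range(21)]
growth_rates = doublings_per_week(smooth7(growing))
decay_rates = doublings_per_week(smooth7(shrinking))
print(growth_rates[0], decay_rates[0])
```

      A day where the smoothed count is zero would need special handling, since the log is undefined there.
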
