Monday, May 11, 2020

Plotting COVID19 daily data - finding peak rate and halving time.

This post is an update to my earlier post showing daily new cases and new deaths from the Johns Hopkins Covid19 data. The details are as for that post, but I want to explain two additions to the plots, which I think convey useful information. But first I'll show the updated plot (the same graphic as on the April 1 post; you need to click a radio button to get started). The explanation will follow:

The new items are the blue and green lines. The blue lines are just a 7-day moving average of the daily data. The data needs smoothing, and the 7-day average is chosen because the reporting of both cases and deaths often has a strong weekly cycle (weekends). Following the blue lines often shows whether the daily values are going up or down, even where the daily data is too noisy to tell.
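As a minimal sketch of that smoothing, here is a 7-day moving average in Python with NumPy. It assumes a centered window (three days either side); the post does not say whether the average is centered or trailing, and the function name moving_average_7 is hypothetical.

```python
import numpy as np

def moving_average_7(daily):
    """Centered 7-day moving average of a daily series.

    Assumption: the window is centered on each day (3 days either side);
    days near the ends without a full window are returned as NaN.
    """
    x = np.asarray(daily, dtype=float)
    out = np.full(x.shape, np.nan)
    if x.size >= 7:
        # 'valid' convolution gives one value per complete 7-day window
        out[3:-3] = np.convolve(x, np.ones(7) / 7, mode="valid")
    return out
```

Because a 7-day window always contains exactly one weekend, a purely weekly reporting pattern averages out completely: a repeating week of 10, 10, 10, 10, 10, 3, 3 gives 8.0 on every fully covered day. A trailing window would give the same values, just shifted three days later.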

The green lines, with the associated green axis on the right, are a more ambitious diagnostic. Our ABC (Australian Broadcasting Corporation) has been promoting a "one number you need to watch", which is a little bit like the R reproduction number of epidemiology. They plot the ratio of the current day's value to the average of the last five days, and say that progress is being made when this number is less than 1.
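For comparison, a sketch of that ABC-style ratio is below. It assumes "the last five" means the five days before the current one, not including it; the exact definition the ABC uses is not given here, and abc_ratio is a hypothetical name.

```python
import numpy as np

def abc_ratio(daily):
    """Ratio of each day's value to the mean of the five days before it.

    Assumption: the current day is not included in the five-day mean.
    Values below 1 are read as 'progress'.
    """
    x = np.asarray(daily, dtype=float)
    ratio = np.full(x.shape, np.nan)
    for i in range(5, x.size):
        prev_mean = x[i - 5:i].mean()
        if prev_mean > 0:  # avoid dividing by zero early in the series
            ratio[i] = x[i] / prev_mean
    return ratio
```

A single weekend dip, such as five days of 10 followed by a day of 3, gives a ratio of 0.3 even though the underlying trend is flat, which shows why the raw ratio is so noisy.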

I think that is useful, but far too noisy. It is related to the slope of the logarithm, and I would like to calculate that more scientifically. The particular reason for log is that the curves usually start with an exponential rise, and often end with an exponential decay. These are linear in the log, and their slope can be well estimated by regression.

You can take it, then, that the green line is that estimate of the slope of the log, but I have scaled it to fit another interpretation, which is the number of doublings per week, or the reciprocal of the doubling time in weeks. So a value of 1 means doubling once a week; 2 means doubling twice a week (a 4x rise per week). Negative values are halving; -1 means halving once a week, which would be a very good situation. A value of -0.5 means it halves once every two weeks. A significant place to watch is where it crosses from positive (increasing) to negative.
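The scaling is simple arithmetic. If b is the slope of log2(count) per day, the number of doublings per week is g = 7b, the change over a week is a factor of 2^g, and the doubling time (or, for negative g, the halving time) is 1/|g| weeks. If the slope is taken of the natural log instead, divide by ln 2 first. A small sketch, with hypothetical function names:

```python
import math

def doublings_per_week(slope_log2_per_day):
    """Convert a slope of log2(count) per day into doublings per week."""
    return 7.0 * slope_log2_per_day

def doublings_per_week_from_ln(slope_ln_per_day):
    """Same conversion for a natural-log slope: dividing by ln 2 gives log2 units."""
    return 7.0 * slope_ln_per_day / math.log(2)

def weekly_factor(g):
    """Multiplicative change over one week for g doublings per week."""
    return 2.0 ** g

def interpret(g):
    """Return ('doubling' or 'halving', weeks per doubling or halving)."""
    if g == 0:
        return ("stationary", math.inf)
    return ("doubling" if g > 0 else "halving", 1.0 / abs(g))
```

For example, weekly_factor(2) is 4.0 (the 4x above), interpret(-0.5) gives ("halving", 2.0), i.e. halving every two weeks, and interpret(-0.1) gives a halving time of about 10 weeks.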

I'll say a little about the numerics of this. Clearly smoothing is required, which means that each point represents information from a stretch of data. My basic technique for a smoothed derivative is LOESS - a regression weighted toward the data close to each point. But before differentiating the logarithm, it is necessary to smooth in the linear domain. The reason is that a main kind of error is in the day to which the data is ascribed. Error here should have only a moderate effect, because smoothing conserves the total count and diminishes the effect of the displacement. But smoothing the log does not conserve it. The worst case, and a rather common one, is where no data is recorded at all for a day, so the log would be minus infinity. Of course it is also possible in the tails that there really were no cases/deaths that day.

A preliminary linear smoothing will mitigate that, and I again use a seven-day moving average, to attenuate the weekend effect. I then take the logarithm (base 2) and do the weighted linear regression on that, with the trend coefficient treated as the derivative. The weighting is a binomial distribution with half-width 9 days.
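Putting those steps together, a minimal sketch of the green-line calculation might look like the following. It is not the author's code, and these assumptions are mine: the 7-day average is centered; "binomial distribution half-width 9" is read as weights proportional to the binomial coefficients C(18, k) over offsets -9 to +9 days; at the ends of the series the window is simply truncated, so the last estimates use only past data; and days whose smoothed value is zero or missing are dropped from the fit rather than logged. The function names are hypothetical.

```python
import math
import numpy as np

def binomial_weights(half_width=9):
    """Weights for day offsets -half_width..+half_width, proportional to C(2h, k).

    Assumption: "binomial distribution half-width 9" is read as the 19
    coefficients C(18, k), normalised to sum to 1.
    """
    n = 2 * half_width
    w = np.array([math.comb(n, k) for k in range(n + 1)], dtype=float)
    return w / w.sum()

def smoothed_doublings_per_week(daily, half_width=9):
    """Sketch of the green line: local weighted slope of log2(7-day mean), times 7."""
    x = np.asarray(daily, dtype=float)

    # Step 1: smooth in the linear domain with a centered 7-day mean, so a count
    # reported on the wrong day is largely conserved and zero days are filled in.
    ma = np.full(x.shape, np.nan)
    if x.size >= 7:
        ma[3:-3] = np.convolve(x, np.ones(7) / 7, mode="valid")

    # Step 2: take log base 2; non-positive or missing averages become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        y = np.where(ma > 0, np.log2(ma), np.nan)

    w_full = binomial_weights(half_width)
    out = np.full(x.shape, np.nan)
    for i in range(x.size):
        lo, hi = max(0, i - half_width), min(x.size, i + half_width + 1)
        t = np.arange(lo, hi) - i              # day offsets relative to day i
        ok = ~np.isnan(y[lo:hi])
        if ok.sum() < 3:                       # too little data for a slope
            continue
        tt, yy = t[ok], y[lo:hi][ok]
        ww = w_full[tt + half_width]           # window is truncated near the ends
        # Step 3: weighted least-squares line; its slope is d(log2 count)/d(day).
        tbar = np.average(tt, weights=ww)
        ybar = np.average(yy, weights=ww)
        denom = np.sum(ww * (tt - tbar) ** 2)
        if denom > 0:
            out[i] = 7.0 * np.sum(ww * (tt - tbar) * (yy - ybar)) / denom
    return out
```

As a check, a series that doubles exactly once a week, such as 2**(t/7), has a moving average that is a constant multiple of itself, so its log2 has slope exactly 1/7 per day and the function returns 1.0 everywhere, including the one-sided ends. On real data those ends are where the estimate is least reliable, since only one side of the window has data.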

You'll see that the green curves start out generally positive. In fact they don't really mean much until the red data shows several cases/deaths per day. I have tried to remove meaningless derivative data, but it isn't very exact, so I err on the side of inclusion. Data often rises to an early peak, where the green line crosses zero, and then may bump along on a plateau, where the green line oscillates around zero. If the green line remains decidedly negative, then one can say that the epidemic is receding, and its average value gives the rate in halvings per week.
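One simple way to suppress the meaningless early values is to blank the slope wherever the smoothed daily count is below some threshold. This is only a sketch: min_per_day=3 is a hypothetical stand-in for "several per day", mask_low_counts is a hypothetical name, and the post does not describe the rule actually used.

```python
import numpy as np

def mask_low_counts(green, smoothed_counts, min_per_day=3.0):
    """Blank out slope values where the smoothed daily count is too small.

    min_per_day is a hypothetical threshold; missing (NaN) counts are
    also masked, since NaN never compares as >= the threshold.
    """
    g = np.asarray(green, dtype=float).copy()  # leave the caller's array untouched
    c = np.asarray(smoothed_counts, dtype=float)
    keep = c >= min_per_day
    g[~keep] = np.nan
    return g
```

A low threshold keeps more of the curve, in the spirit of erring on the side of inclusion; a higher one hides more of the noisy early section.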

To give an example of the interpretation, here is the US death toll data:

At about March 25, the blue curve (smoothed) was curving steadily upward, and the green curve was at about 1, which means doubling once a week. Then the blue curve shows a peak at about April 14. That is where the green curve crosses zero, a stationary point, neither up nor down. Later in April, the blue has a slow bumpy decline, and the green curve shows a value of about -0.1, which means halving about every 10 weeks. The green curve looks better at the end, but unfortunately this is the most uncertain part, since the derivative is estimated with one-sided (past) data only.

Here is the plot of new cases for Germany:

Again there is a near-exponential increase at the beginning, with doubling more than once a week (green > 1). Again there is a peak in the blue at about March 28th, where the green crosses zero. But then the blue goes decidedly downward, and the green settles to about -0.7, which means halving about every ten days. The daily values have a bump at the end, which brings the green back up to zero, suggesting the decline might have ended. Again, unfortunately, this is the least reliable part of the curve.


  1. Your first pic doesn't work for me. I get:

    <Message>Access Denied</Message>

    (I'm probably behind a VPN, FWIW)

    1. William,
      The first pic is an active JavaScript graphic; it may be that your security settings don't allow it, although the message seems to come from the HTTP request level. It is the same graphic as shown on the previous post here. It just allows you to call up a whole lot of plots like the US and German ones shown.

    2. Again FWIW I get the same problem on my phone, which has no VPN. And the same on your previous post, which other people can see, so it is just me :-(

  2. William, you have updated Java, I take it.
    I find the graphs change after a few nanoseconds.
    One aspect that stands out is Brazil and Russia have figures that show they are a long way from getting on top of this issue.

    john aussie

  3. Display works fine for me.

    I have trouble appreciating the significance of the rate of doubling, at least when it starts to decrease. Doubling rate seems susceptible to the same constraint that chain letters suffer: running out of opportunity. Notwithstanding, the most interesting sites include this construction, here and at the New York Times.

    Thanks, Nick, for keeping after this.

    1. John,
      Exponential increase is what you expect unless something intervenes to stop it, and that seems to be the observed pattern. Eventually, as you say, if immunity develops, the virus will run out of susceptibles, but we are seeing growth stop long before that could be the cause. Basically people's behaviour changes (by law or spontaneously) to lower the reproduction number R. If that is below 1, infection will recede. If it is below 1 and stable, the attenuation will be exponential. The green curve is trying to trace that.

  4. Hi Nick,
    Thanks for keeping this going. I see you looked at identifying weekends with a color change on the bars, but apparently decided it didn't signify. In our county in Florida, the case rate has finally gone exponential, and the state's as well. The daily reports don't appear to be affected by weekend posting anymore.

    BTW, what are the blue dashes at the bottom of your charts?


    1. Hi John,
      The blue dashes mark weekends. I've de-emphasised them because they don't necessarily line up with the dips. I think in different places the days affected are different, depending on lags in reporting.

      Yes, there is a lot of exponential rise lately. It takes a while to turn it around.