Friday, March 20, 2020

COVID-19 Data would matter...

... if we had decent data and knew what to do with it.

I'm not an epidemiologist, so I'm not going to write a lot about COVID-19 on this blog, but there are some interesting data lessons here.

I'm also going to post only one link (edit: see the end of this post). There are many articles about COVID-19 that I could reference, but I'm not going to. Instead, I'm going to give you a single data source and comment on "things we're hearing about" rather than citing specific sources.

Lesson 1: Simple calculations don't work if you're using the wrong data.

As of this morning, the site above shows 247,400 total cases and 10,067 total deaths worldwide. For the moment, let's assume that both of those numbers are accurate. What's the fatality rate?

The simple calculation that many people are doing is 10,067/247,400 = 4.1%

The arithmetic is correct, but the number doesn't answer the question at all. Most of the 247,400 cases are still in progress. We don't know how they will end.

To estimate the fatality rate, we need to look at cases that are resolved. That restricts us to cases that are recovered (86,037) or deceased (10,067). That's 96,104 cases and, tragically, 10.5% of them ended in death.
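For anyone who wants to check the two calculations, here is a minimal Python sketch. The three counts are the ones quoted above; the variable names and the `ratio_pct` helper are just my own labels for illustration.

```python
# Worldwide figures quoted in this post (morning of March 20, 2020).
total_cases = 247_400
recovered = 86_037
deaths = 10_067


def ratio_pct(numerator, denominator):
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Simple calculation: deaths divided by ALL reported cases,
# including the ones whose outcome is not known yet.
naive_rate = ratio_pct(deaths, total_cases)

# Resolved-cases calculation: only cases that have ended,
# either in recovery or in death.
resolved_cases = recovered + deaths
resolved_rate = ratio_pct(deaths, resolved_cases)

# Cases still in progress, which the simple calculation silently
# treats as if they had all ended in recovery.
active_cases = total_cases - resolved_cases

print(f"Deaths / total cases:    {naive_rate}%")
print(f"Deaths / resolved cases: {resolved_rate}%")
print(f"Cases still in progress: {active_cases:,}")
```

Same deaths, two very different percentages. The only thing that changed is which cases went into the denominator.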

But that's still not the fatality rate...

Lesson 2: Bad data can be worse than no data

Let's look at all three of those numbers:
  • 247,400 total cases
  • 86,037 recovered
  • 10,067 deaths
None of them are correct. NONE. Let's take them in order.

First, Total Cases: There have been multiple reports of people being denied tests because they weren't ill enough or they didn't show the right symptoms. However, we've also been told that many who get COVID-19 will show mild or no symptoms. I've read claims that more than half of those who get COVID-19 will be completely asymptomatic. 

Therefore, if more than half of infections really are asymptomatic and those people aren't being tested, the total number of cases could be at least double the reported number. Tests are becoming more widely available, but they're still being reserved for people who actually have symptoms. We would need to test large samples of asymptomatic people in order to properly estimate total cases. This might eventually happen for research purposes, but it's not going to happen in the midst of the crisis.
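To see how much the undercount matters, here is a rough Python sketch. It rests on assumptions that are mine, not measured facts: every reported case is symptomatic, no asymptomatic infection gets counted, and the asymptomatic shares in the loop are hypothetical values picked only for illustration. It also holds the death count fixed, even though (as discussed below) that number has its own problems.

```python
# Reported worldwide figures quoted in this post.
reported_cases = 247_400
reported_deaths = 10_067


def implied_total_cases(reported, asymptomatic_share):
    """Scale reported cases up to an implied total.

    ASSUMPTION (for illustration only): all reported cases are
    symptomatic and no asymptomatic infection is ever counted.
    """
    if not 0 <= asymptomatic_share < 1:
        raise ValueError("asymptomatic_share must be in [0, 1)")
    return reported / (1 - asymptomatic_share)


# Hypothetical asymptomatic shares, NOT estimates from any study.
for share in (0.0, 0.25, 0.5, 0.6):
    total = implied_total_cases(reported_cases, share)
    print(
        f"{share:.0%} asymptomatic -> about {total:,.0f} total cases, "
        f"deaths / total cases = {reported_deaths / total:.1%}"
    )
```

The point isn't any one of these outputs. It's that the answer swings widely depending on a number nobody has measured yet.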

The total cases number also has interpretive problems. As of yesterday, my county had four confirmed cases. One was a woman in her 50s with no known travel or contact with infected people (what they're calling "community spread"). The other three cases were all in one family that traveled together. For eventually computing fatality rates, those are four separate cases. In terms of contagion, I would say that they are just two cases. I think that's an important distinction, but I doubt that we'll ever have solid data that lets us make it.

Second, Recovered: The inaccuracy of total cases makes this number wrong too. If we never knew that you had it, then we'll never count you as recovered. 

But it's worse than that. One news story said that patients weren't cleared until they tested negative on two tests administered 24 hours apart. Remember the shortage of tests? How many people are currently recovered but not officially recovered? 

The source above shows zero recovered cases in either California or Washington. I've been watching this site for more than two weeks and not everyone who was active two weeks ago has died. By now they should be recovered so I don't know why there are no reported recoveries. It could be the issue in the prior paragraph or something else.

Third, Deaths: As with recoveries, the total cases number makes this number wrong too. If we never knew that you had COVID-19, then we won't attribute your death to it. Even if we know that you had COVID-19, death can be complicated. 

My 90-year-old mother passed away last year, 10 days after a bad fall. Did she die from "falling" or from "complications of a fall" or ...? Several months earlier, she nearly died from sepsis. But the sepsis was the result of an untreated urinary tract infection. The recurring urinary tract infections were a result of other medical complications. If she had died during the sepsis incident, what would the real cause be?

Consider someone who is completely healthy. Then they test positive for COVID-19 and they die fairly soon. It would be pretty clear that COVID-19 killed them. On the other hand, if someone has a myriad of health problems and COVID-19 becomes the tipping point, then maybe COVID-19 "sort-of" killed them.

When I say that "bad data can be worse than no data", it's not because the data shouldn't be collected. It's because people don't understand it, they make simple calculations with it, and then they push for public policy and private decisions based on incorrect numbers. However, we do want the data...

Lesson 3: Keep collecting the data and keep a level of skepticism

There are other issues that could be raised with all of these numbers. I touched on age and health but didn't dive deeply into those. I've also ignored differing data methods in different countries.

Still, we'll eventually have better data. It will never be perfect (no useful data is) but it will get better. I'm a big fan of statistical literacy and public access to raw data but "armchair" statisticians need to be careful about their own number crunching. This is a situation where you should listen to the experts but focus on
a) admitted uncertainties in their calculations - like margins of error - and 
b) disagreements among them.

Admitted uncertainties and disagreements from the experts will give you a good idea of how uncertain their conclusions are.

NOTE: The day after I wrote this, I ran across a great article on the data aspect of COVID-19 so I'm posting it here and not re-writing my post.