Data Matters: Wow. How did so many of us miss this?

I've tried to stick to my plan to write only one post about COVID-19, but it's been hard because there are so many data issues to talk about. (As an aside, I think that StatNews is doing a pretty good job covering things.)

But then I ran across this Wired story by Ferris Jabr. You should read it yourself but I'll be nice and quote the main point:

"Both newspapers and scientific journals frequently state three facts about the Spanish flu: it infected 500 million people (nearly one-third of the world population at the time); it killed between 50 and 100 million people; and it had a case fatality rate of 2.5 percent. This is not mathematically possible. Once a pandemic is over and all the numbers are tallied, its case fatality rate is simply the total number of deaths divided by the total number of recorded cases. Each country and city will have its own CFR, but it’s also common to calculate a global average. If the Spanish flu infected 500 million and killed 50 to 100 million, the global CFR was 10 to 20 percent. If the fatality rate was in fact 2.5 percent, and if 500 million were infected, then the death toll was 12.5 million. There were 1.8 billion people in 1918. To make 50 million deaths compatible with a 2.5 percent CFR would require at least two billion infections—more than the number of people that existed at the time."

Wow. How did we all miss this? Are we so innumerate that we didn't see 500 million and 50 million and immediately say "Hey, that's 10% not 2.5%"? Shame on us.
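If you'd rather let the computer do the division, here's a quick sketch of the check in Python. The numbers are the ones quoted in the Wired passage above; the variable names are just mine.

```python
# Figures as quoted in the Wired passage above.
infected = 500_000_000
deaths_low, deaths_high = 50_000_000, 100_000_000
claimed_cfr = 0.025
world_population_1918 = 1_800_000_000

# CFR = total deaths / total cases.
cfr_low = deaths_low / infected
cfr_high = deaths_high / infected

# What the claimed 2.5% CFR would imply instead.
implied_deaths = claimed_cfr * infected
required_infections = deaths_low / claimed_cfr

print(f"CFR implied by the counts: {cfr_low:.0%} to {cfr_high:.0%}")
print(f"Deaths implied by a 2.5% CFR: {implied_deaths:,.0f}")
print(f"Infections needed for 50 million deaths at 2.5%: {required_infections:,.0f}")
print("More than the 1918 world population?",
      required_infections > world_population_1918)
```

Any two of the three "facts" pin down the third, so at least one of them has to be wrong.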

Beyond pointing out that none of us are paying careful attention, Jabr digs into the history behind these numbers and uncovers a lot of uncertainty about the Spanish Flu.

So here's where we stand.

COVID-19. My original post is still correct. The data would matter greatly if we had it. But we don't. It's getting better, but it's still inconsistent and unclear, and we're still facing extensive uncertainty.
Spanish flu. This data would also matter greatly if we had it. But we don't have good data and, at this point in history, we never will.

The take-away? We need to get more comfortable with margins of error and ranges of estimates. Data literacy should emphasize the need to look beyond simple point estimates.
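To make "ranges of estimates" concrete, here's a minimal sketch of carrying a range through a calculation instead of a single number. The 50 to 100 million death range comes from the quote above. The 400 to 600 million infection range is purely my own assumption for illustration (the quote gives only a single 500 million figure), and so is the choice of uniform distributions.

```python
import random
import statistics

def simulate_cfr(n_draws=10_000, seed=0):
    """Draw deaths and infections from ranges and return the resulting CFRs."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Death toll range as quoted in the Wired passage.
        deaths = rng.uniform(50e6, 100e6)
        # ASSUMPTION: illustrative range around the quoted 500 million.
        infected = rng.uniform(400e6, 600e6)
        draws.append(deaths / infected)
    return draws

draws = sorted(simulate_cfr())
low = draws[int(0.05 * len(draws))]    # roughly the 5th percentile
median = statistics.median(draws)
high = draws[int(0.95 * len(draws))]   # roughly the 95th percentile

print(f"CFR: median {median:.1%}, 90% of draws between {low:.1%} and {high:.1%}")
```

The point isn't the particular numbers that come out. The point is that the honest answer is an interval, and its width depends on how uncertain the inputs are.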

Along those lines, I've just started a simulation unit in one of my classes. Simulation is a great tool for dealing with high levels of uncertainty. If you want to see my opening lesson, it's right here:

Saturday, April 25, 2020
