Data Matters: Difficulty with data

Tuesday, April 8, 2014

Difficulty with data

In most introductory statistics classes, data magically appears (in the correct electronic format!) with no discussion about its source or validity. Unintentional or not, this gives students the idea that they should accept data at face value. I frequently tell my students that number-crunching is simple compared to getting good data and that most statistical disagreements are about the data rather than the calculations.

They listen and some of them sort of get it, but the steady stream of magic data they receive can override anything I say. Therefore, I've started scattering examples of imperfect data throughout the class. By "imperfect", I don't mean "mistake". Instead, I want to demonstrate how hard some things are to measure or classify and, therefore, that no data set is perfect (at least not any interesting data set).

The 2000 Presidential election was a good example of how difficult it can be to simply determine whether a vote is for Candidate A, Candidate B, or no one, but that's ancient history for today's students.

Since many of my students are athletes, I've covered controversial sports calls. There are reams of historical sports data available. We rarely question that data, even though we often argue during the game while the data is being created.

"That's a charge, not a foul!" "What do you mean ball? That was clearly a strike!!"

Regardless of the fans' preferences or the rule book's definitions, in sports the official data is whatever call the officials make at the time.

Charge or a foul? It's whatever the official says it is.

Ball or a strike? It's whatever the official says it is.

Packer fans won't forget this call for many, many years. Nearly everyone said that it was not a Seattle touchdown. Unfortunately for the Packers, the operational definition of a touchdown has nothing to do with what "nearly everyone" says. A touchdown is whatever the officials say it is, and NFL data will forever classify this play as a touchdown and the game as a Packer loss.
