Now that the presidential election is a couple of weeks in our rearview mirror, I thought it a good time to comment on some data aspects of the election. Driving to work the morning after the election, I listened to a national program announce that the election results signaled "the death of data" because the polls had been so terribly wrong. Others have written about the data's failure.
That's simply not true. First, some polls did a pretty good job predicting the outcome. Whether or not those polls should have been considered outliers will be debated in the polling community for many years.
However, I want to focus on another issue. For this discussion, let's concede that "most of the polls were wrong". That's still not the death of data. At worst, it's the death of survey data.
For decades, social scientists have known that self-reported activities often don't match behaviors. Perhaps the most famous example is the Tucson Garbage Project. Political pollsters, and those who report on them, should always keep in mind that they're merely getting people's statements about what they are going to do. They are not getting any information on actual actions. In other words, they don't have empirical data.
How would you use empirical data to predict a presidential outcome? Just ask Allan Lichtman. Lichtman's "13 Keys to the White House" model has correctly predicted every presidential election since 1984, and it doesn't use polling data. Instead, it was created by looking at historical voting data (in other words, empirical data) from 1860 through 1980. There are 13 true/false indicators. If enough of them come up "against", the model predicts that the incumbent party in the White House will lose the popular vote to the challenging party. It says nothing about the electoral college.
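To make the mechanics concrete, here is a minimal sketch of that counting rule in Python. It is not Lichtman's own code. The function name, the list of key states, and the example values are all hypothetical. The cutoff of six "against" keys is the one usually cited for the published model; treat it as an assumption here, since all I've said above is "enough of them".

```python
from typing import Sequence


def predict_incumbent_loses(
    keys_favor_incumbent: Sequence[bool],
    against_threshold: int = 6,  # assumption: cutoff usually cited for the model
) -> bool:
    """Return True if the incumbent party is predicted to lose the popular vote.

    Each entry is one Key: True means it favors the incumbent party,
    False means it counts "against" the incumbent party.
    """
    if len(keys_favor_incumbent) != 13:
        raise ValueError("the model uses exactly 13 keys")
    # Count the keys that came up "against" the incumbent party.
    against = sum(1 for favors in keys_favor_incumbent if not favors)
    return against >= against_threshold


# Hypothetical key states, not the actual calls for any real election.
six_against = [True] * 7 + [False] * 6
five_against = [True] * 8 + [False] * 5

print(predict_incumbent_loses(six_against))   # True: incumbent party predicted to lose
print(predict_incumbent_loses(five_against))  # False: incumbent party predicted to win
```

Notice that the two hypothetical inputs differ by a single Key. When the count sits right at the cutoff, calling one Key differently is enough to reverse the prediction.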
That distinction between popular vote and electoral college is very important. Lichtman's model correctly predicted Gore's popular vote win in 2000. It did not predict Bush's electoral college victory because it doesn't predict anything about the electoral college. It can't be considered either right or wrong on the electoral college.
Sometimes, the 13 Keys are clear fairly early in the election cycle. At least once, Lichtman made his prediction two years ahead of the election. This time around, his prediction was later in the cycle and he predicted a Trump victory. After the election, he was hailed by some as being one of just a few who predicted correctly.
Now, if you've been paying attention up to this point, you might say, "Wait a minute. You said that Lichtman predicts the popular vote, not the electoral college, and Clinton won the popular vote. Therefore he was wrong!"
On the surface it would appear that way. However, the final Key that turned "against" the incumbent party involved third party candidates. A significant third party vote was a signal against the incumbents. When Lichtman made his prediction, it looked very much like Gary Johnson would be a significant third party influence. In the end though, all the third party votes combined were under 5%. In other words, this Key actually went in favor of the incumbent party and, sure enough, Clinton won the popular vote.
Admittedly, calling this particular Key true or false involved polling data on the third parties, but the Key itself was developed with empirical data, and the 13 empirically developed Keys once again predicted correctly. The worst you can say about Lichtman's prediction is that he called one Key wrong, but the Keys themselves predicted correctly.
That's far from "the death of data".