Data Matters

Tuesday, October 16, 2018

Demographics is Destiny?

I just came across this link on cities and "peak Millennial" from posts by Digging Data.

We seem to like making broad generalizations when comparing generations. I usually cringe when I hear them because there's rarely much data behind the statements.

In particular, I'm getting tired of hearing how Millennials are so different from any prior generation. I don't see much of it. They show up in my classes. Some are smarter and some not so much. Some are lazy. Some are industrious. Some are liberal, some are conservative, and some don't even think about politics. I could go on but, essentially, they aren't that different from the students who came before them (or the ones before that or the ones before ...).

Still, there is data to support some generalizations about them. As post-college young adults, they - on average - seem more drawn to urban environments. Data supports that. However, some then predicted that their generation would completely revive the urban landscape for decades.

Maybe not. The article linked above says that, even though Millennials are marrying and having children later in life (which is supported by data), their housing and community preferences for the married-with-children stage of life might not be all that different from their predecessors':

"But with a view of history and demographics, it’s not difficult to imagine a future where that love [of city life] fades with the years, and a different sort of life starts to seem appealing. Millennials have shown a tendency to delay marriage and children, and thus occupy their studio apartments in urban cores for longer. But that’s no reason not to be concerned that school quality and more space might factor into their choices as they age."

Saturday, September 29, 2018

K.I.S.S. in Graph Design

I ran across this post from Data to Viz. My title is misleading because their post is much more than a call for simplicity.

However, it's intriguing that their solution to many common problems comes down to "stop being fancy and make it a bar chart".

Friday, September 28, 2018

Is There Still Value in Political Polling?

A colleague sent me a link to Why Polling Can Be So Hard by Nate Cohn.

It's interesting and not very long so you should read it.

I'll summarize one of his major points: Voter registration data is important to pollsters but different states store different data for each voter. For example, Wisconsin is known for having minimal data. Of course, Wisconsin was pivotal in the 2016 elections.

However, I found the comments just as interesting as the article. They are largely negative. Some people refuse to participate in polls or intentionally lie. Again, you should read the comments yourself, but they don't look good for the future of polling. I recognized myself in the article and the comments.

I live in Wisconsin. I don't want my voter registration to have ANY data about me beyond the minimum legal need. Information privacy matters and I don't care if our minimal data makes pollsters' jobs harder.

I don't answer a call when I don't already know the caller. If you're not in my contacts and your call is important then you can leave a message.

I suspect that excessive polling and reporting on polls are no longer predicting what will happen as much as they're changing what will happen. Whether it's the bandwagon effect or the Hawthorne effect, I think it's a problem.

Speaking of the Hawthorne effect, campaigns now use their own extensive polling to craft their message. Polling doesn't just change voter behavior, it changes politician behavior.

Perhaps polling is a victim of its own success. When it was new and not overly intrusive, it provided useful information (value). Economic theory says that value attracts new participants and will continue to attract participants until there is no longer any value available. In perfect competition, there are zero long-term profits.

Early polling methodology was pretty standard and easy to replicate - perfect competition. To break out of perfect competition, organizations need something to differentiate their output. If a poll is supposed to accurately predict the vote, how can one poll differentiate itself from the others?

Accuracy? There should be value in being more accurate but you need to come up with better, non-standard methodologies. You also need access to different or better data. Then it's still hard to show that you're more accurate.
Speed? Is there value in being the first to publish results? If you're the first by days then there could be value. Maybe even being first by hours. But minutes?
Frequency? If one group publishes a weekly poll, then you might gain value by publishing a daily poll. But how far can this go?

I think that all three of these approaches have been tried but they lead to the problems that voters complain about: information privacy and bombardment with polls.

The pessimist in me fears that polling and reporting on polling have become so ubiquitous that we're nearing the point of zero value. Worse, we might have reached negative value and polls are doing more harm than good.

A less pessimistic view is that the value of polling isn't gone (or negative): it's just changed. Maybe polls no longer tell us what we think they do. That creates new opportunities for the Nate Cohns and FiveThirtyEights of the world to find that new value.

Wednesday, August 29, 2018

Classic Probability Applied (or Not)

My wife and I are watching the America's Got Talent results show. Twelve acts performed last night and the audience voted. Based on those votes (sort of), seven acts get to stay. If we assume equally likely outcomes, every act has a 7/12 chance of going forward.

The first thing they do is pull aside "three acts in danger" for the Dunkin' Save. Out of this group, the audience re-votes to save one. Of the two left, the judges vote to save one. If the judge vote is a tie, then the audience vote from the previous night determines who stays. Either way, two of these three acts get to stay. If we assume equally likely outcomes in that group, then they have a 2/3 chance of going forward.

Since only seven acts go forward, there are five slots for the remaining nine acts. In other words, they have a 5/9 chance of going forward.

Let's recap. Before the results show starts, each act has a 7/12 ≈ 0.58 chance of staying. After this first sort, each act is in one of two situations:
* Dunkin' Save where they have a 2/3 ≈ 0.67 chance of staying.
* Still on stage where they have a 5/9 ≈ 0.56 chance of staying.
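
The arithmetic in this recap can be checked with a few lines of Python. This is only a sketch that takes the equally likely assumption at face value, and the variable names are mine. It also shows that the two group probabilities, weighted by group size, average back to the original 7/12, so the first sort doesn't create or destroy any overall chance of staying.

```python
from fractions import Fraction

TOTAL_ACTS = 12   # acts that performed
ADVANCING = 7     # acts that get to stay
SAVE_GROUP = 3    # acts pulled aside for the Dunkin' Save
SAVE_SLOTS = 2    # one saved by the audience re-vote, one by the judges

# Classical probability: favorable outcomes / possible outcomes.
p_start = Fraction(ADVANCING, TOTAL_ACTS)
p_save = Fraction(SAVE_SLOTS, SAVE_GROUP)
p_stage = Fraction(ADVANCING - SAVE_SLOTS, TOTAL_ACTS - SAVE_GROUP)

# Law of total probability: weight each group's chance by the
# chance of landing in that group.
p_check = (Fraction(SAVE_GROUP, TOTAL_ACTS) * p_save
           + Fraction(TOTAL_ACTS - SAVE_GROUP, TOTAL_ACTS) * p_stage)

print(f"Before the show: {p_start} = {float(p_start):.2f}")
print(f"Dunkin' Save:    {p_save} = {float(p_save):.2f}")
print(f"Still on stage:  {p_stage} = {float(p_stage):.2f}")
print(f"Weighted average of the two groups: {p_check}")
```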

It appears that acts are unhappy to be in the Dunkin' Save group, but the probabilities suggest otherwise. Which group would you rather be in? Don't answer right away. Think about it.
.
.
.
.
.
.
Think a little more.
.
.
.
.
.
.
Ok. Which group? Why?

Does the equally likely outcomes assumption required by classical probability make any sense? We know that they're not really equally likely because the acts going forward are not randomly selected. But how does this play out?

All twelve acts are ordered by the viewers' votes, and the Dunkin' Save acts are the ones in 6th, 7th, and 8th place. Therefore, if you're in this group you know that you're "on the bubble" with the audience. The Save acts could be grouped tightly based on votes. A 2/3 probability might be reasonable and it's a little better than what you had when the show started.

What about the other nine acts? Now you know that you're either in the top five votes or you're at the end of the pack. There's no middle ground left. Would you feel better in this group? If you think you did a great job, then you're really confident. If you think you blew your performance, then you think you're done. The 5/9 probability is probably useless in your mind.
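
Here's a rough sketch of why the 5/9 figure says so little to an individual act. It assumes the setup described above (the Save group is 6th through 8th place in the audience vote) plus one assumption of my own: within the Save group, each act is equally likely to be saved. The function name is made up for illustration.

```python
from fractions import Fraction

def stay_probability(rank):
    """Chance of staying, given audience-vote rank (1 = most votes)."""
    if rank <= 5:
        return Fraction(1)      # top five: already through
    if rank <= 8:
        return Fraction(2, 3)   # Save group: assumed equally likely
    return Fraction(0)          # bottom four: already out

by_rank = {rank: stay_probability(rank) for rank in range(1, 13)}

# Acts still on stage are everyone outside ranks 6-8.
on_stage = [p for rank, p in by_rank.items() if not 6 <= rank <= 8]
average_on_stage = sum(on_stage) / len(on_stage)
overall = sum(by_rank.values()) / len(by_rank)

print(f"Average for acts still on stage: {average_on_stage}")
print(f"Average across all twelve acts:  {overall}")
```

The 5/9 is just the average of five 1s and four 0s. No act still on stage actually has a 5/9 chance; each one is either through or out, which is why an act's own sense of how the performance went matters more than the number.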

Therefore, the Dunkin' Save group might be neither bad nor good. It's just different. Once the Save group is set aside, the remaining acts probably have a good idea where they stand while the Save group is still in suspense.

Note: There is a potential problem in my use of "probability". Consider a fair coin. If I'm about to flip the coin, the probability of a head is 50%. What if I've already flipped the coin but it's hidden under the couch and no one knows what side is facing up? What's the probability that it's a head? Some would say that it's still 50%. Others would say that the coin flip is already done and, therefore, the probability of a head is either 0 or 1. Our lack of knowledge regarding the outcome doesn't change the fact that it's already done.

If you interpret probabilities the second way, then that could change your preference for being in or out of the Dunkin' Save group. Being put in the Save group puts you into an uncertain outcome where probabilities matter. Being out of the Save group means that your outcome has already been determined (it's 0 or 1) even if you don't know what it is yet.
