
Tuesday, November 13, 2018

If you doubt that data matters...

What was Amazon looking for with their widely publicized search for an HQ2 location?

One could answer by looking at their list: proximity to an international airport, mass transit, regional population, ...

However, some are saying that Amazon was actually looking for DATA. Here are two articles:

Neither article is very long so you should read them yourself, but I'll provide a couple of interesting quotes:

From Bloomberg: "But it kept hundreds of millions of dollars worth of free information from the cities to create the biggest corporate site location database in the world, according to Richard Florida, an urban studies professor at the University of Toronto."

From Reason: "Amazon is now privy to information about where different municipalities are going to direct investment and infrastructure in the near future. The company can exploit this information. ...  Maybe Amazon just happens to purchase a new fulfillment center right around a soon-to-be-developed locale which would see increased demand for Amazon products. Maybe it simply decides to squat on land for a while, knowing that it will soon be smack dab in a hive of activity. A new brick-and-mortar store? They'll have the option. Or maybe knowing where news roads will be built will make it easier for Amazon to plan transit routes. There's profit to be extracted from this data that you and I could not even conceive."

Whether Amazon played a game just to obtain data or the data is a side benefit of an honest search, it's clear that data matters.

By the way, while it isn't the same level and volume of data that Amazon got, ALL of us have access to a great deal of government data for free. Check out IPUMS.

Tuesday, October 16, 2018

Demographics is Destiny?

I just came across this article on cities and "peak Millennial" through posts by Digging Data.

We seem to like making broad generalizations when comparing generations. I usually cringe when I hear them because there's rarely much data behind the statements.

In particular, I'm getting tired of hearing how Millennials are so different from any prior generation. I don't see much of it. They show up in my classes. Some are smarter and some not so much. Some are lazy. Some are industrious. Some are liberal, some are conservative, and some don't even think about politics. I could go on but, essentially, they aren't that different from the students who came before them (or the ones before that or the ones before ...).

Still, there is data to support some generalizations about them. As post-college young adults, they seem, on average, more drawn to urban environments. Data supports that. However, some then predicted that their generation would completely revive the urban landscape for decades.

Maybe not. The article linked above says that, even though Millennials are marrying and having children later in life (which is data supported), their housing and community preferences for the married-with-children stage of life might not be all that different from their predecessors:

"But with a view of history and demographics, it’s not difficult to imagine a future where that love [of city life] fades with the years, and a different sort of life starts to seem appealing. Millennials have shown a tendency to delay marriage and children, and thus occupy their studio apartments in urban cores for longer. But that’s no reason not to be concerned that school quality and more space might factor into their choices as they age."

Saturday, September 29, 2018

K.I.S.S. in Graph Design

I ran across this post from Data to Viz. My title is misleading because their post is much more than a call for simplicity.

However, it's intriguing that their solution to many common problems comes down to "stop being fancy and make it a bar chart".

Friday, September 28, 2018

Is There Still Value in Political Polling?

A colleague sent me a link to Why Polling Can Be So Hard by Nate Cohn.

It's interesting and not very long so you should read it.

I'll summarize one of his major points: Voter registration data is important to pollsters but different states store different data for each voter. For example, Wisconsin is known for having minimal data. Of course, Wisconsin was pivotal in the 2016 elections.

However, I found the comments just as interesting as the article. They are largely negative. Some people refuse to participate in polls or intentionally lie. Again, you should read the comments yourself, but they don't look good for the future of polling. I recognized myself in the article and the comments.
  • I live in Wisconsin. I don't want my voter registration to have ANY data about me beyond the minimum legal need. Information privacy matters and I don't care if our minimal data makes pollsters' jobs harder.
  • I don't answer a call when I don't already know the caller. If you're not in my contacts and your call is important then you can leave a message.
  • I'm suspicious that excessive polling and reporting on polls is no longer predicting what will happen as much as it's changing what will happen. Whether it's the bandwagon effect or the Hawthorne effect, I think it's a problem.
  • Speaking of the Hawthorne effect, campaigns now use their own extensive polling to craft their message. Polling doesn't just change voter behavior, it changes politician behavior.
Perhaps polling is a victim of its own success. When it was new and not overly intrusive, it provided useful information (value). Economic theory says that value attracts new participants and will continue to attract participants until there is no longer any value available. In perfect competition, there are zero long-term profits. 
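The entry argument can be made concrete with a toy model. This is a minimal sketch under assumptions that are mine, not part of the economic argument above: a fixed pool of value (the hypothetical `total_value`) is split evenly among identical pollsters, each pays the same fixed cost (`fixed_cost`), and a new pollster enters as long as it would at least break even. The function names and numbers are illustrative only.

```python
# Toy model of free entry (hypothetical numbers, for illustration only).
# Assumption: total_value is split evenly among n identical pollsters,
# and each pollster pays the same fixed_cost to produce its poll.


def profit_per_firm(total_value: float, n_firms: int, fixed_cost: float) -> float:
    """Profit earned by each of n identical firms sharing total_value equally."""
    return total_value / n_firms - fixed_cost


def firms_at_equilibrium(total_value: float, fixed_cost: float) -> int:
    """Add firms while one more entrant would still at least break even."""
    n = 1
    while profit_per_firm(total_value, n + 1, fixed_cost) >= 0:
        n += 1
    return n


if __name__ == "__main__":
    V, C = 100.0, 10.0  # hypothetical total value and per-firm cost
    for n in (1, 2, 5, 10):
        print(n, profit_per_firm(V, n, C))  # 90.0, 40.0, 10.0, 0.0
    n_star = firms_at_equilibrium(V, C)
    print("equilibrium firms:", n_star, "profit each:", profit_per_firm(V, n_star, C))
```

With these made-up numbers, profit per pollster falls from 90 with one pollster to 0 with ten, and entry stops there. Real markets are messier, but the direction matches the zero-long-term-profit prediction.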

Early polling methodology was pretty standard and easy to replicate: perfect competition. To break out of perfect competition, organizations need something to differentiate their output. If a poll is supposed to accurately predict the vote, how can one poll differentiate itself from the others?
  • Accuracy? There should be value in being more accurate, but you need to come up with better, non-standard methodologies. You also need access to different or better data. Even then, it's hard to show that you're more accurate.
  • Speed? Is there value in being the first to publish results? If you're the first by days then there could be value. Maybe even being first by hours. But minutes?
  • Frequency? If one group publishes a weekly poll, then you might gain value by publishing a daily poll. But how far can this go?
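The accuracy point can be illustrated with the textbook margin of error for a poll proportion. This sketch assumes a simple random sample and the normal approximation; real polls (with weighting and nonresponse) don't meet those assumptions, so their actual error is typically larger. Even in this idealized case, about 1,000 respondents leave roughly plus or minus 3 points of sampling noise, so a single election result may not be enough to separate a better method from good luck.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# A hypothetical poll of 1,000 respondents with a 50/50 split.
moe = margin_of_error(0.5, 1000)
print(f"+/- {moe * 100:.1f} points")  # prints "+/- 3.1 points"

# Quadrupling the sample size only halves the margin of error.
print(f"+/- {margin_of_error(0.5, 4000) * 100:.1f} points")  # prints "+/- 1.5 points"
```

The square root in the formula is why buying accuracy with bigger samples gets expensive quickly.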
I think that all three of these approaches have been tried, but they lead to the problems that voters complain about: information privacy and bombardment with polls.

The pessimist in me fears that polling and reporting on polling have become so ubiquitous that we're nearing the point of zero value. Worse, we might have reached negative value and polls are doing more harm than good.

A less pessimistic view suspects that the value of polling isn't gone (or negative): it's just changed. Maybe polls no longer measure what we think they measure. That creates new opportunities for the Nate Cohns and FiveThirtyEights of the world to find that new value.