Wednesday, August 14, 2019

Qualitative Data is Data Too (Part 1)

To data professionals, "data" implies some sort of structure with defined records and variables. In this context, quantitative variables are numbers (such as price, income, age) and qualitative variables are non-numbers (color, country, gender, payment method). Depending on the lingo in your particular world, qualitative data could also be called nominal data or categorical data but it's still structured data.

However, some fields use the term "qualitative data" differently. Their text or observations aren't easily codified into records and variables but they still examine large segments of data to discern patterns and themes. Philosophers might read Plato, Hobbes, Smith, and Marx to generate theories and find text passages to support, or refute, those theories.

Modern technology can be used to bridge these approaches (see the digital humanities). There are tools available to process online comments or customer reviews and determine how many are "positive" but some questions and some data simply don't lend themselves to any sort of structured data methods.

Let's use the Bible as an example. One could ask "How many times is the word money in the Bible?" Since the Bible wasn't written in English, we'd first have to agree on which translation we're going to use. Then it wouldn't be difficult to process every single word and count the number of times "money" occurs. Of course, this has already been done. I suppose it's somewhat interesting that the King James version uses "money" 140 times, but I'm not sure that this mini-fact is particularly informative.
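
As a rough illustration of how simple that counting step is, here is a minimal Python sketch. The function name `count_word`, the tokenization rule (lowercase runs of letters and apostrophes), and the short sample string are all assumptions for illustration; a real count would load the full text of whichever translation was agreed on.

```python
import re
from collections import Counter


def count_word(text: str, word: str) -> int:
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[word.lower()]


# Placeholder text standing in for a full translation.
sample = "Money answereth all things. The love of money is the root of all evil."
print(count_word(sample, "money"))
```

Because whole tokens are matched, a word like "moneychangers" would not be counted as "money". That is itself a choice someone has to make before running the count.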

There are other words for "money". The Bible might mention payment, wages, debt, inheritance, silver, gold, ... This source tells us that there are over 2300 verses in the Bible that mention "money, wealth, or possessions".

But are "possessions" and "money" really the same thing? This analysis requires another step. As before, we'd have to agree on which translation to use but we'd also have to agree on a list of synonyms for "money". We might even come up with an ordinal scale for whether a word is a true equivalent or simply related. Then, as with the previous question, a program could process every word in the text and count how many times "money" and each synonym occurred.
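
A sketch of that synonym count might look like the following. The synonym list and the two-level ordinal scale (2 = true equivalent, 1 = simply related) are invented here purely for illustration; agreeing on the real list and scores is the hard, human part of the exercise.

```python
import re
from collections import Counter

# Hypothetical synonym list with an ordinal relatedness score:
# 2 = true equivalent of "money", 1 = simply related.
SYNONYMS = {"money": 2, "silver": 1, "gold": 1, "wages": 1, "debt": 1}


def count_terms(text: str, terms) -> dict:
    """Count whole-word occurrences of each term in `text`, ignoring case."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return {term: tokens[term] for term in terms}


def counts_by_level(counts: dict, scale: dict) -> dict:
    """Sum the per-term counts within each level of the ordinal scale."""
    totals = Counter()
    for term, n in counts.items():
        totals[scale[term]] += n
    return dict(totals)


# Made-up sentence standing in for the full text.
sample = "He paid the wages in silver and gold, and the money was counted."
counts = count_terms(sample, SYNONYMS)
print(counts)
print(counts_by_level(counts, SYNONYMS))
```

Keeping the per-term counts separate from the per-level totals lets you change the scale later without reprocessing the text.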

In both of these examples, it's possible to process the Biblical text as more or less traditional data but the results aren't all that useful. A much more interesting question is "What does the Bible teach about money?" and neither example answers that. Many have attempted to answer that question, but none of them were able to do so with traditional statistical or data analysis tools. Sure, the first two examples could be modified to "flag" text segments that might be useful in answering the question, but (so far*) a person still needs to read through those segments and evaluate them.
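
The "flagging" idea could be sketched roughly as below. The passage labels, the sample sentences, and the keyword list are all made up for illustration, and the output is only a reading list: a person still has to evaluate what each flagged passage actually says.

```python
import re


def flag_passages(passages, keywords):
    """Return the (label, text) pairs whose text contains any keyword as a whole word."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [(label, text) for label, text in passages if pattern.search(text)]


# Hypothetical passages; a real run would use verses from the chosen translation.
passages = [
    ("passage 1", "He paid the workers their wages."),
    ("passage 2", "They walked to the city."),
    ("passage 3", "She forgave the debt."),
]

for label, text in flag_passages(passages, ["money", "wages", "debt"]):
    print(label, "->", text)
```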

Data professionals often aren't comfortable with this type of qualitative analysis. It's too fuzzy or too touchy-feely. However, it's likely that the people data professionals report to are using all kinds of "fuzzy" analysis, so it might be wise to study some fields where qualitative analysis is commonly used.

In Part 2, I'll tell a story of qualitative analysis and decision making. In Part 3, I'll do my own qualitative analysis.

*AI tools are advancing rapidly. If you know of tools that can do this without any human intervention, then please tell me about them.

Wednesday, July 31, 2019

A new article

Twenty years ago, I wrote about the role of spreadsheet modeling in Operations Research/Management Science (OR/MS) education. It got a fair amount of attention. This month, I'm taking on another potential controversy: the interplay of "analytics" and "OR/MS".

Tuesday, March 26, 2019

Small Data (really small) and Expectation Management

Years ago, my wife and I went to see the movie Romancing the Stone. We were visiting a small town and it was the only movie playing. It was new in theaters and we didn't know anything about it.

We loved it. We told lots of people how good it was.

We loved it so much that we went to see it again about a month later. It was still good, but it wasn't great. We realized that we had zero expectations the first time - we were just looking for something to do - and high expectations the second time.

Years later, our friends were raving about My Big Fat Greek Wedding. They insisted that we see it. Really insisted. We were told that we needed to see it.

We finally went and it was a disappointment. It wasn't a bad movie, but no movie could live up to the hyped reviews we heard.

Over the years we've referred back to those movies when we find ourselves reacting differently than expected. We recently went to a restaurant that someone close to us insisted that we try. It was disappointing. Then one of us said, "I guess this was a Big Fat Greek Wedding instead of Romancing the Stone". It was actually a nice restaurant, but it couldn't possibly live up to the expectations we were given.

In related news, I just finished reading The Undoing Project by Michael Lewis. I learned some of this material in graduate school but, as usual, Lewis does a great job telling the story. The discoveries of Kahneman and Tversky explain our experience with the movies and the restaurant.

Small data, such as word-of-mouth reviews from a few friends, are poor statistical samples, but people still give them significant weight in forming judgments. With social media, small data can get repeated and amplified so that they look like much larger data and, again, people will give them significant weight in forming judgments.

We all want good reviews for our endeavors, but we should also want accurate reviews. What if I do good work, but my good work merely meets your expectations (or even falls slightly short)? I'd rather be judged against an accurate expectation than an inflated one. However, in the world of small data where people aren't completely rational, I'm not sure how to make that happen.

Tuesday, March 12, 2019