Feeling a little lonely? Want to be in a relationship? Then join Match.com because you are 3X more likely to find a relationship if you do. Because, clearly, they are a much better program than any other dating program out there. Click on the image to watch their commercial and see for yourself.
But just hearing that “3X more” phrase makes me think:
- Does their system have three times as many people signed up?
- Are they comparing themselves to people who are using a crappy dating service?
- Are they comparing themselves to people who aren’t using a dating service at all?
- Have they considered that people who sign up for a service are more serious about finding a relationship?
The ad implies a causal link but there are so many correlational links that all I can do is completely discount the commercial.
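To see how far self-selection alone can carry a claim like this, here is a minimal Python sketch. Every number in it is an assumption invented for illustration (the share of serious daters, the sign-up rates, the success rates), and none of it comes from Match.com. The simulated service does nothing at all, yet its members still come out roughly three times more likely to find a relationship.

```python
import random

def simulate(n_people=100_000, seed=0):
    """Simulate a dating service with NO causal effect on finding a relationship."""
    rng = random.Random(seed)
    people = {True: 0, False: 0}   # signed_up -> number of people
    found = {True: 0, False: 0}    # signed_up -> number who found a relationship
    for _ in range(n_people):
        serious = rng.random() < 0.3                        # assumed: 30% are serious
        # Assumed: serious people are far more likely to sign up.
        signed_up = rng.random() < (0.7 if serious else 0.1)
        # Success depends only on seriousness, never on the service.
        success = rng.random() < (0.5 if serious else 0.08)
        people[signed_up] += 1
        found[signed_up] += success
    return found[True] / people[True], found[False] / people[False]

members, others = simulate()
ratio = members / others
print(f"members: {members:.2f}  non-members: {others:.2f}  ratio: {ratio:.1f}x")
```

The only thing separating the two groups is who chose to sign up, which is exactly the comparison a "3X more likely" claim cannot rule out on its own.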
I’m a big fan of dating services but not if they’re going to mess with statistics. Just as nobody puts baby in a corner, NOBODY messes with statistics.
It’s true. I read it in today’s newspaper. The article clearly stated that people are more likely to drown on hot days. Which obviously means that:
- People forget how to swim on hot days
- It’s more difficult to swim on hot days
- It’s harder to hold your breath on hot days
- People are less buoyant on hot days
- No one drowns on cold days
Of course, the article did fail to mention a few things: that people go in the water more often on hot days, that more people go in the water on hot days, and perhaps even that people are less likely to wear life jackets on hot days because “I’m just going in for a second.”
The media is fabulous at taking correlational relationships and presenting them as causation. Don’t be fooled. And wear a life jacket.
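The arithmetic behind that is short enough to sketch in Python. The figures are made up for illustration (the risk per swim and the number of swims are assumptions, not numbers from the article), and the risk per swim is deliberately identical on hot and cold days.

```python
# All figures below are made-up assumptions for illustration only.
RISK_PER_SWIM = 1 / 100_000                      # assumed identical on hot and cold days
swims_per_day = {"hot": 50_000, "cold": 5_000}   # assumed: ten times more swims on hot days

# Expected drownings = number of swims x risk per swim.
expected_drownings = {
    day: swims * RISK_PER_SWIM for day, swims in swims_per_day.items()
}
ratio = expected_drownings["hot"] / expected_drownings["cold"]

for day, count in expected_drownings.items():
    print(f"{day} day: {count:.2f} expected drownings")
print(f"hot days look {ratio:.0f}x deadlier, with the same risk per swim")
```

Nobody forgot how to swim. There were simply ten times as many chances for something to go wrong.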
It’s probably safe to assume that every single research report you’ve ever written has been followed up with a single word – why.
Why did this result happen? Why did people give this answer? Why is this the winning option?
It’s easy enough to read through any report and be faced with lots of interesting questions. I can usually think of three or four contradictory answers for every question coming out of a report. And I can usually make any of them match up with the data. Data in, preferred answer out. Want an insight? I’ll make one up for you.
But which why is the right why? The problem is simple. Market research is rarely designed to answer the question why. Market research is usually designed to measure what. Surveys tell us what. Focus groups tell us what. Social media research tells us what. You see, even when you outright ask people to tell you why, you’re usually getting a why that has been massively skewed by deceptive memories and a variety of life experiences. That’s not why.
Most market research is only designed to discover correlations which, I shouldn’t have to tell you, aren’t causation. Just because someone says they buy six cans of beans each week and they have six kids and they tell us they buy six cans because they have six kids does not mean that they buy six cans of beans because they have six kids.
The only way to measure why is with test-control research. In the strictest sense, you must randomly create families with random numbers of random children. Randomly assign people to random families such that some are two-kid families while others are three- or six-kid families. Now you’ve got the correct conditions to observe whether families with more kids do indeed buy more beans. And then you’ll legitimately be able to say that having six kids causes families to buy six cans.
So until we can randomly assign people to families, to product offerings, to price differences, to political candidates, and more, we’re stuck with correlational results.
So keep on guessing why.
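Since nobody can hand out children at random, here is a minimal sketch of the same test-control logic with something that can be assigned: a hypothetical coupon. Everything in it is an assumption made up for illustration, including the coupon itself, the idea of brand "fans", and every probability. It compares shoppers who choose their own coupon (correlational) with shoppers who get one by coin flip (test-control).

```python
import random

TRUE_EFFECT = 0.05  # assumed real lift in purchase probability from the coupon

def purchase_gap(randomised, n=200_000, seed=42):
    """Purchase rate with the coupon minus purchase rate without it."""
    rng = random.Random(seed)
    buyers = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        fan = rng.random() < 0.5                    # hidden trait: already likes the brand
        if randomised:
            coupon = rng.random() < 0.5             # coin flip decides test vs control
        else:
            coupon = rng.random() < (0.8 if fan else 0.2)   # fans seek out coupons
        p_buy = (0.4 if fan else 0.1) + (TRUE_EFFECT if coupon else 0.0)
        totals[coupon] += 1
        buyers[coupon] += rng.random() < p_buy
    return buyers[True] / totals[True] - buyers[False] / totals[False]

observed = purchase_gap(randomised=False)
tested = purchase_gap(randomised=True)
print(f"self-selected coupon users: gap = {observed:.3f}")
print(f"randomly assigned coupons:  gap = {tested:.3f} (true effect {TRUE_EFFECT})")
```

Under these assumptions the self-selected gap is several times larger than the real effect, because fans both grab coupons and buy more. The coin flip breaks that link, which is why only the randomised gap lands near the true effect.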
Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.
Oh my, what a gorgeous heteroscedasticity you have! What is it, you ask? You mean other than a really cool eight-syllable statistics word that you can show off in front of friends?
This long and lovely word comes into play when you’re dealing with pairs of variables – perhaps height and weight, or grades and time spent studying, or voting behaviour and time spent reading the political section of the paper. It has mean and nasty effects on correlation coefficients and regression models so pay attention!
Specifically, it refers to how spread out the numbers for one variable are at different values of the other variable. Homoscedasticity refers to a spread that is very even and regular no matter which section of the chart you look at. This is what you see in the first chart. Heteroscedasticity is the opposite: the spread is wider in some sections of the chart than in others.
- We all know that shorter people weigh less and taller people weigh more. But, what if most 5 foot tall women weigh between 90 and 100 pounds while most 6 foot tall women weigh between 130 and 170 pounds? The range of 10 pounds at 5 feet is very different from the range of 40 pounds at 6 feet. That’s a lot of heterobebijicty!
- We also know that people who study a lot tend to get higher grades. Now, what if people who studied 1 hour per week got a D while people who studied 2 hours per week got a C, B, or A? Once again, 1 hour resulted in one possible grade while 2 hours resulted in three possible grades. That’s even more heteroihjusdfgicty.
- And, what if jogging for 30 minutes burns 200 to 250 calories while jogging for 60 minutes burns 400 to 450 calories? Half an hour resulted in a range of 50 calories while a full hour resulted in a range of…. also 50 calories. That’s a lot of…. homoscedasticity!
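If you would rather see the difference in numbers than in a chart, here is a minimal Python sketch. The data are simulated, and the line y = 2x and both noise settings are assumptions picked for illustration. It measures how far the points scatter around the line in the low half and the high half of x.

```python
import random
import statistics

def spread_by_half(noise_sd, n=5_000, seed=7):
    """Return the sd of the scatter around y = 2x for low-x and high-x points."""
    rng = random.Random(seed)
    low, high = [], []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + rng.gauss(0, noise_sd(x))   # noise_sd sets the spread at this x
        residual = y - 2 * x                    # distance from the assumed true line
        (low if x < 5 else high).append(residual)
    return statistics.pstdev(low), statistics.pstdev(high)

even = spread_by_half(lambda x: 1.0)              # same spread everywhere
fanned = spread_by_half(lambda x: 0.2 + 0.3 * x)  # spread grows as x grows

print(f"homoscedastic:   low-x sd = {even[0]:.2f}, high-x sd = {even[1]:.2f}")
print(f"heteroscedastic: low-x sd = {fanned[0]:.2f}, high-x sd = {fanned[1]:.2f}")
```

When the two halves give about the same spread, you are in homoscedastic territory. When the high half is clearly wider, as in the second case, your correlation coefficient is averaging over sections of the chart that do not behave alike.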
So the next time you’re wondering why your correlation coefficient or regression equation isn’t as nice as you had hoped, have a look for heteroscedasticity. And make it a habit to look before you statisticize.
Just like that news article, misunderstandings about correlation and causation are rampant. We all know that correlations reflect things that tend to occur together whereas causation reflects one thing that makes the other thing happen. Well bah humbug. Enjoy these “correlations.”
- Mass umbrella openings cause rain to fall
- Putting on shorts and t-shirts causes it to be warm outside
- Opening a door causes a person to walk through it
- Buying popcorn causes movies to be shown
- Buying plus sized clothing causes you to gain weight
- Flying your own jet causes you to be a multi-millionaire
[Fans of bacon should read this version instead.]
Do a quick search on Twitter or Google and you’ll instantly find 412 653 ways to encourage people to engage more with your product. Reply to people, ask questions, use polls, give a call to action, request videos and photos, give them user accounts. The list goes on and on.
Why do we do this? Because research says that when users are more engaged with something, they spend more money on it. And we all like money.
But wait. Something I paid little attention to in my introductory statistics class is nagging me. It’s telling me not to leap to assumptions. It’s telling me that just because someone spends more time on something does not mean they will consequently spend more money on it. It’s telling me that correlation does not equal causation.
You see, people who are more engaged with something, be it a website, a shoe company, or a kitchen supply shop, are already more invested in it than people who are less engaged. These people are more engaged because they like the company. They buy the product because they like the company. They do not buy the product because they are engaged with the company.
It’s very easy to forget the direction of relationships among variables. Sure, you can convince a bunch of people to become more engaged, and sure some of them will grow to like the company, and sure some of them will make a purchase. But don’t fool yourself into thinking you can get a bunch of vegetarians to eat bacon by convincing them to create a user profile on your bacon website and share bacon recipes with them. The common denominator isn’t engagement. It’s the bacon. It’s always the bacon.