On the Minitab Blog, Carly Barry listed a number of common and basic statistics errors. Most readers would probably think, “I would never make those errors, I’m smarter than that.” But I suspect that if you took a minute and really thought about it, you’d have to confess you are guilty of at least one. You see, every day we are rushed to finish this report faster, that statistical analysis faster, or those tabulations faster, and in our attempts to get things done, errors slip in.
Number 4 in Carly’s list really spoke to me. One of my pet peeves in marketing research is the overwhelming reliance on data tables. These reports are often hundreds of pages long and include crosstabs of every single survey variable crossed with every single demographic variable. Then a t-test or chi-square is run for every cross, and it is carefully noted which differences are statistically significant. Across thousands and thousands of tests, yes, a few hundred are statistically significant. That’s a lot of interesting differences to analyze. (Let’s just ignore the ridiculous error rates of this method.)
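To see how ridiculous those error rates actually are, here is a back-of-envelope sketch. The 5,000-test deck size is my own illustrative number, not one from any particular report; the only facts used are the standard alpha = 0.05 cutoff and basic probability.

```python
# If every null hypothesis in a tab deck were true (no real differences at
# all), an alpha of 0.05 still hands you "significant" results by chance.

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results from pure chance."""
    return n_tests * alpha

def chance_of_at_least_one(n_tests, alpha=0.05):
    """Probability that at least one test comes up significant by luck alone."""
    return 1 - (1 - alpha) ** n_tests

noise_findings = expected_false_positives(5000)  # roughly 250 chance "findings"
one_fluke = chance_of_at_least_one(20)           # well over half, for just 20 tests
```

A deck of 5,000 crosses is expected to produce a few hundred significant results even when nothing real is going on, which is exactly the "few hundred interesting differences" pattern described above.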
But tell me this, when was the last time you saw a report that incorporated effect sizes? When was the last time you saw a report that flagged statistically significant differences ONLY if the difference was meaningful and large? No worries. I can tell you the answer. Never.
You see, pretty much anything can be statistically significant. By definition, 5% of differences are significant. Tests run with large samples are significant. Tests of tiny percentages are significant. Are any of these meaningful? Oh, who has time to apply their brains and really think about whether a difference would result in a new marketing strategy? The p-value is all too often substituted for our brains. (Tweet that quote)
It’s time to redo those tables. Urgently.
Read an excerpt from Carly’s post here and then continue on to the full post with the link below.
Statistical Mistake 4: Not Distinguishing Between Statistical Significance and Practical Significance
It’s important to remember that using statistics, we can find a statistically significant difference that has no discernible effect in the “real world.” In other words, just because a difference exists doesn’t make the difference important. And you can waste a lot of time and money trying to “correct” a statistically significant difference that doesn’t matter.
- Are There Perils in Changing the Way We Sample our Respondents by Inna Burdein #CASRO #MRX (lovestats.wordpress.com)
- 11 signs that you don’t have a research objective #MRX (lovestats.wordpress.com)
- Do Google Surveys use Probability Sampling? #MRX #MRMW (lovestats.wordpress.com)
What affects survey responses?
- color of the page
- images on the page
- wording choices
- question length
- survey length
- scale choices
All of these options, plus about infinity more, mean that confidence intervals and point estimates from any research are pointless.
And yet, we spew out significance testing at every possible opportunity. You know, when sample sizes are in the thousands, even the tiniest of differences are statistically significant. Even meaningless differences. Even differences caused by the page colour, not the question content.
So enough with the stat testing. If you don’t talk about effect sizes and meaningfulness, then I’m not interested.
Nearly every day, I see a really cool statistic on TV or the interweeb. Everyone gets all excited about losing 312 pounds in four days, curing cancer, or eliminating measles forever. Candy is good for you! Coffee increases your memory! Drink more wine! Eat more Doritos! But if you paid ANY attention to the research methodology, you’d ignore the entire study. Here are a few of the biggest problems I see.
1) Significantly increased memory!!! Yes, when the sample size is large enough or the difference is large enough, anything is significant. So if 5 people in the control group remembered 5 things and 5 people in the test group remembered 8 things, the difference might be statistically significant. Or, if 1000 people in the control remembered 5 things and 1000 people in the test group remembered 5.2 things, the difference might be statistically significant. Do you trust the results based on 10 people? Do you care about a difference of 0.2 points? I don’t. Get back to me when your sample sizes and effect sizes go beyond pre-test methodology sizes.
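Here is a quick sketch of why both comparisons can clear p&lt;0.05. The post gives only the group means, so the standard deviation (2 items recalled) is my assumption, and I use a simple normal-approximation z-test rather than any particular study’s actual analysis.

```python
import math

def two_sample_z(mean1, mean2, sd, n1, n2):
    """Two-sided p-value for a difference in means, normal approximation.
    Assumes a common standard deviation; fine for a back-of-envelope check."""
    se = sd * math.sqrt(1 / n1 + 1 / n2)
    z = (mean1 - mean2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 5 people per group, 5 vs 8 items recalled, assumed SD of 2:
p_small = two_sample_z(8, 5, 2.0, 5, 5)       # a huge difference, tiny sample
# 1000 people per group, 5 vs 5.2 items recalled, same assumed SD:
p_large = two_sample_z(5.2, 5, 2.0, 1000, 1000)  # a trivial difference, huge sample
```

With these assumed numbers, both p-values land under 0.05: the first because the difference is enormous, the second purely because the sample is enormous. The p-value alone cannot tell you which situation you are in.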
2) Cancer rates decreased by 75%!!! Yes, very nice finding. Especially when the cancer rate of the control group was 0.04% and the cancer rate of the test group was 0.01%. That is indeed a 75% decrease, but will that massive decline of 0.03 percentage points mean that you stop eating chocolate or start drinking wine? Doubt it. It’s not a meaningful difference when it comes to one single person. Get back to me when the rate decreases by 75% AND the base rate can be measured without any decimal places.
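The arithmetic behind that sleight of hand, using the rates from the example above. The “people per case prevented” framing (the number-needed-to-treat idea from epidemiology) is my addition to make the absolute difference concrete.

```python
# Relative vs absolute: the same "75% decrease" in two framings.

control_rate = 0.0004   # 0.04% of the control group got cancer
test_rate    = 0.0001   # 0.01% of the test group got cancer

relative_reduction = (control_rate - test_rate) / control_rate   # 0.75, the headline
absolute_reduction = control_rate - test_rate                    # 0.0003, the reality
people_per_case_prevented = 1 / absolute_reduction               # thousands of people
```

A 75% relative reduction sounds enormous, but at these base rates you would need to change the behaviour of over three thousand people to prevent a single case.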
3) Chocolate makes you thin!!!! I’m sure it did. In that one, single study. That has never been replicated. Remember how we compare all our findings against a 5% chance rate? Well, that’s what you just discovered. The 5% chance where the finding occurred randomly. Run the research another 19 times and then get back to me when 19 of them say that chocolate makes you thin.
There are about 423 other cautions to watch out for, but today has been brought to you by the number three.
p-values are the backbone of market research. Every time we complete a study, we run all of our data through a gazillion statistical tests and search for those that are significant. Hey, if you’re lucky, you’ll be working with an extremely large sample size and everything will be statistically significant. More power to you!
But what if you didn’t calculate p-values? What if you simply looked at the numbers and decided whether the difference was meaningful? What if you calculated means and standard deviations, and focused more on effect sizes and less on p&lt;0.05? Instead of relying on some statistical test to tell you that you chose a sample size large enough to make the difference significant, what if you used your brain to decide whether the difference between the numbers was meaningful enough to warrant making a decision?
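For readers who haven’t met one, here is what an effect size calculation looks like. Cohen’s d is the most common choice for a difference between two means, with conventional benchmarks of roughly 0.2 (small), 0.5 (medium), and 0.8 (large). The brand-rating numbers below are hypothetical.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d using a pooled standard deviation (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Hypothetical brand ratings on a 10-point scale, SD of 1.5 in each group:
d = cohens_d(7.2, 7.0, 1.5, 1.5)   # about 0.13, below even the "small" benchmark
```

A difference like this could easily be statistically significant with a few thousand respondents per group, yet the effect size says it is too small to build a strategy on.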
Effect sizes are such an underused, unappreciated measure in market research. Try them. You’ll like them. Radical?
- Radical Market Research Idea #4: Make your spouse take your survey #MRX (lovestats.wordpress.com)
- Radical Market Research Idea #5: Drop the decimals #MRX (lovestats.wordpress.com)
- Radical Market Research Idea #1: Banish probability sampling #MRX (lovestats.wordpress.com)
Everybody deserves a little recognition once in a while. Many of us do good deeds on a regular basis and are never recognized for those accomplishments. Today, however, is your chance to shine.
Do you qualify for any of these awards? Then be proud and claim them! Copy the link and post the awards on your website. But be honest. Falsely claiming awards will result in your name being published on Santa Claus’s naughty list. Just sayin.
You are a dork. The proof is that you thought a blog post about statistics might be interesting. I admire your strange interest in statistics.
- I like how trend lines become straighter and straighter as you increase the sample size
- I like how box plots convey everything you need to know about a distribution
- I like 3 dimensional maps where the axes take you hours to name and then you go “duh, of course!”
- I like Mahalanobis distance, Cronbach’s alpha, and factor analysis.
- I like how effect sizes can make statistical significance irrelevant.
Again, your turn!
Read these too
If you’re in social or market research, you’ve heard about this topic before. Statistical significance is the one thing you wait to hear in order to know how likely it is that your findings were the result of pure chance. If you are lucky enough to have a p value smaller than 0.05, you raise your hands with excitement and exclaim Eureka! But when your p value is huge, something like 0.06 or 0.7, you drag your heels in defeat. Why do we do this? Let me offer a couple scenarios.
Option 1: 40% of people purchase regular Oreos and 50% of people purchase Double Stuf Oreos. Statistically not significant. Say what?
Option 2: 40% of people purchase regular Oreos and 41% of people purchase Double Stuf Oreos. Statistically significant. Yer kidding me…
How can this be? Well, the way you figure out if two numbers are significantly different involves sample size. Let’s say in option 1 you had 10 people. Statistics say that’s not enough people to be absolutely sure it’s not just chance. But option 2 was calculated from tens of thousands of people! That’s more than enough people to know that this difference isn’t just pure luck.
But wait again… I think 40% is quite different from 50%. So, maybe even though I got those numbers with just 10 people, I would be inclined to try again and see if that number happened in a larger sample. Just because it wasn’t statistically significant doesn’t mean I would let it go.
On the other hand, do I REALLY care about the difference between 40% and 41%? Is 41% more actionable than 40%? I just don’t think so. I couldn’t care less if the difference was statistically significant. It’s just not meaningful. The ‘effect size’ is just too tiny for me to care that it was statistically significant.
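The two Oreo scenarios can be checked with a standard two-proportion z-test. The group sizes are not stated in the post, so the ones below are my assumptions; notably, with a 40% vs 41% split you need on the order of 20,000+ people per group before that one-point gap clears p&lt;0.05.

```python
import math

def two_prop_p(p1, p2, n1, n2):
    """Two-sided p-value for a difference between two proportions (pooled z-test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Option 1: 40% vs 50%, with an assumed 10 people per group:
p_option1 = two_prop_p(0.40, 0.50, 10, 10)          # nowhere near significant
# Option 2: 40% vs 41%, with an assumed 25,000 people per group:
p_option2 = two_prop_p(0.40, 0.41, 25000, 25000)    # comfortably under 0.05
```

Same test, same logic; the only thing that changed between "not significant" and "significant" is how many people you asked.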
Here’s another way of looking at it. Let’s say you ran your study and got a p value of 0.06, just missing the cutoff. Darn it, you say, I’m just going to give up on all my research now. But what if you did that study 5 more times and every single time, you got another p value of 0.06? Doesn’t this suggest to you that there really just might be a real difference happening, just not as large as what you thought? I’d hate to be the person who quit that research and didn’t end up being the person to discover penicillin. (Let’s forget meta-analysis for now.)
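The arithmetic behind that intuition: under the null hypothesis of no effect, each study has only a 6% chance of landing at p ≤ 0.06 (p-values are uniform when nothing is going on). Six studies in a row all doing so is wildly unlikely to be pure luck.

```python
# Probability that all six studies (the original plus five repeats) land at
# p <= 0.06 when there is truly no effect:

per_study_chance = 0.06
chance_all_six = per_study_chance ** 6   # on the order of 5 in 100 million
```

Not a formal meta-analysis, just a reminder that "not quite significant, six times running" is itself strong evidence of something real.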
So, my advice to you… significance is interesting, but size definitely matters.