Nearly every day, I see a really cool statistic on TV or the interweeb. Everyone gets all excited about losing 312 pounds in four days, curing cancer, or eliminating measles forever. Candy is good for you! Coffee increases your memory! Drink more wine! Eat more Doritos! But if you paid ANY attention to the research methodology, you’d ignore the entire study. Here are a few of the biggest problems I see.
1) Significantly increased memory!!! Yes, when the sample size is large enough or the difference is large enough, anything is significant. So if 5 people in the control group remembered 5 things and 5 people in the test group remembered 8 things, the difference might be statistically significant. Or, if 1000 people in the control group remembered 5 things and 1000 people in the test group remembered 5.2 things, the difference might be statistically significant. Do you trust the results based on 10 people? Do you care about a difference of 0.2 points? I don’t. Get back to me when your sample sizes and effect sizes go beyond pre-test methodology sizes.
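To put rough numbers on those two scenarios, here is a minimal Python sketch. It assumes a standard deviation of 2 remembered things in every group (a made-up figure, since the examples don't give one) and uses a normal approximation, which is crude for groups of 5. The function name `compare_groups` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_groups(mean_control, mean_test, sd, n_per_group):
    """Return (z, two-sided p, Cohen's d) for two equal-sized groups.

    Assumes both groups share the same standard deviation `sd` and uses
    a normal approximation (rough when n_per_group is tiny).
    """
    diff = mean_test - mean_control
    standard_error = sd * sqrt(2 / n_per_group)
    z = diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = diff / sd  # effect size in standard deviation units
    return z, p_value, cohens_d


# Assumed standard deviation of 2 things per person (not from the post).
small_n = compare_groups(5, 8, sd=2, n_per_group=5)
large_n = compare_groups(5, 5.2, sd=2, n_per_group=1000)

for label, (z, p, d) in [("5 per group", small_n), ("1000 per group", large_n)]:
    print(f"{label}: z = {z:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")
```

Under these assumptions both comparisons slip under the usual 5% bar, yet one rests on 10 people and the other on an effect of a tenth of a standard deviation.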
2) Cancer rates decreased by 75%!!! Yes, very nice finding. Especially when the cancer rate of the control group was 0.04% and the cancer rate of the test group was 0.01%. That is indeed a 75% decrease but will that massive decline of 0.03 points mean that you stop eating chocolate or start drinking wine? Doubt it. It’s not a meaningful difference when it comes to one single person. Get back to me when the rate decreases by 75% AND the base rate can be measured without any decimal places.
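The gap between the headline number and the real-world number is just arithmetic. A small Python sketch using the rates above; the function name `risk_summary` is hypothetical, and the "people per case avoided" figure assumes the difference is truly caused by the behaviour, which a single study can't promise.

```python
def risk_summary(control_rate, test_rate):
    """Return (relative decrease, absolute decrease, people per case avoided).

    Rates are proportions, so 0.04% is written as 0.0004.
    """
    absolute_decrease = control_rate - test_rate
    relative_decrease = absolute_decrease / control_rate
    # Assumes the behaviour really causes the whole difference.
    people_per_case_avoided = 1 / absolute_decrease
    return relative_decrease, absolute_decrease, people_per_case_avoided


relative, absolute, people = risk_summary(0.0004, 0.0001)
print(f"Relative decrease: {relative:.0%}")
print(f"Absolute decrease: {absolute * 100:.2f} percentage points")
print(f"People who must change their habits to avoid one case: about {people:,.0f}")
```

The same finding is a 75% decrease, a 0.03 point decrease, or roughly one case avoided for every 3,333 people, depending on which line makes the better headline.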
3) Chocolate makes you thin!!!! I’m sure it did. In that one, single study. That has never been replicated. Remember how we compare all our findings against a 5% chance rate? Well, that’s what you just discovered. The 5% chance where the finding occurred randomly. Run the research another 19 times and then get back to me when 19 of them say that chocolate makes you thin.
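Here is a quick Python sketch of why one lonely study is so easy to come by. It assumes the studies are independent, that the effect doesn't exist at all, and that each one is tested at the usual 5% level; the function name is hypothetical.

```python
def chance_of_false_positive(n_studies, alpha=0.05):
    """Probability that at least one of n independent studies of a
    non-existent effect comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_studies


for n in (1, 5, 20):
    chance = chance_of_false_positive(n)
    print(f"{n} studies: {chance:.0%} chance of at least one fluke finding")
```

With 20 studies of an effect that isn't there, the odds are better than even that at least one of them makes the news.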
There are about 423 other cautions to watch out for, but today has been brought to you by the number three.
How do you create a survey question measuring frequency of behaviour that will generate the most accurate responses? Experience tells us to consider things like:
- Should I include a zero or incorporate that into the smallest value?
- Should I use whole numbers like ‘2 to 4’ or partial numbers like ‘2 to less than 4’?
- Should I use 4 break points or go all out with 10 break points?
These are all smart considerations and will help you collect more precise data. But seriously? How accurate, how precise, how valid are these data anyways? Do you really think that survey responders are going to carefully and precisely calculate the exact number of days or minutes they do something?
When we ask responders to choose between these two options…
- 2 to 4.99
- 2 to less than 5
… do you really think that one or the other option will help responders think of estimates that are any more accurate? Let’s face it. They won’t. Yes, there’s statistical precision in the answer options we’ve provided, but we are manufacturing a level of accuracy that does not exist. It’s no different than reporting 10 decimal places where 1 is more than sufficient.
So what do I recommend? Make things simple for your responder. Use real language not hoity toity, decades of academic research language. Use language that makes responders want to come and take another survey.
Truncating doesn’t get the respect it deserves.
Like rounding, truncating gets rid of those pesky decimal places that imply a higher degree of accuracy than truly exists. When you’re talking about a ten point scale or 100 percent ranges, 56.85637328 is identical to 56.
Like rounding, truncating makes numbers that are nine tenths of a point away from each other appear to be equal. 7.5 and 8.4 are 0.9 points apart but both get rounded to 8. Just as 8.0 and 8.9 are 0.9 points apart but both get truncated to 8.
The only time when rounding has a very slight advantage over truncating is when you’re using scales with a very small range. Where rounding retains the five points in a five point scale, truncating essentially reduces a five point scale to a four point scale. Now that isn’t inherently bad, but when you haven’t got a lot of variability in your results to begin with, every box counts. That is, after all, why we love decimal places.
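For anyone who wants to see the two side by side, here is a small Python sketch. The helper `round_half_up` is hypothetical (Python's built-in `round()` sends .5 to the nearest even number, so it isn't used here), and the list of mean scores on a five point scale is invented for illustration.

```python
from collections import Counter
from math import floor, trunc


def round_half_up(x):
    """Round to the nearest whole number, with .5 always going up."""
    return floor(x + 0.5)


for value in (7.5, 8.4, 8.0, 8.9, 56.85637328):
    print(f"{value}: rounded = {round_half_up(value)}, truncated = {trunc(value)}")

# Hypothetical mean scores on a five point scale: 1.0, 1.1, ... 5.0.
means = [step / 10 for step in range(10, 51)]
rounded_counts = Counter(round_half_up(m) for m in means)
truncated_counts = Counter(trunc(m) for m in means)

for box in range(1, 6):
    print(f"box {box}: rounding puts {rounded_counts[box]} means here, "
          f"truncating puts {truncated_counts[box]}")
```

In this made-up list, truncating only ever reaches the top box when the mean is a perfect 5.0, which is the four point scale effect described above.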
Personally, I prefer truncating over rounding. It’s a great, sound numerical karate chop. And it just sounds cooler.
Pull out the last research report you wrote. Flip through until you get to a paragraph full of percentages. You know, 43.76% of consumers like this, 28.382% of consumers prefer that. Look at and admire the decimal places. Aren’t they lovely?
But tell me, if all of the following examples are based on a survey of 1000 people, in which case would you make the business decision to produce the blue product?
- 40.001% of people prefer the pink version whereas 40.002% of people prefer the blue version
- 40.01% of people prefer the pink version whereas 40.02% of people prefer the blue version
- 40.1% of people prefer the pink version whereas 40.2% of people prefer the blue version
I certainly hope that none of those situations would lead you to conclude that the blue version is preferred over the pink version. Now try a few more options.
- 40% of people prefer the pink version whereas 41% of people prefer the blue version
- 40% of people prefer the pink version whereas 43% of people prefer the blue version
- 40% of people prefer the pink version whereas 45% of people prefer the blue version
So here’s my request. Enough already with the decimal places. They give us a false sense of precision and they are useless at clarifying major business decisions. If two percentages are so similar that decimal places are required, you’ve got nothing. Move on. Try again. Stop wasting your time. Radical?
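As a rough yardstick, here is a Python sketch of the margin of error around one percentage from a survey of 1000 people. It assumes a simple random sample and a 95% confidence level, neither of which is stated above, and `margin_of_error` is a hypothetical helper. Comparing two percentages properly is a stricter test than this, so treat it as a generous benchmark.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def margin_of_error(share, sample_size, z=Z_95):
    """Approximate margin of error for one proportion,
    assuming a simple random sample."""
    return z * sqrt(share * (1 - share) / sample_size)


moe_points = margin_of_error(0.40, 1000) * 100
print(f"Margin of error at 40% with 1000 people: +/- {moe_points:.1f} points")

# Gaps between pink and blue, from a sliver of a point up to 5 points.
for gap in (0.001, 0.01, 0.1, 1, 3, 5):
    verdict = "inside" if gap < moe_points else "outside"
    print(f"gap of {gap} points: {verdict} the margin of error")
```

Under those assumptions the margin is about 3 points either way, so any gap that needs a decimal place to be seen sits deep inside the noise.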
What the heck is that?
I’m not a statistician. I’ve taken undergrad and grad courses in statistics but they certainly were not my main focus. But, I understand statistics and I like using them. Well, if you’ve read some of my previous posts about statistics, you have an idea of what I mean.
I don’t believe in decimal places.
I don’t place all my trust in statistical significance.
I recognize the lack of probability sampling when doing research with people.
I think that statistics are great at pointing you in a direction, giving you ideas, and making you think about things in a different way than you’re used to. When you see statistical significance in a place you’ve never seen it before, it catches your attention and makes you wonder what’s going on. I think decimal places take your focus away from what really matters. I think the fight to identify which research methodology uses the best approximation of probability sampling does the same.
So what am I really saying? Statistics are a tool, one device I can use when I need it. Let the statistics be what they may: significant or not, sufficient decimal places or not, a passable attempt at probability sampling or not. Your brain is the best tool for determining whether a number is of any interest.
I love stats.
Is this not the most boring topic ever? Not to me!
Which number do you trust the most:
In the research world, decimal places are everything. One decimal place just won’t cut it, two will usually suffice, three is sometimes a delightful occurrence. Yikes! So let’s see, let’s go back to the be all and end all question in market research. “How likely are you to purchase double stuffed Oreo cookies?” Let’s also say your answer can range from 0% where there’s no way you’ll ever buy that wasteful crap. Or, you can rate it as high as 100% which means I live for sugar. (And to be clear, my answer would be 98%.)
Now, let’s say I asked that question of 300 people. If memory serves me correctly, that’s a confidence interval of a few points. So I did my survey and I got an answer of 37%. Was that 37% or 37.2% or 37.24%? Does it REALLY matter? It’s another question of meaningful findings vs statistical significance. If my confidence interval is 3 points, does that decimal place REALLY matter? I would suggest to you that it does not.
Statistics says that 37% plus or minus 3 points includes one decimal place, two decimal places, and even, shockingly, up to infinity decimal places. So, unless your confidence interval is 0.004, stop wasting your time with decimal places and start thinking about what really matters – MEANINGFUL numbers.
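If you'd rather not rely on memory, here is a Python sketch of the arithmetic. It assumes a simple random sample and a 95% confidence level, and both helper names are hypothetical. It works out the margin of error for 37% from 300 people, then turns the question around and asks how many people you would need before a decimal place starts to mean something.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def margin_of_error(share, sample_size, z=Z_95):
    """Approximate margin of error for one proportion (simple random sample)."""
    return z * sqrt(share * (1 - share) / sample_size)


def sample_size_needed(share, margin, z=Z_95):
    """Smallest sample whose margin of error is no wider than `margin`."""
    return ceil(z ** 2 * share * (1 - share) / margin ** 2)


print(f"300 people at 37%: +/- {margin_of_error(0.37, 300) * 100:.1f} points")

# Margins in percentage points: 3 points, half a point, a twentieth of a point.
for margin_points in (3, 0.5, 0.05):
    n = sample_size_needed(0.37, margin_points / 100)
    print(f"+/- {margin_points} points needs about {n:,} people")
```

Under these assumptions 300 people gives a margin somewhat wider than 3 points, and pinning down even the first decimal place takes a sample in the millions. Both only strengthen the case against decimals.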