Statistics are just numbers. 1 + 2 is always 3 even if the 2 was written in a disgusting colour. People, on the other hand, have crappy days all the time. It could be because a lunch was packed without cookies or because horrible tragedy has struck.
So why does it matter? Because crappy days mean someone:
- doesn’t answer a phone survey
- lies on their taxes
- makes a mistake on the census survey
- accidentally skips page 2 on a paper survey
- drips sarcasm all over their Facebook page
You recognize these. We call them data quality issues.
Statistics lull us into a false sense of accuracy. Statistics are based on premises which do not hold true for beings with independent thought. Statistics lead us to believe that representative samples are possible when theory dictates they are impossible. Though a million times better than the humanities can ever dream of achieving, even “real” science can’t achieve representative samples. The universe is just far too big to allow that.
In other words, even when you’ve done everything statistically possible to ensure a rep sample, humans and their independent thought have had a crappy day somewhere in your research design.
There is no such thing as a rep sample. There are only good approximations of what we think a rep sample would look like.
And because I AM CANADIAN, I apologize if I have crushed any notions.
Wouldn’t it be great if you could just read and interpret a number, and then be confident about your interpretation? If that were the case, you wouldn’t be able to buy 23 different books called “How to lie with statistics.”
Here are a few common problems I see when people try to interpret numbers.
Dislike matters just as much as like. Don’t get so focused on top box scores that you forget about bottom box scores. Brands can easily have identical top box scores and ridiculously different bottom scores.
How many times have you seen huge inexplicable spikes in your charts? Spikes are a key indicator that your sample size is too small. Be extremely nervous about numbers based on only 30 people. Be cautious of numbers based on fewer than 100 people. Check first and avoid embarrassing conclusions.
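To put rough numbers on that nervousness, here is a minimal sketch of the usual margin of error for a percentage. It assumes a simple random sample and the normal approximation, neither of which real survey data fully satisfies, and the function name is just an illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p based on n people.

    Assumes a simple random sample and the normal approximation.
    """
    return z * math.sqrt(p * (1 - p) / n)

# How wide is the uncertainty around a score of 50% at different base sizes?
for n in (30, 100, 1000):
    moe = margin_of_error(0.5, n)
    print(f"n={n:>4}: 50% +/- {moe * 100:.1f} points")
```

With 30 people, a score of 50% carries a margin of roughly 18 points either way, which is plenty of room for a spike that means nothing. At 100 people it is still close to 10 points.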
Everything on the planet is governed by rules. And one of those rules is randomness. When you’ve determined that a small sample size is not the cause of the spike, and there is no discernible explanation for the spike, consider that it may in fact be a random number. Random happens. Deal with it.
Just because a test came out significant today doesn’t mean it will with new data next week. See previous point. You will know you’ve really got something when it’s significant on several separate occasions.
Data tables – ten thousand pages filled with eight billion numbers and four trillion significance tests. Some might think that’s a slight exaggeration, but to me it feels bang on.
Data tables have some great features, but they make it really easy to forget the basics and stretch beyond the true validity of the data. Here are a few things to try to remember.
1) Data tables show chi-squares and t-tests on every single combination not because those comparisons are important but rather because the software is capable of plugging numbers into equations. Human beings are the only ones who can say which comparisons make sense. (I assume you are human.)
2) Data tables will show significance testing even when the sample sizes are too small. The software will still calculate the test, and it might even provide a warning that the sample size is very small, but once again a human must intervene to verify that doing the test actually makes sense with that sample size. I don’t care if the test comes out statistically significant in spite of a small sample size. You must use your brain and decide for yourself if the sample size is still just too small.
3) We usually use a p value threshold of 5% to decide whether a test result is significant. Using this threshold means that when there is no real difference, there is still a 5% chance the test will call it significant anyway. On * every * single * solitary * test. What this means is that across your datatables of thousands of tests, there’s a frickin huge chance that lots of the significant findings are pointing you in the wrong direction. Wondering which ones are misleading you? Read #4.
4) Running a billion t-tests on a set of datatables isn’t something to brag about. What it really means is that you haven’t thought about why you’re doing the research and what you want to focus on. You’re basically doing tests in a stepwise regression style and waiting for anything to drop in your lap. It’s called exploratory research for a reason. If you do the exact same study again, you’ll probably get a completely different set of significant results. So whatever significant number you so eloquently explained to your client is going to disappear next time and you’re going to have to eloquently explain it away. Again. If you like looking dumb, this is the tactic for you.
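The arithmetic behind point 3 is easy to sketch. The snippet below assumes every test is independent and that there are no real differences anywhere in the tables, which is a simplification (tests in real data tables overlap and correlate), and the function name is made up for illustration.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests
    when none of the differences being tested is real."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 20, 100, 1000):
    expected = k * 0.05  # average number of false positives at the 5% threshold
    print(f"{k:>5} tests: P(at least one false positive) = "
          f"{chance_of_false_positive(k):.3f}, "
          f"expected false positives = {expected:g}")
```

Under those assumptions, 20 tests already give you about a 64% chance of at least one bogus "significant" result, and a thousand tests would hand you around 50 of them on average.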
The moral of the story is don’t let your stats software think for you. Take the time to decide what’s important. Talking to clients will be a whole lot easier and you’ll look a whole lot smarter.
If you’re in social or market research, you’ve heard about this topic before. Statistical significance is the one thing you wait to hear in order to know how likely it is that findings like yours were the result of pure chance. If you are lucky enough to have a p value smaller than 0.05, you raise your hands with excitement and exclaim Eureka! But when your p value is huge, something like 0.06 or 0.7, you drag your heels in defeat. Why do we do this? Let me offer a couple of scenarios.
Option 1: 40% of people purchase regular Oreos and 50% of people purchase Double Stuf Oreos. Statistically not significant. Say what?
Option 2: 40% of people purchase regular Oreos and 41% of people purchase Double Stuf Oreos. Statistically significant. Yer kidding me…
How can this be? Well, the way you figure out if two numbers are significantly different involves sample size. Let’s say in option 1 you had 10 people. Statistics say that’s not enough people to be reasonably sure it’s not just chance. But option 2 was calculated from 10 000 people! With a sample that large, even a one-point difference can come out significant, depending on the test and on whether the same people answered both questions.
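Here is a rough way to check the two options yourself. This is a sketch only: it assumes two independent groups of equal size and a pooled two-proportion z-test, and the post does not say which test produced its results. The function name is hypothetical.

```python
import math

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p value from a pooled two-proportion z-test.

    Assumes two independent groups, each with n_per_group people.
    """
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the mean
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p2 - p1) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

print("40% vs 50%, n=10:     ", round(two_proportion_p_value(0.40, 0.50, 10), 3))
print("40% vs 41%, n=10000:  ", round(two_proportion_p_value(0.40, 0.41, 10_000), 3))
print("40% vs 41%, n=100000: ", round(two_proportion_p_value(0.40, 0.41, 100_000), 6))
```

Under these particular assumptions the ten-point gap is nowhere near significant with 10 people, while the one-point gap needs somewhat more than 10 000 people per group to dip below 0.05 and is comfortably significant at 100 000. A paired design, where the same respondents answer both questions, may behave differently. Either way the pattern holds: pile on enough sample and any difference, however tiny, eventually becomes "significant".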
But wait again... I think 40% is quite different from 50%. So, maybe even though I got those numbers with just 10 people, I would be inclined to try again and see if that number happened in a larger sample. Just because it wasn’t statistically significant doesn’t mean I would let it go.
On the other hand, do I REALLY care about the difference between 40% and 41%? Is 41% more actionable than 40%? I just don’t think so. I couldn’t care less if the difference was statistically significant. It’s just not meaningful. The ‘effect size’ is just too tiny for me to care that it was statistically significant.
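One common way to put a number on effect size for two percentages is Cohen's h. The post doesn't name a specific measure, so treat this as one illustrative choice, with a made-up function name. Unlike a p value, it ignores sample size entirely.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: effect size for the difference between two proportions.

    Conventional rough benchmarks: 0.2 small, 0.5 medium, 0.8 large.
    """
    return abs(2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1)))

print(f"40% vs 50%: h = {cohens_h(0.40, 0.50):.2f}")
print(f"40% vs 41%: h = {cohens_h(0.40, 0.41):.2f}")
```

The ten-point gap lands right around the conventional "small" mark of 0.2, while the one-point gap comes out at roughly a tenth of that, no matter how many people were surveyed.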
Here’s another way of looking at it. Let’s say you ran your study and ended up with p<0.06. Darn it, you say, I’m just going to give up on all my research now. But what if you did that study 5 more times and every single time, you got another p value just under 0.06? Doesn’t this suggest to you that there really just might be a real difference happening, but just not as large as what you thought? I’d hate to be the person who quit that research and didn’t end up being the person to discover penicillin. (Let’s forget meta-analysis for now.)
So, my advice to you… significance is interesting, but size definitely matters.