Tag Archives: sample size

Data tables are the root of all evil #li

Stepwise regression

Image via Wikipedia

Data tables – ten thousand pages filled with eight billion numbers and four trillion significance tests. Some might think that’s a slight exaggeration, but to me it feels bang on.

Data tables have some great features they make it really easy to forget the basics and stretch beyond the true validity of the data. Here are a few things to try to remember.

1) Data tables show chi-squares and t-tests on every single combination not because those comparisons are important but rather because the software is capable of plugging numbers into equations. Human beings are the only ones who can say which comparisons make sense. (I assume you are human.)

2) Data tables will show significance testing even when the sample sizes are too small. The software will still calculate the test, and it might even provide a warning that the sample size is very small, but once again a human must intervene to verify that doing the test actually makes sense with that sample size. I don’t care if the test comes out statistically significant in spite of a small sample size. You must use your brain and decide for yourself if the sample size is still just too small.

3) We usually use a p value of 5% to decide whether a test result is significant. Using this threshold means there is a 5% chance that your conclusion will be wrong. On * every * single * solitary * test. What this means is that across your datatables of thousands of tests, there’s a frickin huge chance that lots of the significant findings are pointing you in the wrong direction. Wondering which ones are misleading you? Read #4.

4) Running a billion t-tests on a set of datatables isn’t something to brag about. What it really means is that you haven’t thought about why you’re doing the research and what you want to focus on. You’re basically doing tests in a stepwise regression style and waiting for anything to drop in your lap. It’s called exploratory research for a reason. If you do the exact same study again, you’ll probably get a completely different set of significant results. So whatever significant number you so eloquently explained to your client is going to disappear next time and you’re going to have to eloquently explain it away. Again. If you like looking dumb, this is the tactic for you.

The moral of the story is don’t let your stats software think for you. Take the time to decide what’s important. Talking to clients will be a whole lot easier and you’ll look a whole lot smarter.

Numb3rs5‘ by waltimo via Flickr
Image is licenced under a Creative Commons Attribution-ShareAlike licence

Related Posts


  • Tagxedo for colorful shapeable saveable word clouds
  • Kama Sutra Statistics
  • What’s Your IQ?
  • It’s your turn to discover the nuances of social media research
  • The Joy of Stats with Hans Rosling #MRX
  • Size Matters in Statistics

    Banana-flavored Banana Split Creme Oreos

    Image via Wikipedia

    If you’re in social or market research, you’ve heard about this topic before. Statistical significance is the one thing you wait to hear in order to know how likely your findings were the result of pure chance. If you are so lucky to have a p value smaller than 0.05, you raise your hands with excitement and exclaim Eureka! But, when your p value is huge, something like 0.06 or 0.7, you drag your heels in defeat. Why do we do this? Let me offer a couple scenarios.
    Option 1: 40% of people purchase regular oreos and 50% of people purchase double stuff oreos. Statistically not significant. Say what?
    Option 2: 40% of people purchase regular oreos and 41% of people purchase double stuff oreos. Statistically significant. Yer kidding me…
    How can this be? Well, the way you figure out if two numbers are significantly different involves sample size. Let’s say in option 1 you had 10 people. Statistics say that’s not enough people to be absolutely sure it’s not just chance. But, option 2 was calculated from 10 000 people! That’s more than enough people to know that this different isn’t just pure luck.
    But wait again….. I think 40% is quite different from 50%. So, maybe even though i got those numbers with just 10 people, I would be inclined to try again and see if that number happened in a larger sample. Just because it wasn’t statistically significant doesn’t mean I would let it go.
    On the hand, do I REALLY care about the difference between 40% and 41%? Is 41% more actionable than 40%? I just don’t think so. I couldn’t care less if the difference was statistically significant. It’s just not meaningful. The ‘effect size’ is just too tiny for me to care that it was statistically significant.
    Here’s another way of looking at it. Let’s say you ran your study and achieved statistical significance of p<0.06. Darn it, you say, i’m just going to give up on all my research now. But what if you did that study 5 more times and every single time, you got another p value of <0.06. Doesn’t this suggest to you that there really just might be a real difference happening, but just not as large as what you thought? I’d hate to be the person who quit that research and didn’t end up being the person to discover penicillin. (Let’s forget meta-analysis for now.)
    So, my advice to you… significance is interesting, but size definitely matters

    Related Articles

    Buy The Listen Lady Book

    Annie Pettit Buy Book listen lady amazon kindle ibookstore
    Now in the iBookstore!

    Learn about social media the quick and easy way. Buy your copy on Amazon.com the iBookstore or smashwords.

    Enter your email address to subscribe to this blog.

    Join 9,547 other followers

    LoveStats on Twitter

    All Top

    Featured in Alltop

    Get every new post delivered to your Inbox.

    Join 9,547 other followers

    %d bloggers like this: