Data tables are the root of all evil #li

Data tables – ten thousand pages filled with eight billion numbers and four trillion significance tests. Some might think that’s a slight exaggeration, but to me it feels bang on.

Data tables have some great features: they make it really easy to forget the basics and stretch beyond the true validity of the data. Here are a few things to try to remember.

1) Data tables show chi-squares and t-tests on every single combination not because those comparisons are important but rather because the software is capable of plugging numbers into equations. Human beings are the only ones who can say which comparisons make sense. (I assume you are human.)

2) Data tables will show significance testing even when the sample sizes are too small. The software will still calculate the test, and it might even provide a warning that the sample size is very small, but once again a human must intervene to verify that doing the test actually makes sense with that sample size. I don’t care if the test comes out statistically significant in spite of a small sample size. You must use your brain and decide for yourself if the sample size is still just too small.

3) We usually use a p value of 5% to decide whether a test result is significant. Using this threshold means that even when there is no real difference at all, there is a 5% chance the test will come out significant anyway. On * every * single * solitary * test. What this means is that across your data tables of thousands of tests, there’s a frickin huge chance that lots of the significant findings are pointing you in the wrong direction. Wondering which ones are misleading you? Read #4.

4) Running a billion t-tests on a set of data tables isn’t something to brag about. What it really means is that you haven’t thought about why you’re doing the research and what you want to focus on. You’re basically doing tests in a stepwise regression style and waiting for anything to drop in your lap. It’s called exploratory research for a reason. If you do the exact same study again, you’ll probably get a completely different set of significant results. So whatever significant number you so eloquently explained to your client is going to disappear next time and you’re going to have to eloquently explain it away. Again. If you like looking dumb, this is the tactic for you.
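
If you want to see points 3 and 4 in numbers, here is a rough Python sketch. It is an illustration under stated assumptions, not anybody's official method: the function names are made up for this example, the tests are assumed to be independent, and the simulated "study" compares groups drawn from the exact same population using a simple z-test with a known standard deviation of 1 (a stand-in for the t-tests your software runs). With 1,000 tests and a 5% threshold you should expect roughly 50 significant results from pure noise, and with just 20 tests the chance of at least one false alarm is already about 64%.

```python
import random
from math import erf, sqrt

ALPHA = 0.05  # the usual 5% significance threshold


def expected_false_positives(n_tests, alpha=ALPHA):
    """Average number of 'significant' results when no real differences exist."""
    return n_tests * alpha


def chance_of_any_false_positive(n_tests, alpha=ALPHA):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def two_sided_p_value(z):
    """Two-sided p value for a z statistic under the standard normal."""
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


def simulate_null_study(n_tests, group_size, rng):
    """Run n_tests comparisons where both groups come from the SAME population.

    Returns the set of test indices that came out 'significant' anyway.
    """
    significant = set()
    for test_id in range(n_tests):
        group_a = [rng.gauss(0, 1) for _ in range(group_size)]
        group_b = [rng.gauss(0, 1) for _ in range(group_size)]
        diff = sum(group_a) / group_size - sum(group_b) / group_size
        # Assumption: z-test with a known standard deviation of 1 in both groups.
        z = diff / sqrt(2 / group_size)
        if two_sided_p_value(z) < ALPHA:
            significant.add(test_id)
    return significant


if __name__ == "__main__":
    print("Expected false positives in 1000 tests:", expected_false_positives(1000))
    print("Chance of at least one in 20 tests:", chance_of_any_false_positive(20))

    # The "same study" run twice on pure noise (seeds are arbitrary).
    first = simulate_null_study(1000, 30, random.Random(1))
    second = simulate_null_study(1000, 30, random.Random(2))
    print("Significant in study 1:", len(first))
    print("Significant in study 2:", len(second))
    print("Significant in both:", len(first & second))
```

Run it and compare the two simulated studies: each one turns up a pile of significant results even though nothing real is going on, and only a handful (if any) of the same tests are significant both times.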

The moral of the story is don’t let your stats software think for you. Take the time to decide what’s important. Talking to clients will be a whole lot easier and you’ll look a whole lot smarter.
