Data Tables: The scourge of falsely significant results #MRX



Who doesn’t have fond thoughts of 300-page data tabulation reports! Page after page of crosstab after crosstab, significance test after significance test. Oh, it’s a wonderful thing and it’s easy as pie (mmmm…. pie) to run your fingers down the rows and columns to identify all of the differences that are significant. This one is different from B, D, F, and G. That one is different from D, E, and H. Oh, the abundance of surprise findings!

But let me take you back to your introductory statistics class in college or university. Significance testing is a process we use to determine the likelihood that Number A is different from Number B at a level that is different from what would be expected by chance. As an industry, we have generally agreed that we are willing to put up with a 5% chance of error, that is, a 5% chance of calling a difference significant when it was really just random chance. And each individual test has a 5% chance of error.

Now let’s think about those lovely tabulation reports that contain thousands of individual significance tests. Did you realize that each of those significance tests has a 5% chance of error? So, that’s 5% plus 5% plus 5% plus….. I can’t bear to do the math and I’m not even sure I can do the math.
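For anyone who can bear it, here is a minimal sketch of that math in Python. It assumes the tests are independent and that no real differences exist, which real crosstabs rarely satisfy exactly, so treat the numbers as a rough guide. Under those assumptions the errors don't simply add up: the chance of at least one false positive is 1 − (1 − alpha)^n, and "5% plus 5% plus 5%" is only an upper limit on it. The function name is made up for illustration.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent and every null hypothesis is true.
    """
    # Probability that all n_tests tests correctly stay non-significant
    all_correct = (1.0 - alpha) ** n_tests
    return 1.0 - all_correct


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} tests: {familywise_error_rate(n):.1%}")
```

By ten tests the chance of at least one spurious "finding" is already about 40%, and by a hundred tests it is above 99%.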

If you’re a researcher who wants to understand why you are making the decisions you are making and why sometimes your results don’t pan out in the marketplace, this is something you need to know. Words like post hoc tests, multiple comparisons, familywise error, Bonferroni, Scheffé, and Tukey aren’t just cool sounding statistical terms. They are processes that ensure the error rate across an entire study is kept at your desired level, whether 5%, 10% or 1%.
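To make one of those terms concrete, here is a rough sketch of the Bonferroni correction, the simplest of the bunch: divide your desired overall error rate by the number of tests, and only call a result significant if its p-value clears that stricter bar. The function name and the p-values are invented for illustration, not taken from any real study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after a Bonferroni correction.

    Each test is held to alpha / m, where m is the number of tests,
    so the error rate across the whole family stays at or below alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values from five comparisons in one table
    p_values = [0.001, 0.012, 0.030, 0.049, 0.200]
    print("Uncorrected:", [p <= 0.05 for p in p_values])
    print("Bonferroni: ", bonferroni_significant(p_values))
```

With these made-up numbers, four of the five comparisons look significant at the usual 5% level, but only the first survives the corrected threshold of 1%. Bonferroni is known to be conservative, which is part of why alternatives like Scheffé and Tukey exist.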

There are many things about market research that aren’t perfect, but it’s better to know and work with them than to not know and fight them.

3 responses

  1. […] Lovestats once again hits the nail on the head when saying: Who doesn’t have fond thoughts of 300 page data tabulation reports! Page after page of crosstab after crosstab, significance test after significance test. Oh, it’s a wonderful thing and it’s easy as pie (mmmm…. pie) to run your fingers down the rows and columns to identify all of the differences that are significant. This one is different from B, D, F, and G. That one is different from D, E, and H. Oh, the abundance of surprise findings! […]

  2. but…if they do it properly all the data would be……non significant…hence no hypothesis wise error rate. i gave up explaining this years ago…nice article !

    1. gee…. i didn’t think of that. so let’s just forget i said this so that we will get tons of significant results. 🙂