Really Simple Statistics: Sample Sizes #MRX

Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

I’m going to guess that the #1 question every researcher has when they start a research project is this: How many people do I need to measure? If you want a simple answer, then you need to measure 1000 people per group. Unfortunately, that’s not the answer most people like. And despite the hope the post title may have given you, there really isn’t one simple answer. It depends on a few questions.

Do you plan to look at subgroups?


If you plan to split out subgroups in your data, then you need to make sure each group will have a large enough sample size. Do you plan to compare men and women? Do you want to see if older people generate different results than younger people? Are you comparing TV commercial #1 with TV commercial #2?

If you only have the budget to measure 100 people but you plan to split that group into people aged 18 to 34 and 35+, and then by gender, then you will only have about 25 people in each group. That simply isn’t enough to be sure that the results you find are more likely to be real than due to chance. Every single group you look at needs to have a large enough sample size to ensure the results aren’t due to chance. And if that means each of your 15 groups needs to have at least 100 people, then you’ll need to increase your budget or decrease the number of groups you look at.
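To put rough numbers on that, here is a minimal Python sketch. It assumes the groups come out equal in size and uses the textbook 95% margin of error for a percentage near 50%. The function names are made up for this illustration.

```python
import math

def per_group_size(total, *splits):
    """People per group when `total` is divided evenly by each split.

    Assumes every group ends up the same size, which real samples rarely do.
    """
    return total // math.prod(splits)

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on n people."""
    return z * math.sqrt(p * (1 - p) / n)

# 100 people split by 2 age bands and 2 genders
n_group = per_group_size(100, 2, 2)
print(n_group)                                    # 25
print(round(margin_of_error(100) * 100, 1))       # 9.8 (percentage points)
print(round(margin_of_error(n_group) * 100, 1))   # 19.6 (percentage points)
```

Under these assumptions, a result from one of the 25-person groups could be off by nearly 20 points in either direction, which is why so few differences between such small groups can be trusted.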

How big of a difference are you expecting?

If you think an important difference between your groups will be small, then you will need a large sample. For instance, if you’re testing the effectiveness of a health and wellness campaign, even a small difference can mean a big improvement in people’s lives. You don’t care that the improvement is small, perhaps an increase in effectiveness of 1% or 2%. You care that 1% or 2% of people are doing better. We know pure chance can easily give us numbers that differ by 2%. To try to counter random chance, we need to use very large sample sizes. Thousands per group is probably the more appropriate number.

And vice versa: if you think an important difference will be big, then you can get away with a smaller sample. Perhaps you’re testing a new scent of air freshener. Really, you don’t care if 1% of people like it more than the existing scents. You only care if 10% or 15% of people like it more than the existing scents. It’s much harder for random chance to create sets of numbers that differ by 10%, so this time we don’t need to use such a large sample size. You might be able to get away with just a couple of hundred per group.
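For anyone who wants to see where numbers like these come from, here is a rough sketch using the standard normal-approximation formula for comparing two percentages. The 50% baseline, the 5% significance level, and the 80% power are assumptions picked for illustration, not figures from this post, and `n_per_group` is a made-up name.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate people needed in EACH group to detect p1 vs p2.

    Two-sided test of two proportions, normal approximation.
    alpha and power are conventional defaults, assumed here.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Small difference: 50% vs 52% (the health campaign case)
print(n_per_group(0.50, 0.52))

# Big differences: 50% vs 60% and 50% vs 65% (the air freshener case)
print(n_per_group(0.50, 0.60))
print(n_per_group(0.50, 0.65))
```

With these assumptions, the 2-point difference calls for close to ten thousand people per group, while the 10- and 15-point differences need only a few hundred or fewer. Change the baseline or the power and the answers move, so treat them as ballpark figures.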

Are you measuring once or several times?

If you are measuring something more than once, perhaps tracking it on a weekly or monthly basis, each sample size can be smaller. For instance, you might determine that a one-time measure should be 300 but a weekly measure need only be 100 per week for 6 weeks. As before, it’s hard for random chance to produce similar results every single week for 6 weeks, so we don’t need as large a sample size each time.
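One way to see why is to compare margins of error. This sketch assumes the thing you are measuring stays the same across all six weeks, so the weekly samples can be pooled. That is a big assumption, and it is mine rather than something the tracking design guarantees. The function name is made up for this illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on n people."""
    return z * math.sqrt(p * (1 - p) / n)

weekly_n = 100
weeks = 6
pooled_n = weekly_n * weeks  # only valid if nothing changes week to week

print(round(margin_of_error(weekly_n) * 100, 1))  # 9.8  (one week alone)
print(round(margin_of_error(300) * 100, 1))       # 5.7  (one-time sample of 300)
print(round(margin_of_error(pooled_n) * 100, 1))  # 4.0  (six weeks pooled)
```

Any single week is noisy on its own, but six consistent weeks together are tighter than the one-time sample of 300. If the real number is shifting from week to week, pooling no longer applies and each week has to stand on its own 100 people.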

If you’re looking for some specific direction, then check out this list of statistical calculators. Be prepared for some very technical terms though!


One response

  1. Great post on a topic that is always a little bit ambiguous and intimidating. I’d be curious to know how often the data requirements aren’t fully known until after it’s collected – i.e. how often a market researcher plans on ignoring subgroups only to find out later that subgroups would actually be useful, for example.
