# Really Simple Statistics: What is sampling error? #MRX

Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

What is sampling error? First, you need to understand what sampling is. Sampling is choosing a smaller set of data/people/things to reflect the entire population. For instance, instead of measuring the height of everyone in your office, you might just measure the height of ten people. Or, instead of asking every person in Canada who they intend to vote for, you choose a sample of 2000 people to ask.

In the process of sampling, you gather 10 heights instead of 100 heights, or you gather 100 opinions instead of 1000 opinions. Either way, you don’t gather every possible data point, and that means the summary numbers you generate will probably not be exactly the same as the numbers you would have gotten had you measured every data point. The process of sampling introduces error and it cannot be avoided.
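If you’d like to see sampling error with your own eyes, here is a minimal sketch. The office of 100 people and their heights are made up (random numbers centred on 170 cm), so treat the output as an illustration and not as real data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical office of 100 people, heights in cm (made-up data).
population = [random.gauss(170, 10) for _ in range(100)]
population_mean = statistics.mean(population)

# Measure only ten people, chosen at random.
sample = random.sample(population, 10)
sample_mean = statistics.mean(sample)

# Sampling error: how far the sample's answer is from the full answer.
sampling_error = sample_mean - population_mean

print(f"Everyone:       {population_mean:.1f} cm")
print(f"Sample of 10:   {sample_mean:.1f} cm")
print(f"Sampling error: {sampling_error:+.1f} cm")
```

Change the seed and the sampling error changes too. That is the point: every sample gives a slightly different answer.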

In addition to sampling error, most research studies are affected by other errors that also take place during the sampling process. These include coverage errors, non-response errors, self-selection errors, and more. Consider these obvious sampling biases:

• The ten tallest people in your office were away at a “Retreat for tall people” and you didn’t wait to include them in your height sample.
• The ten Asian people in your office were away at a “Retreat for Asian people” and therefore couldn’t be part of your height sample (hm… aren’t Asian people known for being shorter than average?).
• When you were gathering opinions on voting intentions, you only asked people who were attending a gala for a particular political candidate.
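The first bias in that list can be sketched in a few lines. Again, the heights are invented and the variable names are just for illustration. Notice that the damage is done before a single person is sampled: the people left in the office are already shorter, on average, than the office as a whole.

```python
import random
import statistics

random.seed(7)

# Hypothetical office of 100 people, heights in cm (made-up data).
office = [random.gauss(170, 10) for _ in range(100)]

# The ten tallest people are away, so they can never be sampled.
available = sorted(office)[:-10]
biased_sample = random.sample(available, 10)

true_mean = statistics.mean(office)
available_mean = statistics.mean(available)

print(f"Whole office:        {true_mean:.1f} cm")
print(f"People still around: {available_mean:.1f} cm")
print(f"Your sample of 10:   {statistics.mean(biased_sample):.1f} cm")
```

No amount of careful random selection from the people still around will bring the missing tall people back.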

Running a survey and you’re positive your sampling plan is perfect?

• Does everyone have a telephone in order to respond to your telephone survey?
• Does everyone have a home where they can receive a mail survey?
• Does everyone have a computer where they can receive an email survey?

Running social media research and you’re positive your sampling plan is perfect?

• Does everyone feel comfortable leaving comments on blogs?
• Does everyone have a public Facebook page?
• Does everyone use Twitter?

Of course, these are the obvious errors taking place during the sampling process. Tiny mistakes are always made in the sampling process, particularly when you must first decide from where to gather opinions. The trick is to ALWAYS assume that your sampling plan includes error.

# A Cornucopia of Complicated Communications #MRX

Research reports are a cornucopia of complicated statistical representations. Or, if I may say in simpler terms, a lot of fancy numbers. As researchers, we get so engrossed in the statistical analyses, visual representations, and factual reporting that we forget how our experience with research and statistics differs from the experiences of other people. We forget that our readers may not have taken years of statistics and research methods classes and therefore don’t always understand how statistics work, why sample size matters, why effect size matters, or even what these terms mean. We forget that our world is very different from the world of a brand manager, a marketer, a consumer, a CEO. We speak in researchese, not peoplese.

With that in mind, consider these excellent and accurate definitions of a t-test.

1. t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution if the null hypothesis is supported.
2. A statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size.
3. In statistics, a t-test is what the distribution will be if a student’s null hypothesis is true. The usual form for t-test statistics is T = Z/s.
4. The ttest is a simple test of the separation of two sets of data, and is used to determine significance of experimental results.
5. The t-test tells you if the average number for one group is different from the average number for another group. (e.g., the average height of women vs the average height of men)

So tell me, which definition made the most sense to you? I’m going to guess it’s option number 5. This is the only option that avoided standard statistical terms and brought the language into the sphere of a regular person in the regular world. Most people should feel confident in their ability to share that information with other people no matter what their experience with statistics is.
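For the curious, here is definition 5 in numbers. This is only a sketch: the heights are made up, the function name `welch_t` is my own, and I’ve assumed the Welch version of the two-sample t-test, which does not require the two groups to have equal variances. It stops at the t value and does not calculate a p-value.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """t statistic for the difference between two group averages (Welch's version)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # How much the difference in averages would wobble from sample to sample.
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Made-up heights in cm.
women = [160, 162, 165, 167, 171]
men = [170, 174, 176, 179, 181]

t = welch_t(men, women)
print(f"t = {t:.2f}")  # about 4.04 for these numbers
```

The bigger the t value (in either direction), the harder it is to believe the two group averages are really the same.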

The researcher’s goal is to share information, to communicate clearly, and to help other people understand what we are saying. So my suggestion to you is this.

Speak simply. Write simply. Be understood.

# A Box Score Lesson for Psychology Students

I was in school for a bunch of years, and took a bunch of research design courses and a bunch of statistical analysis courses. Easy ones, hard ones, and a few really interesting ones. Surprisingly, one thing I never learned about was box scores, a statistical staple in the market research world.

Box scores are a way of talking about and working with Likert scales or other types of categorical scales so that everyone knows whether you are talking about the positive end of the scale (top box, top 2 box), the middle of the scale (middle/neutral box), or the negative end of the scale (bottom box, bottom 2 box).

Instead of calculating average scores from the Likert scale responses, box scores are reported as a percentage of the total number of people who answered the question. (If 10 out of 50 people chose strongly agree, the top box score is 20%.) Box scores let you clearly identify how many people fall into a subgroup: people who are happy, unhappy, or just don’t care about your product.
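Here is a small sketch of that calculation. The 50 responses are invented, the function name `box_scores` is hypothetical, and I’ve assumed a 5-point scale where 5 means strongly agree and 1 means strongly disagree.

```python
from collections import Counter


def box_scores(responses, scale_max=5):
    """Percentage of respondents in the top and bottom boxes of a 1..scale_max scale."""
    n = len(responses)
    counts = Counter(responses)

    def percent(*points):
        return 100 * sum(counts[p] for p in points) / n

    return {
        "top box": percent(scale_max),
        "top 2 box": percent(scale_max, scale_max - 1),
        "bottom box": percent(1),
        "bottom 2 box": percent(1, 2),
    }


# 50 made-up answers: 10 people chose 5 (strongly agree).
responses = [5] * 10 + [4] * 15 + [3] * 12 + [2] * 8 + [1] * 5
scores = box_scores(responses)
print(scores)
# {'top box': 20.0, 'top 2 box': 50.0, 'bottom box': 10.0, 'bottom 2 box': 26.0}
```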

Why do box scores matter? In a sense, they do report the same type of information as average scores. But, unless standard deviations are near and dear to you, average scores often appear very similar between groups. It’s hard to explain to a client why scores of 3.6 and 3.9 are very different because there is no intuitive difference between those numbers.

But, let’s think about box scores now. Can you intuitively understand the difference between 30% of people liking your brand and 40% of people liking your brand? I’m pretty sure you can. And you don’t need to understand what a standard deviation is either. I’m not in favour of dumbing down statistics but I am in favour of people understanding them.

Here’s another reason box scores are good. The average score calculated for a result that is 10% top box, 10% bottom box, and 80% middle box is exactly the same average score you would get for a result that is 40% top box, 40% bottom box, and 20% middle box. I’d certainly like to know if 10% or 40% of people hated my product. That’s a pretty important difference to be aware of and I wouldn’t want it getting lost because someone had a weak understanding of what an SD is.
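You can check that arithmetic yourself. This sketch assumes a simple 3-point scale (1 = bottom box, 2 = middle box, 3 = top box); the function name is my own.

```python
def mean_score(shares):
    """Average score, where shares maps each scale point to the percent who chose it."""
    return sum(point * percent for point, percent in shares.items()) / 100


# Percent of respondents choosing each point on an assumed 3-point scale.
mostly_neutral = {1: 10, 2: 80, 3: 10}
love_it_or_hate_it = {1: 40, 2: 20, 3: 40}

print(mean_score(mostly_neutral))      # 2.0
print(mean_score(love_it_or_hate_it))  # 2.0
```

Same average, completely different story. Only the box scores tell you that four times as many people hated the second product.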

So now, psychology/sociology/geography majors, go forth and prosper as market researchers!

# Common Mis-Numbers

Wouldn’t it be great if you could just read and interpret a number, and then be confident about your interpretation? If that were the case, you wouldn’t be able to buy 23 different books called “How to lie with statistics.”

Here are a few common problems I see when people try to interpret numbers.

Dislike matters just as much as like. Don’t get so focused on top box scores that you forget about bottom box scores. Brands can easily have identical top box scores and ridiculously different bottom scores.

How many times have you seen huge inexplicable spikes in your charts? Spikes are a key indicator that your sample size is too small. Be extremely nervous about numbers based on only 30 people. Be cautious of numbers based on fewer than 100 people. Check first and avoid embarrassing conclusions.
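To get a rough feel for why small samples spike, here is a sketch of the textbook margin of error for a percentage. It assumes a simple random sample, the normal approximation, and 95% confidence (z = 1.96), none of which is guaranteed in a real study.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on n people."""
    return z * math.sqrt(p * (1 - p) / n)


# A score of 50% measured on samples of different sizes.
for n in (30, 100, 1000):
    points = 100 * margin_of_error(0.5, n)
    print(f"n = {n:>4}: plus or minus {points:.1f} points")
# n =   30: plus or minus 17.9 points
# n =  100: plus or minus 9.8 points
# n = 1000: plus or minus 3.1 points
```

With 30 people, a score can swing by nearly 18 points in either direction without anything real having changed.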

Everything on the planet is governed by rules. And one of those rules is randomness. When you’ve determined that a small sample size is not the cause of the spike, and there is no discernible explanation for the spike, consider that it may in fact be a random number. Random happens. Deal with it.

Just because a test came out significant today doesn’t mean it will with new data next week. See previous point. You will know you’ve really got something when it’s significant on several separate occasions.

# Cronbach’s Alpha, My Favourite Statistic

Is it strange that I have a favourite statistic? Is it strange that as part of every hiring interview I ask the person what THEIR favourite statistic is? (Warning to all of you who may suffer through one of my future interviews.)
My favourite statistic is Cronbach’s alpha. I like seeing the item-total correlations, popping variables in and out of exploratory testing, and crossing my fingers that my final decision gives me a value greater than 0.8, with both positively and negatively keyed items from the full range of categories and with as few items as possible. Yikes. That sounds impossible. But I’ll always try.
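If you’ve never calculated it by hand, here is a minimal sketch using the standard formula: alpha = k/(k − 1) × (1 − sum of the item variances / variance of the total scores), where k is the number of items. The survey data and the function name are made up, and the sketch assumes any negatively keyed items have already been reverse-scored.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per survey item, respondents in the same order."""
    k = len(items)
    sum_of_item_variances = sum(statistics.variance(scores) for scores in items)
    # Each respondent's total score across all items.
    totals = [sum(answers) for answers in zip(*items)]
    return k / (k - 1) * (1 - sum_of_item_variances / statistics.variance(totals))


# Made-up answers from five respondents to three items on a 5-point scale.
item_1 = [4, 5, 3, 2, 4]
item_2 = [5, 5, 3, 1, 4]
item_3 = [4, 4, 2, 2, 5]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(f"alpha = {alpha:.2f}")  # about 0.92 for these numbers
```

Popping an item in or out is as easy as changing the list you pass in and watching what happens to alpha.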
