Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
What is sampling error? First, you need to understand what sampling is. Sampling is choosing a smaller set of data/people/things to reflect the entire population. For instance, instead of measuring the height of everyone in your office, you might just measure the height of ten people. Or, instead of asking every person in Canada who they intend to vote for, you choose a sample of 2000 people to ask.
In the process of sampling, you gather 10 heights instead of 100 heights, or you gather 100 opinions instead of 1000 opinions. Either way, you don’t gather every possible data point, and that means the summary numbers you generate will probably not be exactly the same as they would have been had you measured every data point. That gap is sampling error: the process of sampling introduces it, and it cannot be avoided.
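If you'd like to see that gap for yourself, here is a minimal sketch in Python. The office and its 100 heights are invented for illustration (random numbers centred on 170 cm), so treat the output as a demonstration, not as data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical office of 100 people; heights in cm are made up
population = [random.gauss(170, 10) for _ in range(100)]
population_mean = statistics.mean(population)

# Measure only ten of them, chosen at random
sample = random.sample(population, 10)
sample_mean = statistics.mean(sample)

# The difference between the two averages is the sampling error
sampling_error = sample_mean - population_mean

print(f"Average of all 100 heights: {population_mean:.1f} cm")
print(f"Average of the 10 sampled:  {sample_mean:.1f} cm")
print(f"Sampling error:             {sampling_error:+.1f} cm")
```

Change the seed and the error changes too. Sometimes the sample runs a little tall, sometimes a little short, but it is rarely exactly zero.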
In addition to sampling error, most research studies are affected by other errors that also take place during the sampling process. This includes coverage errors, non-response errors, self-selection errors, and more. Consider these obvious sampling biases:
- The ten tallest people in your office were away at a “Retreat for tall people” and you didn’t wait to include them in your height sample.
- The ten Asian people in your office were away at a “Retreat for Asian people” and therefore couldn’t be part of your height sample (hm… aren’t Asian people known for being shorter than average?).
- When you were gathering opinions on voting intentions, you only asked people who were attending a gala for a particular political candidate.
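The first of those biases is easy to mimic. The sketch below uses an invented office of 100 made-up heights and a hypothetical helper, `mean_height_without_tallest`, to show what happens when the ten tallest people are simply not there to be measured.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical office of 100 people; heights in cm are made up
office = [random.gauss(170, 10) for _ in range(100)]

def mean_height_without_tallest(heights, n_away):
    """Average height of whoever is left when the n_away tallest are absent."""
    present = sorted(heights)[:len(heights) - n_away]
    return statistics.mean(present)

true_mean = statistics.mean(office)
biased_mean = mean_height_without_tallest(office, 10)

print(f"Everyone in the office:      {true_mean:.1f} cm")
print(f"Without the ten tallest:     {biased_mean:.1f} cm")
```

Unlike plain sampling error, this kind of error always pushes in the same direction, and measuring more of the people who stayed behind will not fix it.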
Running a survey and you’re positive your sampling plan is perfect?
- Does everyone have a telephone in order to respond to your telephone survey?
- Does everyone have a home where they can receive a mail survey?
- Does everyone have a computer where they can receive an email survey?
Running social media research and you’re positive your sampling plan is perfect?
- Does everyone feel comfortable leaving comments on blogs?
- Does everyone have a public Facebook page?
- Does everyone use Twitter?
Of course, these are the obvious errors taking place during the sampling process. Tiny mistakes are always made in the sampling process, particularly when you must first decide from where to gather opinions. The trick is to ALWAYS assume that your sampling plan includes error.
Research reports are a cornucopia of complicated statistical representations. Or, if I may say it in simpler terms, a lot of fancy numbers. As researchers, we get so engrossed in the statistical analyses, visual representations, and factual reporting that we forget how our experience with research and statistics differs from the experiences of other people. We forget that our readers may not have taken years of statistics and research methods classes and therefore don’t always understand how statistics work, why sample size matters, why effect size matters, or even what these terms mean. We forget that our world is very different from the world of a brand manager, a marketer, a consumer, a CEO. We speak in researchese, not peoplese.
With that in mind, consider these excellent and accurate definitions of a t-test.
- A t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution if the null hypothesis is supported.
- A statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size.
- In statistics, a t-test is what the distribution will be if a student’s null hypothesis is true. The usual form for t-test statistics is T = Z/s.
- The t-test is a simple test of the separation of two sets of data, and is used to determine significance of experimental results.
- The t-test tells you if the average number for one group is different from the average number for another group. (e.g., the average height of women vs the average height of men)
So tell me, which definition made the most sense to you? I’m going to guess it’s the fifth and last one. This is the only option that avoided standard statistical terms and brought the language into the sphere of a regular person in the regular world. Most people should feel confident in their ability to share that information with other people no matter what their experience with statistics is.
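For anyone who wants to see the plain-language definition in action, here is a minimal sketch that compares the average of two groups. It assumes Welch's version of the t statistic (the one that does not require equal variances), which is only one of several t-tests, and the heights are made up. The function name `welch_t` is mine, not a library call.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Difference between two group averages, divided by its standard error."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Made-up heights in cm, for illustration only
women = [160, 162, 165, 158, 167, 163]
men = [175, 178, 172, 180, 169, 177]

t = welch_t(men, women)
print(f"t = {t:.2f}")
```

The further t is from zero, the harder it is to believe the two averages are really the same. Turning t into a p value needs the t distribution, which this sketch leaves out.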
The researcher’s goal is to share information, to communicate clearly, and to help other people understand what we are saying. So my suggestion to you is this.
Speak simply. Write simply. Be understood.
Wouldn’t it be great if you could just read and interpret a number, and then be confident about your interpretation? If that were the case, you wouldn’t be able to buy 23 different books called “How to lie with statistics.”
Here are a few common problems I see when people try to interpret numbers.
Dislike matters just as much as like. Don’t get so focused on top box scores that you forget about bottom box scores. Brands can easily have identical top box scores and ridiculously different bottom scores.
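Here is a small sketch of that situation. It assumes a 5-point rating scale where "top box" means the share of people who gave a 5 and "bottom box" the share who gave a 1 (definitions vary, and some teams use the top two boxes). Both brands and all ratings are invented.

```python
def box_scores(ratings, top=5, bottom=1):
    """Return (top box share, bottom box share) for a list of ratings."""
    n = len(ratings)
    top_box = sum(r == top for r in ratings) / n
    bottom_box = sum(r == bottom for r in ratings) / n
    return top_box, bottom_box

# Two made-up brands, 100 ratings each, with the same number of 5s
brand_a = [5] * 40 + [4] * 30 + [3] * 20 + [2] * 8 + [1] * 2
brand_b = [5] * 40 + [4] * 10 + [3] * 10 + [2] * 10 + [1] * 30

for name, ratings in (("Brand A", brand_a), ("Brand B", brand_b)):
    top_box, bottom_box = box_scores(ratings)
    print(f"{name}: top box {top_box:.0%}, bottom box {bottom_box:.0%}")
```

Both brands land on the same top box score, yet one of them has fifteen times as many people at the very bottom of the scale.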
How many times have you seen huge inexplicable spikes in your charts? Spikes are a key indicator that your sample size is too small. Be extremely nervous about numbers based on only 30 people. Be cautious of numbers based on fewer than 100 people. Check first and avoid embarrassing conclusions.
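To put rough numbers on why small samples spike, the sketch below uses the usual approximate 95% margin of error for a percentage. It assumes simple random sampling and the normal approximation, and the helper name `margin_of_error` is my own.

```python
import math

def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# How far a true 50% result could plausibly wander at each sample size
for n in (30, 100, 1000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>4}: plus or minus about {moe * 100:.0f} points")
```

With 30 people, a score that is really 50% can wander by roughly 18 points in either direction from one wave to the next. At 100 people it is about 10 points, and at 1000 about 3.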
Everything on the planet is governed by rules. And one of those rules is randomness. When you’ve determined that a small sample size is not the cause of the spike, and there is no discernible explanation for the spike, consider that it may in fact be a random number. Random happens. Deal with it.
Just because a test came out significant today doesn’t mean it will with new data next week. See previous point. You will know you’ve really got something when it comes out significant on several separate occasions.
Is it strange that I have a favourite statistic? Is it strange that as part of every hiring interview I ask the person what THEIR favourite statistic is? (Warning to all of you who may suffer through one of my future interviews.)
My favourite statistic is Cronbach’s alpha. I like seeing the item-total correlations, popping variables in and out of exploratory testing, and crossing my fingers that my final decision gives me a value greater than 0.8, with both positively and negatively keyed items from the full range of categories and with as few items as possible. Yikes. That sounds impossible. But I’ll always try.
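In case you've never met it, here is a minimal sketch of how Cronbach's alpha is usually calculated: the number of items, the variance of each item, and the variance of the total score. The function name `cronbach_alpha` and the survey answers are invented, and the sketch assumes any negatively keyed items have already been reverse-scored.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    sum_of_item_variances = sum(statistics.variance(scores) for scores in items)
    totals = [sum(answers) for answers in zip(*items)]  # total score per respondent
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_of_item_variances / total_variance)

# Made-up answers from six respondents to three 5-point items
item_1 = [4, 5, 3, 2, 4, 1]
item_2 = [5, 4, 3, 2, 5, 2]
item_3 = [4, 4, 2, 3, 5, 1]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(f"Cronbach's alpha = {alpha:.2f}")
```

Dropping one item from the list and rerunning the function is the "popping variables in and out" part.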