Really Simple Statistics: What is sampling error? #MRX


Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

What is sampling error? First, you need to understand what sampling is. Sampling is choosing a smaller set of data/people/things to reflect the entire population. For instance, instead of measuring the height of everyone in your office, you might just measure the height of ten people. Or, instead of asking every person in Canada who they intend to vote for, you choose a sample of 2000 people to ask.

In the process of sampling, you gather 10 heights instead of 100 heights, or you gather 100 opinions instead of 1000 opinions. Either way, you don’t gather every possible data point, and that means the summary numbers you generate will probably not be exactly the same as they would have been had you measured every data point. The process of sampling introduces error and it cannot be avoided.
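To see this in action, here is a minimal Python sketch. It assumes a made-up office of 100 people whose heights are generated at random (the 170 cm average and 10 cm spread are invented for illustration, not real data). It draws five random samples of ten people and compares each sample average with the average for the whole office.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical office of 100 people; heights in cm are made up for illustration.
population = [random.gauss(170, 10) for _ in range(100)]
population_mean = statistics.mean(population)

# Draw five random samples of ten people and record each sample's average height.
sample_means = []
for _ in range(5):
    sample = random.sample(population, 10)  # ten people, no repeats
    sample_means.append(statistics.mean(sample))

# Compare every sample average with the average of the whole office.
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.1f} cm, off by {m - population_mean:+.1f} cm")
print(f"whole office: mean = {population_mean:.1f} cm")
```

Every sample is drawn fairly, yet the five averages should differ a little from each other and from the whole-office average. That gap is sampling error.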

In addition to sampling error, most research studies are affected by other errors that also take place during the sampling process. These include coverage errors, non-response errors, self-selection errors, and more. Consider these obvious sampling biases:

  • The ten tallest people in your office were away at a “Retreat for tall people” and you didn’t wait to include them in your height sample.
  • The ten Asian people in your office were away at a “Retreat for Asian people” and therefore couldn’t be part of your height sample (hm… aren’t Asian people known for being shorter than average?).
  • When you were gathering opinions on voting intentions, you only asked people who were attending a gala for a particular political candidate.
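The first example in the list above can be sketched the same way. This rough Python illustration again assumes an invented office of 100 randomly generated heights. It repeats the ten-person sample many times, once from the whole office and once with the ten tallest people missing, to show that this kind of error does not average out.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical office of 100 people; heights in cm are invented for illustration.
office = [random.gauss(170, 10) for _ in range(100)]
true_mean = statistics.mean(office)

# The ten tallest people are away, so they can never end up in a sample.
available = sorted(office)[:-10]

def average_sample_mean(people, sample_size=10, repeats=1000):
    """Draw many random samples and return the average of their means."""
    means = [
        statistics.mean(random.sample(people, sample_size))
        for _ in range(repeats)
    ]
    return statistics.mean(means)

fair = average_sample_mean(office)        # everyone can be chosen
biased = average_sample_mean(available)   # tallest ten excluded

print(f"whole office mean:        {true_mean:.1f} cm")
print(f"average of fair samples:  {fair:.1f} cm")
print(f"average of biased samples: {biased:.1f} cm")
```

Averaged over many repeats, the fair samples should land close to the whole-office mean, while the samples that leave out the tallest people should sit consistently below it. Sampling error shrinks as you repeat or enlarge the sample; a coverage problem like this one does not.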

Running a survey and you’re positive your sampling plan is perfect?

  • Does everyone have a telephone in order to respond to your telephone survey?
  • Does everyone have a home where they can receive a mail survey?
  • Does everyone have a computer where they can receive an email survey?

Running social media research and you’re positive your sampling plan is perfect?

  • Does everyone feel comfortable leaving comments on blogs?
  • Does everyone have a public Facebook page?
  • Does everyone use Twitter?

Of course, these are the obvious errors taking place during the sampling process. Tiny mistakes are always made in the sampling process, particularly when you must first decide from where to gather opinions. The trick is to ALWAYS assume that your sampling plan includes error.


4 responses

  1. Hi Annie, sorry to be a pain, and with all due respect, but I think what you’ve described above is actually non-sampling error. Sampling error is the error associated with the drawing of a sample. The underlying concept says that if I draw 10 samples of 100 people in the same way from the same frame, I will get 10 slightly different results. There is no way for the 100 people in one sample to be exactly like the 100 people in the other samples. There will be small but measurable differences. Most of the examples in your post are either coverage error or nonresponse error, both of which are different from sampling error and are generally lumped in with measurement error as non-sampling errors.

    1. Thanks for your comment Reg. I clarified the post. 🙂

  2. Hey Annie! Nice “RSS”. The only comment I would make is that your examples imply that sampling error only applies when sampling is poor. That is not the case; the very act of sampling introduces sampling error, no matter how “good” (i.e., random) the sample is.

    1. Thanks Kerry. Hopefully my revisions make it more clear. Cheers!
