# Top Ten Posts of 2012 #MRX

In memory of what turned out to be a great 2012, here are the top ten LoveStats posts of the year as determined by reader counts. Keep it simple, scholars!

- Really Simple Statistics: Nominal Ordinal Interval and Ratio Numbers
- Really Simple Statistics: p values
- My Tastebuds are Leptokurtic, How About Yours?
- A Pie Chart of my Favourite bars…
- Survey Design Tip #3: Do You Encourage Straightlining?
- Really Simple Statistics: What is Nominal Data?
- Gay lunch and gay parking
- Really Simple Statistics: What is Interval Data?
- The 6 Worst Market Research Mistakes
- Really Simple Statistics: T-Tests

# You know what a leading question is, right? #MRX

Welcome to Really Simple Surveys (RSS), the younger sibling of Really Simple Statistics. There are lots of places online where you can ponder over the minute details of complicated survey designs but very few places that make survey design quickly understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.

~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~

Sometimes, leading questions are easy to spot. You’d recognize one if you saw one, right? Like in the title of this post, right? Or like the following question, right?

What is your opinion about the current government?

a) They are doing a good job

b) They are doing pretty well

c) **They are disgustingly horrid**

In most cases though, leading questions slip into the mix with little notice. The questions and answers are written in such a way that people almost unconsciously focus on and choose one particular answer. It’s like leading a horse to water.

Leading questions cause respondents to choose answers that don’t necessarily reflect their true feelings, and for that reason they must be avoided at all costs. They come in many different formats and I’ve given examples of just a few of them below. See if you can spot the answer I’m leading you to and then figure out why you focused on that answer.

1. What is your opinion about the current government?

a) They are doing a good job

b) They need more resources

c) They are doing fairly well but they need to spend more time improving healthcare options

2. How much money has our government wasted?

a) 1 billion dollars

b) 2 billion dollars

c) 47.38 billion dollars

3. What does our government need to work harder on?

a) handling the country’s debt

b) passing the bill to save babies from being killed

c) dealing with various important healthcare issues

4. Complete the following sentence. I really wish my government would:

a) decrease my tax rate so I can afford better housing

b) debt issues that keep the country from growing

c) healthcare issues and other similar topics

5. Should the government work harder to improve the lives of the people?

a) yes

b) no

And the answers are…

1) Answers that are much longer or shorter than others draw extra attention

2) Unusually precise answers draw extra attention

3) Unusually loaded emotive answers draw extra attention

4) Answers that follow grammatically correct rules are easier to understand

5) Some questions have no logical answer other than to agree

Now that you know, you can keep your eyes peeled for other formats. No more leading questions!

# Talk doesn’t cook rice #MRX

~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~

I have always thought the actions of men the best interpreters of their thoughts. – John Locke

Never mistake motion for action. – Ernest Hemingway

Well done is better than well said. – Benjamin Franklin

Talk doesn’t cook rice. – Chinese Proverb

Actions speak louder than words – Gersham Bulkeley

I know what you’re saying. You’re saying, “Hey Annie, those people are all dead. I have no idea what you’re trying to tell me.” Ok, ok, yes they are dead. Let me give you a few more examples that might be to your liking.

- If gas prices go any higher, I’m going to walk to work.
- If bus passes get any more expensive, I’m going to drive to work.
- I’m going to lose thirty pounds this year.
- I’m going to say ‘No’ from now on.
- I’m going to eat healthier from now on.
- I’m going to call my mom more often.

As I’m sure you have, I’ve heard ALL of these comments from people and not one single comment was associated with follow through. People love to talk and complain and whine and promise, but they really don’t like to DO. And that brings us to today’s survey design suggestion.

**Wherever possible, ask behaviour-based questions instead of intention or theoretical questions.** Here are a few examples:

**Instead of**: Do you plan to buy Frootloops in the next 7 days?

**Try**: Did you buy Frootloops in the last 7 days?

**Instead of**: Do you plan to switch to the subway train next month?

**Try**: Did you buy a transit pass this month?

**Instead of**: Do you call your mother more often now?

**Try**: How many times did you call your mother in the last 2 weeks?

**Instead of**: Are you eating healthier now?

**Try**: Over the last 7 days, how many times did you eat at a restaurant?

You see, the first type of question makes it really easy for people to sneak around the truth and make themselves feel better with a socially desirable or hopeful or acquiescent answer. But the only way to get around the truth of the second type of question is with an outright lie – and that’s just harder for most people to do. So if you’re looking for honesty, even in the face of seemingly easy questions, always try to come up with a behaviour-based question. And call your mother!

It’s that simple!

# Really Simple Statistics: What is heteroscedasticity? #MRX

Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

Oh my, what a gorgeous heteroscedasticity you have! You mean other than a really cool eight syllable statistics word that you can show off with in front of friends?

This long and lovely word comes into play when you’re dealing with pairs of variables – perhaps height and weight, or grades and time spent studying, or voting behaviour and time spent reading the political section of the paper. It has mean and nasty effects on correlation coefficients and regression models so pay attention!

Specifically, it refers to the distribution of numbers for one variable in relation to the distribution of numbers for another variable. Homoscedasticity refers to a spread that is very even and regular no matter which section of the chart you look at. This is what you see in the first chart. Heteroscedasticity is the opposite: the spread of one variable changes depending on which section of the chart you look at.

- We all know that shorter people weigh less and taller people weigh more. But, what if most 5 foot tall women weigh between 90 and 100 pounds while most 6 foot tall women weigh between 130 and 170 pounds? The range of 10 pounds at 5 feet is very different from the range of 40 pounds at 6 feet. That’s a lot of heterobebijicty!
- We also know that people who study a lot tend to get higher grades. Now, what if people who studied 1 hour per week got a D while people who studied 2 hours per week got a C, B, or A? Once again, 1 hour resulted in one possible grade while 2 hours resulted in three possible grades. That’s even more heteroihjusdfgicty.
- And, what if jogging for 30 minutes burns 200 to 250 calories while jogging for 60 minutes burns 425 to 475 calories? Half an hour resulted in a range of 50 calories while a full hour resulted in a range of…. also 50 calories. That’s a lot of…. homoscedasticity!
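
If you like to check these things with numbers, here is a minimal Python sketch of the height and weight example. The individual weights are invented for illustration (only the 90 to 100 and 130 to 170 pound ranges come from the example above), and `spread_by_group` is a hypothetical helper name, not a standard function.

```python
from statistics import pstdev

# Hypothetical weights (pounds), invented to match the ranges in the example:
# 5-foot women between 90 and 100, 6-foot women between 130 and 170.
weights_by_height = {
    "5 ft": [90, 92, 95, 97, 100],
    "6 ft": [130, 140, 150, 160, 170],
}

def spread_by_group(groups):
    """Return the range and standard deviation of each group."""
    return {
        name: {"range": max(values) - min(values), "sd": pstdev(values)}
        for name, values in groups.items()
    }

spreads = spread_by_group(weights_by_height)
for name, stats in spreads.items():
    print(f"{name}: range = {stats['range']} lb, sd = {stats['sd']:.1f} lb")
```

When the spread at one end of the chart is several times the spread at the other end, as it is here, that is a hint of heteroscedasticity.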

So the next time you’re wondering why your correlation coefficient or regression equation isn’t as nice as what you had hoped for, have a look for heteroscedasticity. And make it a habit to look before you statisticize.

# Really Simple Statistics: Sample Sizes #MRX

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

I’m going to guess that the #1 question every researcher has when they start a research project is this: How many people do I need to measure? If you want a simple answer, then you need to measure 1000 people per group. Unfortunately, that’s not the answer most people like. Unlike the hope the post title gave you, there really isn’t a simple answer.

### Do you plan to look at subgroups?

If you plan to split out subgroups in your data, then you need to make sure each group will have a large enough sample size. Do you plan to compare men and women? Do you want to see if older people generate different results than younger people? Are you comparing TV commercial #1 with TV commercial #2?

If you only have the budget to measure 100 people but you plan to split that group into people aged 18 to 34 and 35+, and then by gender, then you will only have about 25 people in each group. That simply isn’t enough to be sure that the results you find are more likely to be real than due to chance. Every single group you look at needs to have a large enough sample size to ensure the results aren’t due to chance. And if that means each of your 15 groups needs to have at least 100 people, then you’ll need to increase your budget or decrease the number of groups you look at.
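
To put rough numbers on this, here is a small Python sketch. It assumes a simple random sample, a yes/no style percentage near 50%, and a 95% confidence level (z of about 1.96). None of those assumptions come from this post, and `margin_of_error` is just an illustrative name.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on n people."""
    return z * math.sqrt(p * (1 - p) / n)

total_sample = 100
subgroups = 2 * 2  # two age bands x two genders
per_group = total_sample // subgroups  # 25 people, if the groups split evenly

for n in (per_group, total_sample, 1000):
    print(f"n = {n:4d}: about +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, a result of 50% in a group of 25 could plausibly be anywhere from roughly 30% to 70%, which is why tiny subgroups are so hard to interpret.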

### How big of a difference are you expecting?

If you think an important difference between your groups will be small, then you will need a large sample. For instance, if you’re testing the effectiveness of a health and wellness campaign, any small difference will make a big improvement in people’s lives. You don’t care if the improvement is small, perhaps an increase in effectiveness of 1% or 2%. You care that 1% or 2% of people are doing better. We know pure chance can easily give us numbers that are 2% different. To try to counter random chance, we need to use very large sample sizes. Hundreds or thousands is probably the more appropriate number.

And vice versa – if you think an important difference will be big, then you can get away with a smaller sample. Perhaps you’re testing a new scent of air freshener. Really, you don’t care if 1% of people like it more than the existing scents. You only care if 10% or 15% of people like it more than the existing scent. It’s much harder for random chance to create sets of numbers that are 10% different so this time, we don’t need to use such a large sample size. You might be able to get away with just a couple of hundred.
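
Here is a rough Python sketch of that trade-off, using the common textbook normal-approximation formula for comparing two percentages. The 5% significance level, the 80% power, and the 50% starting point are my assumptions for illustration, and `sample_size_per_group` is a hypothetical helper, so treat the output as a ballpark rather than a recommendation.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate people needed per group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)

small_difference = sample_size_per_group(0.50, 0.52)  # wellness campaign: 2 points
large_difference = sample_size_per_group(0.50, 0.65)  # air freshener: 15 points

print(f"2-point difference: about {small_difference} people per group")
print(f"15-point difference: about {large_difference} people per group")
```

Under these assumptions, the small difference needs thousands of people per group while the large difference needs fewer than two hundred.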

### Are you measuring once or several times?

If you are measuring something more than once, perhaps tracking it on a weekly or monthly basis, each sample size can be smaller. For instance, you might determine that a one-time measure needs 300 people but a weekly measure needs only 100 people per week for 6 weeks. As before, it’s hard for random chance to produce similar results every single week for 6 weeks so we don’t need as large a sample size each time.

If you’re looking for some specific direction, then check out this list of statistical calculators. Be prepared for some very technical terms though!

Simple!

###### Related articles

# Really Simple Statistics: What is a standard deviation? #MRX

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

Standard deviations are massively popular in all aspects of market research reporting. Any time someone tells you an average number, they’ll probably tell you what the standard deviation is at the same time, even if you didn’t ask for it. At its most basic level, a standard deviation is a number that tells you how similar a set of numbers is.

For now though, let’s forget about all the technical language and think about a casual application. In your immediate family, most of the women are probably similar to each other in terms of their height. If your mom is 5 foot 3, chances are that many other women in your family are somewhere around 5 foot 3, and in fact most of them are probably within an inch or two of 5 foot 3. The “normal” woman is about 5 foot 3 and there is very little differentiation or deviation among the heights. The deviation is small.

On the other hand, get out the wooden ruler you’ve saved since public school, the one with your 4ever true love engraved on it, and hold it up to their hair. Some of the women have really long hair, others have shoulder length hair, while still others have short and snazzy hair. There’s a lot of differentiation, a lot of disagreement, a lot of deviation in their hair lengths. Sure the average or normal length might be 8 inches, but the deviation from the norm could easily be 8 inches. The deviation is large.

In the market research space, you can look at standard deviations in a similar way. It can be interpreted as the amount of disagreement among people’s opinions. Let’s consider 100 answers to a purchase intent question asked on a five point scale from Definitely Will Buy all the way to Definitely Will Not Buy.

- If 50 people answered definitely will buy and 50 answered definitely will NOT buy, that’s a big difference among the answers, a lot of disagreement, a lot of differentiation. Half of the people are checking off the 5 and half of the people are checking off the 1. People haven’t come to any consensus on whether they agree or disagree. In technical words, that clear disagreement indicates a wide or large standard deviation. These wide standard deviations make our work as market researchers more difficult. It’s hard to recommend a new product when people can’t agree on whether they would buy it.
- But, if 90 people answered definitely will buy and 10 people answered probably will buy, there’s a lot of agreement there. 90% of the people are checking off the 5 and 10% of people are checking off the 4. People are generally agreeing with each other. They pretty much all intend to buy though some are a little more sure about that purchase than others are. That agreement reflects, inversely, very little differentiation, very little disagreement. It indicates a very narrow or small standard deviation. This is what market researchers love to see. We have a clear answer to our question and can proceed to recommend a product that most people would like to buy.

So here’s the general scoop:

- Small standard deviation = Lots of agreement among the opinions
- Large standard deviation = Lots of disagreement among the opinions
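
If you want to see the two purchase intent examples as actual numbers, here is a minimal Python sketch. It codes the scale as 5 for Definitely Will Buy down to 1 for Definitely Will Not Buy, and it uses the population standard deviation (`pstdev`), which treats the 100 answers as the whole group; both choices are my assumptions.

```python
from statistics import mean, pstdev

# 5 = Definitely Will Buy ... 1 = Definitely Will Not Buy
split_opinions = [5] * 50 + [1] * 50     # half love it, half hate it
agreeing_opinions = [5] * 90 + [4] * 10  # almost everyone intends to buy

for label, answers in [("Split", split_opinions), ("Agreeing", agreeing_opinions)]:
    print(f"{label}: mean = {mean(answers):.1f}, "
          f"standard deviation = {pstdev(answers):.1f}")
```

The split group lands on a standard deviation of 2.0 while the agreeing group lands on 0.3, which matches the large versus small pattern described above.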

It’s that simple!

# Really Simple Statistics: What is sampling error? #MRX

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

What is sampling error? First, you need to understand what sampling is. Sampling is choosing a smaller set of data/people/things to reflect the entire population. For instance, instead of measuring the height of everyone in your office, you might just measure the height of ten people. Or, instead of asking every person in Canada who they intend to vote for, you choose a sample of 2000 people to ask.

In the process of sampling, you gather 10 heights instead of 100 heights, or you gather 100 opinions instead of 1000 opinions. Either way, you don’t gather every possible data point and that means the summary numbers you generate will probably not be exactly the same as they would have been had you measured every data point. The process of sampling introduces error and it cannot be avoided.
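
A quick simulation makes this concrete. The Python sketch below invents an office of 100 people (the heights are made up, drawn around 67 inches purely for illustration), measures everyone, and then measures five different samples of ten people.

```python
import random
from statistics import mean

rng = random.Random(2012)  # fixed seed so the run is repeatable

# Hypothetical office: 100 heights in inches, invented for illustration.
office_heights = [round(rng.gauss(67, 3), 1) for _ in range(100)]
true_average = mean(office_heights)

# Draw five different samples of ten people and compare each to the truth.
sample_averages = [mean(rng.sample(office_heights, 10)) for _ in range(5)]

print(f"Everyone measured: {true_average:.1f} inches")
for i, avg in enumerate(sample_averages, start=1):
    print(f"Sample {i}: {avg:.1f} inches (off by {avg - true_average:+.1f})")
```

Each sample gives a slightly different average, and none of them was collected badly. That gap between a sample’s answer and the everyone-measured answer is the sampling error.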

In addition to sampling error, most research studies are affected by other errors that also take place during the sampling process. This includes coverage errors, non-response errors, self-selection errors, and more. Consider these obvious sampling biases:

- The ten tallest people in your office were away at a “Retreat for tall people” and you didn’t wait to include them in your height sample.
- The ten Asian people in your office were away at a “Retreat for Asian people” and therefore couldn’t be part of your height sample (hm…. aren’t Asian people known for being shorter than average?)
- When you were gathering opinions on voting intentions, you only asked people who were attending a gala for a particular political candidate.

Running a survey and you’re positive your sampling plan is perfect?

- Does everyone have a telephone in order to respond to your telephone survey?
- Does everyone have a home where they can receive a mail survey?
- Does everyone have a computer where they can receive an email survey?

Running social media research and you’re positive your sampling plan is perfect?

- Does everyone feel comfortable leaving comments on blogs?
- Does everyone have a public Facebook page?
- Does everyone use Twitter?

Of course, these are the obvious errors taking place during the sampling process. Tiny mistakes are always made in the sampling process, particularly when you must first decide from where to gather opinions. The trick is to **ALWAYS assume that your sampling plan includes error**.

# Really Simple Statistics: What is Ratio Data #MRX

Last in the series of 4 types of data is ratio data. Ratio numbers have all the features of the previous numbers we’ve talked about plus one more. So, with ratio numbers, we know that certain numbers are bigger than other numbers (ordinal), and we know that the difference between numbers is meaningful (interval). The single feature that separates ratio numbers from the other numbers is that the number zero is relevant. Here are some examples.

- I bought 5 chocolate bars today. That’s two chocolate bars plus three chocolate bars. Five chocolate bars is five times as many as 1 chocolate bar.
- My buddy Justin Bieber had 1 chocolate bar but he gave it to me. He now has zero chocolate bars and I have six.
- 100% of the treats in my hand are chocolate bars. If I give two of them to Justin, Justin has one third (about 33%) of the chocolate bars. And, if I give all six of them to Justin, I now have 0% of the chocolate bars and he has 100% of them.

So here are the important distinctions:

- Most importantly, the zero makes sense. It is an absence of all things chocolate. It’s not less chocolate or smaller chocolate. It’s zero chocolate. 😦
- The spaces between the numbers make sense. 4 bars is exactly 1 more than 3 bars.
- We can tell when I have more bars than Justin. If I can hold bars in both of my hands and Justin only has a bar in one hand, I obviously have more than he does.

It’s that simple!

# Really Simple Statistics: What is Interval Data? #MRX

Ready? Today we tackle interval data. Unlike nominal data which has no real relationship with numbers, and ordinal data which shows orders among numbers, interval data can show specific relationships among numbers.

Examples of interval data include:

- Yesterday is exactly one day away from today and exactly two days away from tomorrow. A day is the same no matter which day of the week you look at. But, you can’t say that the 20th of the month is “twice as much” as the 10th, because the calendar has no true zero. (Woh! That’s confusing!)
- 20 degrees Celsius is exactly one degree different from 21 degrees and exactly 10 degrees different from 30 degrees. A degree is the same no matter where you measure it. But, you can’t say that 30 degrees is 50% hotter than 20 degrees.
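
The temperature example is easy to check for yourself. This small Python sketch converts the same two temperatures to Fahrenheit and Kelvin using the standard conversion formulas; the function names are just illustrative.

```python
def celsius_to_kelvin(c):
    return c + 273.15

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

cool, warm = 20.0, 30.0  # degrees Celsius

# The "ratio" changes with the scale, so on Celsius it has no real meaning.
celsius_ratio = warm / cool
fahrenheit_ratio = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(f"Celsius ratio:    {celsius_ratio:.2f}")
print(f"Fahrenheit ratio: {fahrenheit_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")

# Differences behave: a 10 degree Celsius gap is always an 18 degree Fahrenheit gap.
fahrenheit_gap = celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool)
print(f"Fahrenheit gap:   {fahrenheit_gap:.0f} degrees")
```

The same two temperatures give a “ratio” of 1.50, 1.26, or 1.03 depending on the scale. Only Kelvin, which has a true zero, gives a ratio that means something, while the gap between the temperatures stays consistent on every scale.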

The important distinction with interval data is this:

- The numbers have real meaning
- The numbers have a real order
- The differences between the numbers are measurable

And that’s all there is to it!

# Really Simple Statistics: What is Ordinal Data? #MRX

Today we tackle another kind of number. Unlike nominal numbers, ordinal numbers have real meaning behind them. The name itself hints at the meaning. Ordinal numbers portray ordered numbers.

But, the only thing we know about the numbers is that there is an order to them. For example, there are more cookies in the first picture than there are in the second. But, we can’t see the whole picture, so we don’t know how many more cookies are in the first picture. We could assign a 2 to the first picture and a 1 to the second picture, but we wouldn’t be able to say that there are twice as many cookies in the first picture. Just that there are more. Here are some examples of ordinal data.

- A big handful of rice vs a small handful of rice. Why: We don’t know how much rice is in each hand but we can see there is more in one than the other.
- Someone who is a bit shy vs someone who is really shy. Why: We don’t know how much more shy the really shy person is, but we know they are more shy.
- Questions on surveys where the answers look like: Strongly agree, somewhat agree, somewhat disagree, strongly disagree. Why: We don’t know how much more “strongly” is compared to “somewhat” but we do know it’s more.
- This is more than that. This is lighter than that. This is heavier than that. This is taller than that. This is bluer than that. This is tastier than that. This feels more rough than that. This smells worse than that. This is longer than that. This is earlier than that. This is faster than that.
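
Here is a minimal Python sketch of how ordinal survey answers behave. The five answers are invented for illustration, and the numeric codes are arbitrary ranks that I assigned, not measurements.

```python
from statistics import median_low

# The order of the labels matters; the numeric codes are only ranks.
scale = ["strongly disagree", "somewhat disagree", "somewhat agree", "strongly agree"]
rank = {label: position for position, label in enumerate(scale, start=1)}

# Hypothetical survey answers, invented for illustration.
answers = ["somewhat agree", "strongly agree", "somewhat disagree",
           "somewhat agree", "strongly disagree"]

# Sorting and picking the middle answer only need the order, so they are safe.
ordered = sorted(answers, key=rank.get)
middle_answer = scale[median_low(rank[a] for a in answers) - 1]

print("Lowest to highest:", ordered)
print("Middle answer:", middle_answer)

# We can say one answer is "more" than another...
print(rank["strongly agree"] > rank["somewhat agree"])
# ...but 4 / 3 does not mean "strongly" is 1.33 times as much as "somewhat".
```

Sorting and finding the middle answer work because they rely only on order. Anything that treats the gaps between the codes as equal goes beyond what ordinal data can tell you.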

The important distinction with ordinal data is this:

- Something is more or less than the other thing
- We don’t know how much more or less it is
