Tag Archives: statistics

SPSS Releases A New Extraordinary Version #MRX

It’s always a happy day when you get your hands on the newest version of software. Today, I get to show you some of the features coming in the next version of SPSS. Hold on tight, because we researchers are in for a real treat.

First up is a new feature in the Edit menu. All too often, we work through an entire set of data and an entire analysis, only to realize we have discovered absolutely no insights. Have no fear. Just choose the “Insert Insights” button before you start your analysis, and you’ll be sure to discover a variety of valuable insights along the way.

Second, though a lot of research projects are very interesting to the client and the brand manager, they’re often not very interesting to the researcher. Sure, I use toilet paper, but I just don’t find it to be an exciting category except when I unpleasantly discover we’re all out. Well, SPSS has solved that problem too. Just choose the “Recode Into Interesting” button under Transform, and the data will now be fun and interesting to analyze. I do not believe, however, that it will restock the toilet paper.

Third, we know that a picture is worth a thousand words. We also know how much effort and design skill it takes to turn pages and pages of numbers into interesting and meaningful images. Thanks to SPSS, now all you have to do is choose the “Cool Infographic” button under Analyze/Descriptives, and all that hard work is done for you.

Fourth, and finally, comes probably the best feature of all. We all know the disappointment of conducting an entire research project only to find that the most important result was only significant at p=0.06. Well, never worry about this terrible problem again. Just choose the “Make Significant” button, and the result will become significant at the next closest break point, whether that means 0.05, 0.01, or 0.001.

Thank you, SPSS, for letting me review your next release. I can’t wait until everyone gets to use it!

Through the Eyes of A Market Research Methodologist #MRX

My name is Annie, and I’m a market research methodologist. What does that mean, you ask? Well, it means I pay more attention to the research design of a project than to the actual product being researched. I didn’t realize until recently how much this affects the way I perceive the world around me, but here are a few examples that just might explain.

Actual message: There is currently a nutrition supplement commercial on TV that proudly proclaims “Every drop has Vitamin D.”
What they’re saying: If you aren’t getting enough Vitamin D, you can use our product to compensate.
What I process: Jeez, this soda cracker in my hand also contains Vitamin D. Doesn’t the nature of the universe dictate that every ingestible product has Vitamin D? Isn’t the issue how much Vitamin D, and whether that amount is sufficient to overcome my lack of Vitamin D? Get back to me when you can quantify what you intend to give me and what I actually need.

Actual message: In a laboratory study with 34 mice, excess doses of sugar were found to cause severe and untreatable cancer.
What they’re saying: Stop eating so much sugar or it’ll kill you.
What I process: n=34? What kind of statistical power is that? You can’t conclude anything from that other than that you need to finish up your exploratory research and conduct some confirmatory research on humans. Get back to me when n=300 humans.

Actual message: In a poll of 300 Americans, Obama leads in voting intentions with 49% favouring him and 45% favouring Romney (+/- 5.6%, 95% confidence level).
What they’re saying: Obama is winning.
What I process: Plus or minus 5.6%? In other words, 49% might as well be 45%. There is no winner here! Why are you telling people there is? Why are you misleading everyone? Tell me someone is leading when your sample size is large enough to make those numbers significantly different.
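
Here is a rough check of that poll. It is a sketch that assumes a simple random sample and the usual normal approximation, and the function names are my own rather than from any polling package. With n=300, the worst-case margin of error comes out at about 5.7 points, close to the quoted 5.6%. The lead itself has an even wider interval, because it combines the uncertainty in both candidates' shares.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for one share p from a simple random sample of n."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)


def lead_interval(p1: float, p2: float, n: int, confidence: float = 0.95) -> tuple:
    """Confidence interval for the lead p1 - p2 when both shares come from the same poll.

    The two shares compete for the same respondents, so their covariance is -p1*p2/n
    and the variance of the lead is (p1 + p2 - (p1 - p2) ** 2) / n.
    """
    lead = p1 - p2
    se = math.sqrt((p1 + p2 - lead ** 2) / n)
    half_width = z_value(confidence) * se
    return lead - half_width, lead + half_width


print(f"Worst-case margin of error, n=300: +/- {margin_of_error(0.5, 300):.1%}")
low, high = lead_interval(0.49, 0.45, 300)
print(f"Lead of 49% over 45%: {low:+.1%} to {high:+.1%}")  # the interval contains 0
```

The interval for the lead runs from roughly -7 to +15 points. Because it includes zero, the poll cannot say who is ahead.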

[Image: 3D fluorescent pie chart]

Actual message: This pie chart indicates that 29% of people die from cardiovascular disease while 23% of people die from infectious or parasitic disease.
What they’re saying: Cardiovascular and infectious disease are major causes of death.
What I process: How can you not see that 29 + 23 does not add up to 100? How do you not know that pie chart data needs to add up to 100%? Is this why people have such trouble understanding the news? Because it’s not presented properly?

So now I’m curious. Are your blinders on the research method or on the research conclusions?

Really Simple Statistics: What is heteroscedasticity? #MRX

Welcome to Really Simple Statistics (RSS). There are lots of places online where you can ponder over the minute details of complicated equations but very few places that make statistics understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals.

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

Oh my, what a gorgeous heteroscedasticity you have! What is it, you ask, other than a really cool eight-syllable statistics word you can show off in front of friends?

This long and lovely word comes into play when you’re dealing with pairs of variables – perhaps height and weight, grades and time spent studying, or voting behaviour and time spent reading the political section of the paper. It has mean and nasty effects on correlation coefficients and regression models (in particular, it makes their significance tests and confidence intervals unreliable), so pay attention!

Specifically, it describes how the spread of one variable changes across the values of another variable. Homoscedasticity refers to a spread that is very even and regular no matter which section of the chart you look at. This is what you see in the first chart.

Heteroscedasticity refers to a spread that is uneven and irregular – like the second scatterplot you see here. The data points are very close to each other in the bottom left but spread out a lot in the top right.

More examples please!
  1. We all know that shorter people weigh less and taller people weigh more. But what if most 5 foot tall women weigh between 90 and 100 pounds while most 6 foot tall women weigh between 130 and 170 pounds? The range of 10 pounds at 5 feet is very different from the range of 40 pounds at 6 feet. That’s a lot of heterobebijicty!
  2. We also know that people who study a lot tend to get higher grades. Now, what if people who studied 1 hour per week got a D while people who studied 2 hours per week got a C, B, or A? Once again, 1 hour resulted in one possible grade while 2 hours resulted in three possible grades. That’s even more heteroihjusdfgicty.
  3. And what if jogging for 30 minutes burns 200 to 250 calories while jogging for 60 minutes burns 400 to 500 calories? Half an hour resulted in a range of 50 calories while a full hour resulted in a range of 100 calories, which works out to… also 50 calories per half hour. Measured per half hour, that’s a lot of… homoscedasticity!
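
A good first look is simply comparing the spread of the outcome within each group. This minimal sketch uses made-up numbers, chosen only to match the ranges in the examples above. The variance ratio it reports is Hartley's F-max statistic, used here as a quick descriptive gauge rather than a formal test (the formal version needs equal group sizes and a table of critical values).

```python
from statistics import stdev

# Hypothetical weights in pounds, invented to match the ranges in the height example.
weights_by_height = {
    "5 ft": [90, 93, 95, 97, 100],
    "6 ft": [130, 140, 150, 160, 170],
}

# Hypothetical calories burned, invented to match the ranges in the jogging example.
calories_by_minutes = {
    "30 min": [200, 225, 250],
    "60 min": [400, 450, 500],
}


def spread_by_group(groups: dict) -> dict:
    """Range and sample standard deviation of the outcome within each group."""
    return {name: (max(v) - min(v), stdev(v)) for name, v in groups.items()}


def variance_ratio(groups: dict) -> float:
    """Largest within-group variance divided by the smallest (Hartley's F-max statistic).

    A ratio near 1 suggests an even spread; a large ratio suggests heteroscedasticity.
    """
    variances = [stdev(v) ** 2 for v in groups.values()]
    return max(variances) / min(variances)


print(spread_by_group(weights_by_height))
print(round(variance_ratio(weights_by_height), 1))      # 17.2
print(round(variance_ratio(calories_by_minutes), 1))    # 4.0 in raw calories
per_half_hour = {"30 min": calories_by_minutes["30 min"],
                 "60 min": [c / 2 for c in calories_by_minutes["60 min"]]}
print(round(variance_ratio(per_half_hour), 1))          # 1.0 per half hour
```

The jogging numbers only look even once they are expressed per half hour. In raw calories, the variance at 60 minutes is four times the variance at 30 minutes.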

So the next time you’re wondering why your correlation coefficient or regression equation isn’t as nice as you had hoped, have a look for heteroscedasticity. And make it a habit to look before you statisticize.
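
If you would rather look with a formal test than by eye, one common option is the Breusch-Pagan test. The sketch below is a plain-Python version of Koenker's studentized variant for a single predictor, run on two simulated datasets. The data, seed, and function names are all illustrative assumptions. In real work, a statistics package would normally do this for you.

```python
import random


def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Share of the variance in y explained by a straight line in x."""
    a, b = ols_fit(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Koenker's studentized Breusch-Pagan statistic for a single predictor.

    Fit y on x, then regress the squared residuals on x. Under homoscedasticity,
    n * R^2 of that second regression is roughly chi-square with 1 degree of freedom.
    """
    a, b = ols_fit(x, y)
    squared_residuals = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)


CHI2_1DF_95 = 3.841  # 95th percentile of the chi-square distribution with 1 df

rng = random.Random(1)
x = [rng.uniform(1, 10) for _ in range(500)]
even = [2 + 3 * xi + rng.gauss(0, 2) for xi in x]            # constant spread
fanning = [2 + 3 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # spread grows with x

for label, y in (("even", even), ("fanning", fanning)):
    lm = breusch_pagan_lm(x, y)
    verdict = "heteroscedastic" if lm > CHI2_1DF_95 else "no evidence of heteroscedasticity"
    print(f"{label}: LM = {lm:.1f} -> {verdict}")
```

Even perfectly homoscedastic data will cross the 95% line about one time in twenty. The test is a prompt to look closer, not a final verdict.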

Really Simple Statistics: Sample Sizes #MRX

I’m going to guess that the #1 question every researcher has when starting a research project is this: How many people do I need to measure? If you want a simple answer, then you need to measure 1000 people per group. Unfortunately, that’s not the answer most people like. Despite the hope the post title gave you, there really isn’t a simple answer.

Do you plan to look at subgroups?


If you plan to split out subgroups in your data, then you need to make sure each group will have a large enough sample size. Do you plan to compare men and women? Do you want to see if older people generate different results than younger people? Are you comparing TV commercial #1 with TV commercial #2?

If you only have the budget to measure 100 people but you plan to split that group into people aged 18 to 34 and 35+, and then by gender, you will have only about 25 people in each group. That simply isn’t enough to be confident that the results you find are real rather than due to chance. Every single group you look at needs a sample size large enough to make chance an unlikely explanation for its results. And if that means each of your 15 groups needs at least 100 people, then you’ll need to increase your budget or decrease the number of groups you look at.
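
This small sketch shows how quickly splitting eats into a sample. It assumes the grouping variables are independent and treats cell sizes as expectations, so a real sample will wobble around them. The 100-per-cell minimum is the illustrative figure from the paragraph above. The 40/60 age split is invented.

```python
import math


def smallest_cell(total: int, splits_pct: list) -> float:
    """Expected size of the smallest subgroup when a sample is split several ways.

    `splits_pct` holds one list of integer percentages per grouping variable,
    e.g. [[50, 50], [50, 50]] for two age bands crossed with gender. Assumes the
    grouping variables are independent, so a cell's share is the product of its
    percentages.
    """
    smallest_share = math.prod(min(s) for s in splits_pct) / 100 ** len(splits_pct)
    return total * smallest_share


def total_needed(min_per_cell: int, splits_pct: list) -> int:
    """Smallest total sample whose smallest cell is expected to reach min_per_cell."""
    numerator = min_per_cell * 100 ** len(splits_pct)
    denominator = math.prod(min(s) for s in splits_pct)
    return -(-numerator // denominator)  # integer ceiling division


even = [[50, 50], [50, 50]]    # 18-34 vs 35+, then gender, split evenly
uneven = [[40, 60], [50, 50]]  # hypothetical: only 40% of the sample is 18-34

print(smallest_cell(100, even))    # 25.0
print(total_needed(100, even))     # 400
print(total_needed(100, uneven))   # 500
```

An uneven split makes things worse: the smallest cell sets the budget, not the average cell.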

How big of a difference are you expecting?

If you think an important difference between your groups will be small, then you will need a large sample. For instance, if you’re testing the effectiveness of a health and wellness campaign, any small difference will make a big improvement in people’s lives. You don’t care if the improvement is small, perhaps an increase in effectiveness of 1% or 2%. You care that 1% or 2% of people are doing better. We know pure chance can easily give us numbers that are 2% different. To counter random chance, we need very large sample sizes: probably thousands of people per group rather than hundreds.

And vice versa – if you think an important difference will be big, then you can get away with a smaller sample. Perhaps you’re testing a new scent of air freshener. Really, you don’t care if 1% of people like it more than the existing scent. You only care if 10% or 15% of people like it more than the existing scent. It’s much harder for random chance to create sets of numbers that are 10% different, so this time we don’t need such a large sample size. You might be able to get away with just a couple of hundred.
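
For a yes/no outcome like "prefers the new scent," the standard normal-approximation formula for comparing two proportions turns this reasoning into numbers. The sketch assumes a 50% baseline, a 5% significance level, and 80% power. None of these come from this post, and changing them changes the answers.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group to detect p1 vs p2 with a two-sided test.

    Normal-approximation formula for two independent proportions:
    n = (z_{1 - alpha/2} + z_{power})^2 * (p1(1 - p1) + p2(1 - p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Assumed 50% baseline; each comparison adds the difference we hope to detect.
for gap in (0.02, 0.10, 0.15):
    print(f"{gap:.0%} difference: about {n_per_group(0.50, 0.50 + gap)} people per group")
```

Under these assumptions, a 2-point difference needs roughly 9,800 people per group. A 10-point difference needs 385, and a 15-point difference needs 167.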

Are you measuring once or several times?

If you are measuring something more than once, perhaps tracking it on a weekly or monthly basis, each sample can be smaller. For instance, you might determine that a one-time measure should be 300 people but a weekly measure need only be 100 per week for 6 weeks. It’s hard for random chance to produce similar results every single week for 6 weeks, so we don’t need as large a sample each time.

If you’re looking for some specific direction, then check out this list of statistical calculators. Be prepared for some very technical terms though!


Correlation vs Causation: Give me popcorn and a movie will be shown #MRX

Church makes you fat and news stories make you stupid.

As that news article shows, misunderstandings about correlation and causation are rampant. We all know that correlation reflects things that tend to occur together, whereas causation reflects one thing making the other happen. Well, bah humbug. Enjoy these “correlations.”

  • Mass umbrella openings cause rain to fall
  • Putting on shorts and t-shirts causes it to be warm outside
  • Opening a door causes a person to walk through it
  • Buying popcorn causes movies to be shown
  • Buying plus-sized clothing causes you to gain weight
  • Flying your own jet causes you to be a multi-millionaire
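
Part of the trouble is that a correlation coefficient is symmetric: it comes out the same whichever variable you call the cause. This small sketch uses invented daily numbers for rainfall and umbrella openings and a hand-rolled Pearson formula. It shows that the coefficient alone cannot tell "rain causes umbrellas" from "umbrellas cause rain."

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily observations, invented for illustration.
rain_mm = [0, 0, 2, 5, 12, 20, 1, 0, 8, 15]
umbrellas_open = [3, 5, 40, 80, 150, 210, 25, 4, 110, 170]

r_forward = pearson_r(rain_mm, umbrellas_open)
r_backward = pearson_r(umbrellas_open, rain_mm)
print(f"r(rain, umbrellas) = {r_forward:.3f}")
print(f"r(umbrellas, rain) = {r_backward:.3f}")  # identical: r has no direction
```

Knowing which way the arrow points has to come from somewhere else, such as timing, an experiment, or plain common sense.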

Really Simple Statistics: What is sampling error? #MRX

What is sampling error? First, you need to understand what sampling is. Sampling is choosing a smaller set of data/people/things to reflect the entire population. For instance, instead of measuring the height of everyone in your office, you might just measure the height of ten people. Or, instead of asking every person in Canada who they intend to vote for, you choose a sample of 2000 people to ask.

In the process of sampling, you gather 10 heights instead of 100 heights, or 100 opinions instead of 1000 opinions. Either way, you don’t gather every possible data point, and that means the summary numbers you generate will probably not be exactly the same as they would have been had you measured every data point. The process of sampling introduces error, and it cannot be avoided.
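
A quick simulation makes this concrete. The office below is 100 simulated heights, invented with an assumed mean of 170 cm and a standard deviation of 10 cm. Drawing many random samples shows the sample mean bouncing around the true office mean. It bounces less as samples grow, and not at all once you measure everyone.

```python
import random
import statistics


def spread_of_sample_means(population, sizes, repeats=1000, seed=1):
    """Standard deviation of the sample mean across many simple random samples.

    Samples are drawn without replacement, as when you measure some people in an office.
    """
    rng = random.Random(seed)
    result = {}
    for size in sizes:
        means = [statistics.mean(rng.sample(population, size)) for _ in range(repeats)]
        result[size] = statistics.stdev(means)
    return result


# Hypothetical office of 100 people with heights in cm (simulated, not real data).
office_rng = random.Random(42)
office = [office_rng.gauss(170, 10) for _ in range(100)]

spreads = spread_of_sample_means(office, (10, 50, 100))
for size, sd in spreads.items():
    print(f"sample of {size:3d}: sample means vary with a standard deviation of {sd:.2f} cm")
```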

In addition to sampling error, most research studies are affected by other errors that also arise during the sampling process. These include coverage errors, non-response errors, self-selection errors, and more. Consider these obvious sampling biases:

  • The ten tallest people in your office were away at a “Retreat for tall people” and you didn’t wait to include them in your height sample.
  • The ten Asian people in your office were away at a “Retreat for Asian people” and therefore couldn’t be part of your height sample (hmm… aren’t Asian people known for being shorter than average?).
  • When you were gathering opinions on voting intentions, you only asked people who were attending a gala for a particular political candidate.
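
Unlike random sampling error, a bias like the first one doesn't shrink as you measure more people. In this sketch, the heights are simulated with an assumed 170 cm mean and 10 cm standard deviation. The ten tallest people are simply unavailable, so even a perfect census of everyone who is left comes out too short.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical office of 100 simulated heights in cm, sorted shortest to tallest.
office = sorted(rng.gauss(170, 10) for _ in range(100))
reachable = office[:-10]  # the ten tallest are away at their retreat

true_mean = statistics.mean(office)
best_possible = statistics.mean(reachable)  # even measuring everyone who is left
print(f"Office mean: {true_mean:.1f} cm")
print(f"Mean of everyone you could reach: {best_possible:.1f} cm")
print(f"Bias that no sample size can fix: {best_possible - true_mean:.1f} cm")
```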

Running a survey and you’re positive your sampling plan is perfect?

  • Does everyone have a telephone in order to respond to your telephone survey?
  • Does everyone have a home where they can receive a mail survey?
  • Does everyone have a computer where they can receive an email survey?

Running social media research and you’re positive your sampling plan is perfect?

  • Does everyone feel comfortable leaving comments on blogs?
  • Does everyone have a public Facebook page?
  • Does everyone use Twitter?

Of course, these are only the obvious errors that occur during the sampling process. Tiny mistakes are always made in the sampling process, particularly when you must first decide where to gather opinions from. The trick is to ALWAYS assume that your sampling plan includes error.

8 Christmas Gifts for Market Researchers #MRX

This isn’t necessarily my Christmas list but I have to admit a few of these would make me giddy. So, in no particular order, I give you geek gifts for market researchers.

1. Math Clock

2. Pie Shirt

3. Statistics Mug

4. Normal Distribution Ornaments

5. Statistics Apron

6. Statistician Bib

7. Made Up Statistics

8. iPhone Cover

Engage Your Users and Bring Home the Bacon #MRX

[Fans of bacon should read this version instead.]

Do a quick search on Twitter or Google and you’ll instantly find 412 653 ways to encourage people to engage more with your product. Reply to people, ask questions, use polls, give a call to action, request videos and photos, give them user accounts. The list goes on and on.

Why do we do this? Because research says that when users are more engaged with something, they spend more money on it. And we all like money.

But wait. Something I paid little attention to in my introductory statistics class is nagging me. It’s telling me not to leap to conclusions. It’s telling me that just because someone spends more time on something does not mean they will consequently spend more money on it. It’s telling me that correlation does not equal causation.

You see, people who are more engaged with something (a website, a shoe company, a kitchen supply shop) are already more invested in it than people who are less engaged. These people are more engaged because they like the company. They buy the product because they like the company. They do not buy the product because they are engaged with the company.

It’s very easy to forget the direction of relationships among variables. Sure, you can convince a bunch of people to become more engaged, and sure some of them will grow to like the company, and sure some of them will make a purchase. But don’t fool yourself into thinking you can get a bunch of vegetarians to eat bacon by convincing them to create a user profile on your bacon website and share bacon recipes with them. The common denominator isn’t engagement. It’s the bacon. It’s always the bacon.
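
A toy simulation shows how a common cause manufactures this kind of correlation. Purely for illustration, it assumes that "liking the company" drives both engagement and spending, and that engagement does nothing on its own. The partial correlation then asks what is left of the engagement-spending link once liking is held constant.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear influence of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(2024)
n = 5000
# Hypothetical model: liking the company drives both engagement and spending.
# Engagement has NO effect on spending anywhere in this simulation.
liking = [rng.gauss(0, 1) for _ in range(n)]
engagement = [like + rng.gauss(0, 1) for like in liking]
spending = [like + rng.gauss(0, 1) for like in liking]

print(f"engagement vs spending: r = {pearson_r(engagement, spending):.2f}")
print(f"same, holding liking constant: r = {partial_r(engagement, spending, liking):.2f}")
```

The raw correlation lands around 0.5, yet it drops to roughly zero once liking is accounted for. The common denominator really is the bacon.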

Statistics Poetry by Geeks and Nerds #MRX #Statistics

I couldn’t help myself. The creative juices started to flow, and poetry spewed out of me like warm icing from a pastry bag. And then one poem led to another and another, and the twittersphere united in poetry goodness. Do enjoy, then tweet your own poem about statistics, charts, or research methods, and I’ll add it here.

Pumpkin Pi for Halloween #Math #Stats

You think I’m a nerd? Check out the creativity of other pi fans…
