Weighting – Is it all it’s cracked up to be?

Weighting is a very common step in the research process. You might even use it every single time you run a study. I’m going to go against the grain here and challenge you to think about it more carefully.

Let’s look at a simple example. Take a data set that is made up of 40% men and 60% women. The men produced an average score of 38% and the women an average score of 48%. That gives a raw score of 44% (0.40 × 38 + 0.60 × 48). But, because the population is 50% male and 50% female, we want to weight the results back to that target. That gives us a weighted score of 43% (0.50 × 38 + 0.50 × 48). So, the raw score is 44% and the weighted score is 43%. Is it really all that different? Does that really change the business decision? The answer to this question should be “Absolutely not, because my confidence interval is 3 points.” Makes sense, doesn’t it? If the raw score is basically equal to the weighted score, why are you weighting the data?

    Gender     Sample   Population   Score
    Male       40%      50%          38.0%
    Female     60%      50%          48.0%
    RAW                              44.0%
    WEIGHTED                         43.0%
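
If you want to check the arithmetic yourself, here is a minimal Python sketch of the example. The variable and function names are my own, chosen for illustration, and the per-group weight is assumed to be the usual population share divided by sample share.

```python
# Shares and scores taken from the gender example in this post.
sample_share = {"male": 0.40, "female": 0.60}
population_share = {"male": 0.50, "female": 0.50}
score = {"male": 38.0, "female": 48.0}


def average_score(shares, scores):
    """Average of group scores, each weighted by its share."""
    return sum(shares[group] * scores[group] for group in shares)


# Raw score uses the shares we actually got in the sample.
raw = average_score(sample_share, score)

# Weighted score uses the population shares we are weighting back to.
weighted = average_score(population_share, score)

# Weight applied to each respondent in a group (assumed: target / actual).
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

print(f"raw = {raw:.1f}, weighted = {weighted:.1f}")  # raw = 44.0, weighted = 43.0
print({g: round(w, 3) for g, w in weights.items()})   # {'male': 1.25, 'female': 0.833}
```

Each man counts as 1.25 people and each woman as about 0.83, and the score moves by a single point.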

Now, I’m not saying don’t weight your data. I’m just saying think twice before you weight your data. UNDERSTAND how weighting works before you use it. Here are some thoughts in relation to weighting:

1) Do not expect your scores to change very much.
2) If your scores are changing a lot, your sample is too different from the population and your weighted scores are probably not very reliable. You probably have tiny sample sizes that should be thrown out, not weighted.
3) If your scores aren’t changing very much, why are you weighting? Data varies and comes with confidence intervals. You’re probably just shifting the score around within its confidence interval. So why bother?
4) If you are using weighting, do not weight because you didn’t get enough of a particular demographic group. Weight because one group was too large.
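
One rough way to see how much reliability weighting costs is Kish's approximate effective sample size, (sum of weights)² / (sum of squared weights). The sketch below is only an illustration under my own assumptions: a hypothetical study of 100 respondents, a 50/50 population target, and a helper name (`kish_effective_n`) that I made up.

```python
def kish_effective_n(weights):
    """Kish's approximation: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def gender_weights(n_men, n_women, target_share=0.5):
    """Per-respondent weights for a hypothetical 50/50 population target."""
    n = n_men + n_women
    w_men = target_share / (n_men / n)
    w_women = (1 - target_share) / (n_women / n)
    return [w_men] * n_men + [w_women] * n_women


# Mild imbalance (40 men, 60 women): the weights cost very little.
mild = kish_effective_n(gender_weights(40, 60))

# Severe imbalance (10 men, 90 women): each man counts as 5 people.
severe = kish_effective_n(gender_weights(10, 90))

print(round(mild, 1), round(severe, 1))  # 96.0 36.0
```

With a 40/60 split the 100 interviews behave like roughly 96. With a 10/90 split they behave like roughly 36, which is the situation point 2 warns about.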

Moral of the story: Use the largest sample you can afford, and pull it so that it will be as representative as possible when you are done.

    7 responses

    1. I have definitely seen weighting abused. As a general rule of thumb, I would say that if, after doing the survey, you decide you need to do weighting, you are fooling yourself about the reliability of your data. If you are going to use weighting, you should decide that at the sample design stage, not later.

    2. I wrote a book called “Why Are You Weighting?” (subtitle “It’s Not The Food That’s Making You Fat!”) so your blog popped up for me because of your use of “why are you weighting”?

      Different type of weighting! LOL


    3. You’re not just shifting the score within the confidence interval because each time the score changes, the confidence interval around it shifts as well, so a change of 1% in the score shifts the confidence interval by 1%. But yeah, your 4th point makes a lot of sense: you can’t use weighting to improve how well one of the subgroups in your sample is represented, because you’re just fooling yourself.

    4. Hi Annie,

      You seem to be assuming that people are using a simple random sample, and weighting based on post-stratification. Maybe that’s generally true in your specific field.

      What you’re describing is a small design effect. What I have mostly encountered is truly stratified samples, where the stratification is built into the sampling design, but data analysts completely ignore it (usually unintentionally, out of ignorance). I’ve seen results change drastically.

      While I agree that you should check the design effect and drop weights if it’s small, I think you need to be careful about saying weights are usually not necessary. They (and using appropriate survey functions that accommodate complex survey designs) very often are.


    5. Yeah, makes sense for very simple weightings (such as the example you provided), but in an applied setting weighting is usually conducted on multiple variables simultaneously. Even the simplest examples would involve age and gender and region, where it is much more difficult to assess intuitively what the impact will be. Many samples in an applied setting also involve quotas on specific subgroups and/or boost samples of particular subsamples of interest. In these cases weighting is absolutely vital to ensure that the overall sample results are more reflective of the population of interest.

      As with any weighting exercise, thinking about the weighting at the sample design stage is absolutely vital to ensure that you don’t end up massively up/down weighting particular respondents.

      Really weighting is used in order that we can adopt the most parsimonious (read cost effective) sample possible that will reflect the population of interest.

      While your thoughts might be relevant for simple random sample surveys, not weighting the type of samples we are dealing with in applied market research is likely to lead to less reliable estimates.

      Just some thoughts from the coalface!

    6. My sentiments exactly! I think we often use models that are much more complicated than necessary to impress our peers (i.e., other statisticians) when it does very little to improve the accuracy of our conclusions and unnecessarily confuses the people we are trying to inform.

      I am glad to see I am not the only contrarian in the room.

      1. Yup, I often get the feeling that some researchers/statisticians forget the difference between statistics in the lab and statistics in the real world. Variables are not truly random in the real world so applying weighting to adjust an odd combination of age and gender and income and education and household size and religion and region and language (phew) can end up giving far too much weight to that strange, completely unrepresentative person. I’m exaggerating, of course, but you get the point.
