Live blogged from Nashville. Any errors or bad jokes are my own.
Third author is Peter Kwok
we moved many offline sampling techniques to online sampling. now we have river and dynamic sourcing and routers.
– should we use one or the other, or both, of outgo quotas and return quotas?
– balancing quotas are set from sampling frames. usually region, age, gender, household size, often based on US census.
– survey quotas are determined by respondent profiles or subject category.
– some populations are really hard to find. not everyone is simply looking for genpop
– sample frames may not reflect the target populations
– response rates for women can be 20 or more points higher than for men
– with river or dynamic sampling, you don’t even know the demos that you’re getting
– router selection is efficient use of respondents but there’s not as much quota control compared to traditional sampling that uses outgo and return balancing
– traditional sampling focused on a specific person for a specific study
carried out a study using various sampling techniques. used interlocking age and gender, plus region.
– 10 minutes, grocery shopping habits, census quotas
cell 1 – 4 balancing variables including income, quotas for outgo
cell 2 – only used age, gender, region quotas on outgo
then weighted to census
– better weights on cell 1, better weight efficiency, minimum weights, maximum weights.
– every type of sample has skews [yer darn right! why do people forget this?]
– controlling for age, gender, region just wasn’t enough
– income and household size did not represent well when they weren’t initially balanced for, marital status also didn’t work well
– some of the profiling questions showed differences as well – belonging to a warehouse club showed differences, using a smartphone to help with shopping showed differences
– quotas do not guarantee a representative sample. additional controls are necessary on outgo. with more controls, weighting can even be unnecessary
– repetition is good. repetition is good. repetition is good (i.e., test-retest reliability is good!)
we need to retain our sample expertise. be smart. learn about sampling and do it well. keep the good things about the traditional ways.
[please please control on the outgo and returns if you can. weighting as a strategy is not the way to think about this. get the sample you need and fuss with it as little as possible through weighting]
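For anyone wondering what "weight efficiency" looks like in numbers, here is a minimal sketch. The talk did not say which formula was used, so I am assuming Kish's approximation, which is a common choice: the squared sum of the weights divided by n times the sum of squared weights. The two weight lists are made up for illustration and are not the study's data.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def weighting_efficiency(weights):
    """Effective sample size as a share of the actual sample size (0 to 1)."""
    return effective_sample_size(weights) / len(weights)


# Hypothetical weights, NOT from the study: a well-balanced cell needs
# weights close to 1, a poorly balanced cell needs more extreme weights.
well_balanced = [0.9, 1.0, 1.1, 1.0]
poorly_balanced = [0.4, 0.6, 1.0, 2.0]

print(f"well balanced:   {weighting_efficiency(well_balanced):.2f}")
print(f"poorly balanced: {weighting_efficiency(poorly_balanced):.2f}")
```

The more the weights spread out, the lower the efficiency, which is one way to see why controlling more variables on outgo leaves less work for the weights.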
1) create a sampling plan based on the required characteristics for the project
2) go into field and gather data from research participants such that the returns look similar to the sampling plan
3) if the difference between the sampling plan and your returns is large, go back to step 2
4) if the difference is small, weight the returns to match your sampling plan
5) if upon analysis of the results you determine that your weights were calculated incorrectly, reweight the data
Did you catch that? You can’t reweight data unless you’ve already weighted it at least once before. So, if you catch yourself using the word reweight in your next report, ask yourself if you really did reweight the data. You probably didn’t. So don’t say it.
Today’s grammar and statistics rant is brought to you by the number pink and the letter seventeen.
Few research projects skip the weighting process. It’s an important component that ensures the results are representative of a population. Whether the population is offline census rep, online internet rep, mothers of infants rep, or some other kind of rep, weighting lets us adjust the demographic and psychographic characteristics back to population rates so that the research results will generalize properly to the population.
We could argue forever about the best way to weight. Is it okay to weight data times 3 if the size of that group is 1000? But not ok to weight data times 1.5 if the size of the group is only 25? Where do we draw the line? At what weight and what sample size?
I challenge you, however, to stop weighting your data. Weighting isn’t the solution. Weighting simply gives fewer people a louder voice. If we wanted fewer people, we should have just sampled fewer people.
I challenge you to sample properly. We know the response rates of men and women, young and old, more educated and less educated. We even know the response rates of 21 year old male hispanics with 2 years of college. We know this. So sample for it. If you know you need 50 completes from a low responding group of people, then you’ll have to sample 2000 of them. That’s just how it works.
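The arithmetic behind that last example is simple enough to sketch. The 2.5% response rate below is not stated anywhere above; it is just what 50 completes out of 2000 invitations implies. The function name `invites_needed` is my own, and `Fraction` is only there to avoid floating point surprises when rounding up.

```python
import math
from fractions import Fraction


def invites_needed(target_completes, response_rate):
    """Invitations to send so the expected completes reach the target.

    response_rate is a proportion between 0 and 1 (e.g. 0.025 for 2.5%).
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Convert via str() so 0.025 becomes exactly 1/40 before dividing.
    rate = Fraction(str(response_rate))
    return math.ceil(target_completes / rate)


# Assumed 2.5% response rate for the low responding group.
print(invites_needed(50, 0.025))
```

Run this per group with that group's own response rate and you have the outgo for each quota cell.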
Sample first. Weight less. Weighting is for dummies.
Weighting is a very common process in the research process. You might even use it every single time you run a study. I’m going to go against the grain here and challenge you to think about it more carefully.
Let’s look at a simple example. Let’s take a data set that is made of 40% men and 60% women. The men produced an average score of 38% and the women an average score of 48%. That gives a raw score of 44%. But, because the population is 50% male and 50% female, we want to weight the results back to that target. That gives us a weighted score of 43%. So, the raw score is 44% and the weighted score is 43%. Is it really all that different? Does that really change the business decision? The answer to this question should be “Absolutely not because my confidence interval is 3 points.” Makes sense, doesn’t it? If the raw score is basically equal to the weighted score, what are you doing weighting data?
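If you want to check the numbers in that example yourself, here is a small sketch. It uses only the figures from the paragraph above; the function and variable names are mine.

```python
def mix_score(scores, shares):
    """Average of group scores, each weighted by its share of the total."""
    total = sum(shares.values())
    return sum(scores[g] * shares[g] for g in scores) / total


scores = {"men": 38.0, "women": 48.0}             # average score per group
sample_shares = {"men": 0.40, "women": 0.60}      # who actually responded
population_shares = {"men": 0.50, "women": 0.50}  # the weighting target

raw = mix_score(scores, sample_shares)
weighted = mix_score(scores, population_shares)

# Weight per respondent in each group: target share / achieved share.
weights = {g: population_shares[g] / sample_shares[g] for g in scores}

print(f"raw {raw:.1f}, weighted {weighted:.1f}")  # raw 44.0, weighted 43.0
print({g: round(w, 2) for g, w in weights.items()})
```

Each man counts 1.25 times and each woman about 0.83 times, and all that effort moves the score by a single point.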
Now, I’m not saying don’t weight your data. I’m just saying think twice before you weight your data. UNDERSTAND how weighting works before you use it. Here are some thoughts in relation to weighting:
1) Do not expect your scores to change very much.
2) If your scores are changing a lot, your sample is too different from the population and your weighted scores are probably not very reliable. You probably have tiny sample sizes that should be thrown out, not weighted.
3) If your scores aren’t changing very much, why are you weighting? Data varies and comes with confidence intervals. You’re probably just shifting the score around within its confidence interval. So why bother?
4) If you are using weighting, do not weight because you didn’t get enough of a particular demographic group. Weight because one group was too large.
Moral of the story: Use the largest sample you can afford, and pull it so that it will be as representative as possible when you are done.