“A Model-Based Approach for Achieving a Representative Sample”
Although the enterprise of online research (with non-probability samples) has witnessed remarkable growth worldwide since its inception about 15 years ago, the US public opinion research community has not yet embraced it, partly because of concerns over data reliability and validity. The aim of this project is to rely on data from a recent, large-scale ARF study to develop an optimal model for achieving a representative sample. By that, we mean one that reduces or eliminates the bias associated with non-probability sampling. In addition to the presenter, this paper was authored by John Bremer (Toluna) and Carol Haney (Toluna).
- George Terhanian, Group Chief Strategy & Products Officer, Toluna
- The key is representativeness. This topic is not new; we talked about it 15 years ago. Criticisms are not new either – Warren Mitofsky said that the willingness to discard sampling frames, and the feeble attempts at manipulating the resulting bias, undermine the credibility of the research process.
- SLOP – self-selected opinion panel. [Funny!]
- Growth of online research remains strong as it has since the beginning.
- 2011 – AAPOR needs to promote flexibility, not dogmatism; it established a task force on non-probability methods. The task force identified sample matching as the most promising non-probability approach but did not offer next steps or an agenda.
- The FOQ study involved 17 different companies
- Researchers should use the ARF’s FOQ2 data to test non-probability sampling and representativeness
- Used a multi-directional search algorithm (MSA)
- Bias is the difference between what respondents report and what we know to be true, e.g., for “Do you smoke?”, compare the benchmark score against the panel score.
- Reduce bias through 1) respondent selection or sampling and 2) post hoc adjustment or weighting [I always prefer sampling]
- FOQ2 suggests weighting needs to include additional variables such as demographics, secondary demographics (household characteristics), behaviours, attitudes
- [If you read my previous post on the four types of conference presenters, this one is definitely a content guru 🙂 ]
- Using only optimal demographics, panel and river sample were reasonably good and reduced bias by 20 to 25%. Time spent online helps to reduce bias and is a proxy for availability, in terms of how often people take surveys
- Ten key variables are age, gender, region, time spent online, race, education [sorry, missed the rest]
- Other variables like feeling hopeful, concern about privacy of online information were top variables [sorry, missed again, you really need to get the slides!]
- Need to sample on all of these but don’t need to weight on all of them
- [I’m wondering if they used a hold-back sample and whether these results are replicable; the fun of step-wise work is that random chance makes weird things happen]
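
To make the bias and weighting points above concrete, here is a minimal Python sketch of the two ideas from the talk: bias as the gap between a panel estimate and a known benchmark, and a post hoc weighting adjustment on a single variable (time spent online). This is not the presenters' model or their multi-directional search; it is plain one-variable post-stratification, and every number, the `online` categories, the population shares, and the function names are hypothetical, invented only for illustration.

```python
# Hypothetical illustration only: all figures are made up, not FOQ2 data.

def estimate(sample, weights=None):
    """Weighted share of respondents who say they smoke."""
    if weights is None:
        weights = [1.0] * len(sample)
    smokers = sum(w for r, w in zip(sample, weights) if r["smokes"])
    return smokers / sum(weights)


def poststratify(sample, population_shares, key):
    """One-variable post-stratification.

    Each respondent's weight is the population share of their group
    divided by that group's share of the sample.
    """
    n = len(sample)
    counts = {}
    for r in sample:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return [population_shares[r[key]] / (counts[r[key]] / n) for r in sample]


# Toy online panel in which heavy internet users are overrepresented.
sample = (
    [{"online": "heavy", "smokes": True}] * 10
    + [{"online": "heavy", "smokes": False}] * 50
    + [{"online": "light", "smokes": True}] * 12
    + [{"online": "light", "smokes": False}] * 28
)

# Assumed "known truth": benchmark smoking rate and population mix.
benchmark = 0.25
population_shares = {"heavy": 0.4, "light": 0.6}

weights = poststratify(sample, population_shares, "online")

# Bias = what respondents report minus what we know to be true.
bias_raw = estimate(sample) - benchmark
bias_weighted = estimate(sample, weights) - benchmark

print(f"unweighted bias: {bias_raw:+.4f}")
print(f"weighted bias:   {bias_weighted:+.4f}")
```

In this toy data the weighted estimate lands closer to the benchmark than the unweighted one, because the overrepresented heavy users are weighted down. With real data that is not guaranteed, which is why the choice of variables to sample and weight on matters so much.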