Probability and Non-Probability Samples in Internet Surveys #AAPOR #MRX


AAPOR… Live blogging from beautiful Boston, any errors are my own…

Probability and Non-Probability Samples in Internet Surveys
Moderator: Brad Larson

Understanding Bias in Probability and Non-Probability Samples of a Rare Population John Boyle, ICF International

  • If everything were equal, we would choose a probability sample. But everything is not always equal. Cost and speed are completely different, and this can be critical to the objective of the survey.
  • Did an influenza vaccination study with pregnant women. Would have required 1,200 women if you wanted to look at minority subgroups. Not happening. Influenza data isn’t available at a moment’s notice, and women aren’t pregnant at your convenience. A non-probability sample is pretty much the only alternative.
  • Most telephone surveys are landline only for cost reasons. RDD has coverage issues. It’s a probability sample but it still has issues.
  • Unweighted survey looked quite similar to census data, and looked good when crossed by age as well. Landline respondents are more likely to be older; cell-phone-only respondents are more likely to be younger. Landline respondents are more likely to be married, own a home, be employed, have higher income, and have insurance from an employer.
  • Landline vs. cell-only: no difference on tetanus shot or having a fever, but big differences on flu vaccination.
  • There are no gold standards for this measure, and there are mode effects.
  • Want probability samples but can’t always achieve them

A Comparison of Results from Dual Frame RDD Telephone Surveys and Google Consumer Surveys

  • Pew and Google partnered on this study; a 2-question survey
  • Consider fit for purpose – can you use it for trends over time, quick reactions, pretesting questions, open-end testing, question format tests
  • Not always interested in point estimates; sometimes a better understanding is enough
  • RDD vs. Google surveys: the average difference was 6.5 percentage points; the distribution was concentrated near zero, but a number of estimates were quite different
  • Demographics were quite similar; Google samples were a bit more male, had fewer younger people, and were much better educated
  • Correlation of age and “I always vote” was very high; good correlation of age and “prefer smaller government”
  • Political partisanship was very similar, similar for a number of generic opinions – earth is warming, same sex marriage, always vote, school teaching subjects
  • Difficult to predict when point estimates will line up to telephone surveys
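The mode comparison above boils down to a simple metric: the average absolute gap, in percentage points, between paired point estimates from the two sources. A minimal sketch, with made-up numbers (the question set and estimates are illustrative, not from the talk):

```python
# Average absolute difference (in percentage points) between point
# estimates from two survey modes for the same set of questions.

def mean_abs_diff(estimates_a, estimates_b):
    """Mean absolute difference between paired point estimates."""
    diffs = [abs(a - b) for a, b in zip(estimates_a, estimates_b)]
    return sum(diffs) / len(diffs)

# Hypothetical percent-agree estimates for four shared questions
rdd    = [52.0, 61.0, 48.0, 70.0]   # dual-frame RDD telephone
google = [49.0, 66.0, 47.0, 58.0]   # Google Consumer Surveys

print(mean_abs_diff(rdd, google))  # prints 5.25
```

A distribution of the per-question gaps (not just the mean) is what shows whether most items line up while a few diverge badly, which matches the finding reported above.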

A Comparison of a Mailed-in Probability Sample Survey and a Non-Probability Internet Panel Survey for Assessing Self-Reported Influenza Vaccination Levels Among Pregnant Women

  • Panel survey via email invite, weighted data by census, region, age groups
  • Mail survey used a sampling frame of birth certificates, weighted for nonresponse and non-coverage
  • Tested demographics and flu behaviours of the two methods
  • age distributions were similar [they don’t present margin of error on panel data]
  • panel survey had more older people, more education
  • Estimates differed on flu vaccine rates, some very small, some larger
  • Two methods are generally comparable, no stat testing due to non-prob sample
  • Trends of the two methods were similar
  • Panel survey is good for timely results
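The weighting step mentioned above (calibrating panel data to census margins such as region and age group) is typically done by raking, i.e. iterative proportional fitting. A minimal sketch with invented margins and sample rows, purely for illustration:

```python
# Raking (iterative proportional fitting): adjust one weight per
# respondent until weighted shares match target margins on each
# variable in turn. Margins and rows below are made up.

def rake(rows, margins, n_iter=50):
    """rows: list of dicts of categorical vars.
    margins: {var: {level: target_share}}.
    Returns one weight per row."""
    weights = [1.0] * len(rows)
    for _ in range(n_iter):
        for var, targets in margins.items():
            # current weighted total for each level of this variable
            totals = {}
            for w, row in zip(weights, rows):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand = sum(totals.values())
            # scale weights so this variable's margin hits the target
            for i, row in enumerate(rows):
                share = totals[row[var]] / grand
                weights[i] *= targets[row[var]] / share
    return weights

rows = [
    {"age": "18-29", "region": "N"}, {"age": "18-29", "region": "S"},
    {"age": "30+",   "region": "N"}, {"age": "30+",   "region": "S"},
    {"age": "30+",   "region": "S"},
]
margins = {"age": {"18-29": 0.5, "30+": 0.5},
           "region": {"N": 0.5, "S": 0.5}}
w = rake(rows, margins)
```

Note that raking fixes the margins it is given but cannot correct variables left out of the margins, which is one reading of the education gaps reported in the sessions above.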

Probability vs. Non-Probability Samples: A Comparison of Five Surveys

  • [what is a probability panel? i have a really hard time believing this]
  • Novus and TNS Sifo considered probability
  • YouGov and Cint considered non-probability
  • Response rates range from 24% to 59%
  • SOM institute (mail), Detector (phone), LORe (web) – random population sample, rates from 8% to 53%
  • Data from Sweden
  • On average, the three methods differ from census results by 4% to 7%; web was worst. Demographics were similar except education, where the higher educated were over-represented; driving licence holders were also over-represented
  • Non-probability samples were more accurate on demographics compared to probability samples; when weighted, they are all the same on demographics, but education is still a problem
  • The five data sources were very similar on a number of different measures, whether prob or non-prob
  • Demographic accuracy of non-probability panels was better, and they were also closer on political attitudes. No evidence that self-recruited panels are worse.
  • Need to test more indicators, retest

Modeling a Probability Sample? An Evaluation of Sample Matching for an Internet Measurement Panel

  • “construct” a panel that best matches the characteristics of a probability sample
  • Select – Match – Measure
  • Matched on age, gender, education, race, time online, also looked at income, employment, ethnicity
  • Got good correlations and estimates from prob and non-prob.
  • Sample matching works quite well [BOX PLOTS!!! i love box plots, so good in so many ways!]
  • Non-prob panel has more heavy internet users
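The Select–Match–Measure idea above can be sketched as nearest-neighbour matching: for each case in a probability reference sample, pick the closest panelist on the matching variables. The variable names and data here are illustrative, not from the talk (real implementations typically standardize variables and often match without replacement):

```python
# Toy sample matching: for each reference case, select the panelist
# minimizing squared distance on the matching variables.

def nearest_match(target, pool, keys):
    """Return the pool record closest to target on the given keys."""
    return min(pool, key=lambda p: sum((p[k] - target[k]) ** 2 for k in keys))

def match_sample(reference, panel, keys):
    """One matched panelist per reference case (with replacement)."""
    return [nearest_match(r, panel, keys) for r in reference]

# Hypothetical data: age in years, education coded 1-4, hours online/week
reference = [{"age": 25, "edu": 2, "online": 10},
             {"age": 60, "edu": 3, "online": 4}]
panel     = [{"age": 27, "edu": 2, "online": 12},
             {"age": 45, "edu": 4, "online": 30},
             {"age": 58, "edu": 3, "online": 5}]

matched = match_sample(reference, panel, ["age", "edu", "online"])
```

Including "time online" among the matching variables, as the talk describes, is what counteracts the panel's surplus of heavy internet users.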

2 responses

  1. […] sampling (Annie Pettit and Reg Baker reported on this). In addition, the AAPOR report on non-probability […] was released during the week of the conference

  2. This application is interesting to me for two reasons. First, it reminds us that people are motivated to participate in surveys for reasons besides monetary incentives. There is no cash, reward, or prize for answering these questions. Yet, I can see a user who answered my question, who has answered over 20,000 questions. So why do they do it? I would speculate that it is to see what types of questions are being asked by their peers, to express their own opinions to the community, to see the results for these questions (which you get as soon as you answer), and perhaps to make a community contribution so that their own questions will be answered when posed. This last point is related to the second interesting thing about this application, which is the implications it could have for market researchers’ use of social media for survey research.
