Tag Archives: Nonprobability

New Math For Nonprobability Samples #AAPOR 

Moderator: Hanyu Sun, Westat

Next Steps Towards a New Math for Nonprobability Sample Surveys; Mansour Fahimi, Frances M. Barlas, Randall K. Thomas, and Nicole R. Buttermore, GfK Custom Research

  • Neyman paradigm requires complete sampling frames and complete response rates
  • Non-prob is important because those assumptions are not met: sampling frames are incomplete, response rates are low, and there are budget and time crunches
  • We could ignore that we are dealing with nonprobability samples, find new math to handle this, try more weighting methods [speaker said commercial research ignores the issue – that is absolutely not true. We are VERY aware of it and work within appropriate guidelines]
  • In practice, sampling frames are incomplete so samples aren’t random, respondents choose not to respond, weighting has to be more creative, and uncertainty around inferences is increasing
  • There is fuzz all over; the relationship is nonlinear and complicated
  • Geodemographic weighting is inadequate; weighted estimates to benchmarks show huge significant differences [this assumes the benchmarks were actually valid truth but we know there is error around those numbers too]
  • Calibration 1.0 – correct for higher agreement propensity with early adopters – try new products first, like variety of new brands, shop for new, first among my friends, tell others about new brands; this is in addition to geography
  • But this is only a univariate adjustment, one theme; sometimes it’s insufficient
  • Sought a multivariate adjustment
  • Calibration 2.0 – social engagement, self importance, shopping habits, happiness, security, politics, community, altruism, survey participation, Internet and social media
  • But these dozens of questions would burden respondents, and weighting becomes an issue
  • What is the right subset of questions for the biggest effect?
  • Number of surveys per month, hours on Internet for personal use, trying new products before others, time spent watching TV, using coupons, number of relocations in past 5 years
  • Tested against external benchmarks, election, BRFSS questions, NSDUH, CPS/ACS questions
  • Nonprobability samples based on geodemographics are the worst of the set, adding calibration improves them, nonprobability plus calibration is even better, probability panel was the best [pseudo probability]
  • Calibration 3.0 is hours on Internet, time watching TV, trying new products, frequency expressing opinions online
  • Remember Total Research Error, there is more error than just sampling error
  • Combining nonprobability and probability samples: use stratification methods so you have a resemblance of the target population, which gives you a better sample size for weighting adjustments
  • Because there are so many errors everywhere, even nonprobability samples can be improved
  • Avoiding calibration is wishful thinking and misleading
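The calibration idea running through this talk — adjust weights until the sample’s margins on attitudinal variables (internet use, trying new products, etc.) match population benchmarks — is typically implemented as raking (iterative proportional fitting). A minimal sketch in Python; the variables, targets, and data are invented for illustration and are not GfK’s actual calibration model:

```python
from collections import defaultdict
import random

def rake(sample, targets, n_iter=50):
    """Iterative proportional fitting ("raking"): repeatedly scale weights
    so the weighted margin of each calibration variable matches its
    population target. sample: {var: list of categories, one per respondent};
    targets: {var: {category: population proportion}}."""
    n = len(next(iter(sample.values())))
    w = [1.0] * n
    for _ in range(n_iter):
        for var, dist in targets.items():
            # current weighted total per category for this variable
            totals = defaultdict(float)
            for wi, cat in zip(w, sample[var]):
                totals[cat] += wi
            grand = sum(totals.values())
            factors = {cat: share / (totals[cat] / grand)
                       for cat, share in dist.items() if totals[cat] > 0}
            w = [wi * factors.get(cat, 1.0) for wi, cat in zip(w, sample[var])]
    mean_w = sum(w) / n
    return [wi / mean_w for wi in w]  # normalize to mean weight 1

# Toy opt-in sample that over-represents "early adopter" respondents.
random.seed(0)
sample = {
    "tries_new_products": ["yes" if random.random() < 0.7 else "no" for _ in range(2000)],
    "heavy_internet": ["yes" if random.random() < 0.6 else "no" for _ in range(2000)],
}
targets = {
    "tries_new_products": {"yes": 0.45, "no": 0.55},
    "heavy_internet": {"yes": 0.5, "no": 0.5},
}
weights = rake(sample, targets)
```

After raking, the weighted share of "tries new products" matches the 45% target even though 70% of the raw sample said yes — which is exactly the early-adopter correction Calibration 1.0 describes.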

Quota Controls in Survey Research: A Test of Accuracy and Inter-source Reliability in Online Samples; Steven H. Gittelman, MKTG, INC.; Randall K. Thomas, GfK Custom Research; Paul J. Lavrakas, Independent Consultant; Victor Lange, Consultant

  • A moment of silence for a probabilistic frame 🙂
  • FoQ 2 – do quota controls help with the effectiveness of sample selections? What about propensity weighting and matching models?
  • 17 panels gave 3000 interviews via three sampling methods each; panels remain anonymous, 2012-2013; plus telephone sample including cell phone; English only; telephone was 23 minutes 
  • A – nested region, sex, age
  • B – added non-nested ethnicity quotas
  • C – added non-nested education quotas
  • D – company’s proprietary method
  • 27 benchmark variables across six government and academic studies; 3 questions were deleted because of social desirability bias
  • Doing more than A did not reduce bias; nested age and sex within region was sufficient; race had no effect and neither did C, and those made the method more difficult; but this is overall only, not looking at subsamples
  • None of the proprietary methods provided any improvement to accuracy, on average they weren’t powerful and they were a ton of work with tons of sample
  • A, B, and C were essentially identical; one proprietary method did worse; phone was not much better
  • Even phone – 33% of differences were statistically significant [makes me think that benchmarks aren’t really gold standard but simply another sample with its own error bars]
  • The proprietary methods weren’t necessarily better than phone
  • [shout out to Reg Baker 🙂 ]
  • Some benchmarks performed better than others, some questions were more of a problem than others. If you’re studying Top 16 you’re in trouble
  • Demo-only was better than the advanced models; the advanced models were much worse than, or no better than, the demo-only models
  • An advanced model could be better or worse on any benchmark but you can’t predict which benchmark
  • Advanced models show promise but we don’t know which is best for which topic
  • Need to be careful to not create circular predictions, covariates overly correlated, if you balance a study on bananas you’re going to get bananas
  • Icarus syndrome – covariates too highly correlated
  • It’s okay to test privately, but clients need to know what the modeling questions are; you don’t want to end up with weighting models using the study variables
  • [why do we think that gold standard benchmarks have zero errors?]

Capitalizing on Passive Data in Online Surveys; Tobias B. Konitzer, Stanford University; David Rothschild, Microsoft Research

  • Most of our data is nonprobability to some extent
  • Can use any variable for modeling, demos, survey frequency, time to complete surveys
  • Define the target population from these variables; marginal percentages are insufficient; this constrains variables to only those where you know that information
  • Pollfish is embedded in phones, mobile based, has extra data beyond online samples; maybe it’s a different mode; it’s cheaper and faster than face to face and telephone, more flexible than face to face though perhaps less so than online, with efficient incentives
  • 14 questions, education, race, age, location, news consumption, news knowledge, income, party ID, also passive data for research purposes – geolocation, apps, device info
  • Geo is more specific than an IP address, frequency at that location; can get FIPS information from it, which leads to race data; with latitude and longitude you can reduce the number of questions on the survey
  • Need to assign demographics based on FIPS data in an appropriate way; a modal response wouldn’t work, need to use probabilities, e.g. if 60% of a FIPS area is white, give the person a 60% chance of being white
  • Use app data to improve group assignments
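The probabilistic assignment described above can be sketched in a few lines: instead of giving everyone in a FIPS area the modal race, draw each respondent’s label from the area’s distribution. The FIPS codes and proportions below are invented for illustration, not real census figures:

```python
import random

# Hypothetical FIPS-area race distributions (made-up numbers).
fips_race_dist = {
    "06075": {"white": 0.41, "asian": 0.34, "hispanic": 0.15, "black": 0.10},
    "48201": {"white": 0.30, "hispanic": 0.43, "black": 0.19, "asian": 0.08},
}

def assign_race(fips):
    """Sample one race label with probability equal to its share of the area,
    rather than always assigning the modal category."""
    dist = fips_race_dist[fips]
    return random.choices(list(dist), weights=list(dist.values()), k=1)[0]
```

Over many respondents, the assigned labels reproduce the area’s distribution in expectation, which is the point: a modal rule would code 100% of area "06075" as white when only 41% are.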

Representativeness of surveys using internet-based data collection #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Yup, it’s sunny outside. And now i’m back inside for the next session. Fortunately, or unfortunately, this session is once again in a below ground room with no windows so I will not be basking in sunlight nor gazing longingly out the window. I guess I’ll be paying full attention to another really great topic.

 

conditional vs unconditional incentives: comparing the effect on sample composition in the recruitment of the german internet panel study GIP

  • unconditional incentives tend to perform better than promised incentives
  • include $5 with the advance letter compared to a promised $10 with the thank-you letter; assuming a 50% response rate, the cost of both groups is the same
  • consider nonresponse bias, consider sample demo distribution
  • unconditional incentive had 51% response rate, conditional incentive had 42% response rate
  • didn’t see a nonresponse bias [by demographics I assume, so many speakers are talking about important effects but not specifically saying what those effects are]
  • as a trend, the two sets of data provide very similar research results, yes differences in means but always fairly close together, confidence intervals always overlap
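The cost-parity logic behind this design can be made concrete with a little arithmetic (my own back-of-envelope sketch using the figures reported in the talk, not the presenters’ code):

```python
def cost_per_complete(incentive, response_rate, prepaid):
    """Expected incentive cost per completed interview.
    prepaid=True: every sampled person gets the incentive (unconditional);
    prepaid=False: only completers get it (conditional/promised)."""
    cost_per_invite = incentive if prepaid else incentive * response_rate
    return cost_per_invite / response_rate

# Break-even check at the assumed 50% response rate: both designs cost $10
# per complete.
even_prepaid = cost_per_complete(5, 0.50, prepaid=True)
even_promised = cost_per_complete(10, 0.50, prepaid=False)

# With the observed rates (51% vs 42%), the prepaid design comes out
# slightly cheaper per complete (~$9.80 vs $10).
obs_prepaid = cost_per_complete(5, 0.51, prepaid=True)
obs_promised = cost_per_complete(10, 0.42, prepaid=False)
```

The promised incentive always costs exactly its face value per complete regardless of response rate, while the prepaid cost per complete falls as the response rate rises — so the higher response rate the unconditional incentive produced also made it the cheaper design.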

https://twitter.com/ialstoop/status/622001573481312256

evolution of representativeness in an online probability panel

  • LISS panel – probability panel, includes households without internet access, 30 minutes per month, paid for every completed questionnaire
  • is there systematic attrition, are core questionnaires affected by attrition
  • normally sociodemographics only, which is restrictive
  • missing data imputed using MICE
  • strongest panel loss is on sociodemographic properties
  • there are seasonal drops in attrition, for instance in June, which has lots of holidays
  • has more effects for survey attitudes and health traits, less so for political and personality traits which are quite stable even with attrition
  • try to decrease attrition effects through refreshment samples based on targets

https://twitter.com/ialstoop/status/622004420314812417

moderators of survey representativeness – a meta analysis

  • measured single mode vs multimode surveys
  • R-indicators – a single measure from 0 to 1 for sample representativeness, based on logistic regression models for response propensity
  • hypothesize mixed mode surveys are more representative than single mode surveys
  • hypothesize cross-sectional surveys are more representative than longitudinal surveys
  • heterogeneity not really explained by moderators
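The R-indicator mentioned above has a simple closed form once you have estimated response propensities: R = 1 − 2·S(ρ), where S(ρ) is the standard deviation of the propensities. A minimal sketch (in practice the propensities come from the logistic regression the speaker describes; the inputs here are invented):

```python
import math

def r_indicator(propensities):
    """R-indicator (Schouten, Cobben & Bethlehem): R = 1 - 2 * S(rho),
    where S(rho) is the standard deviation of estimated response
    propensities. R = 1 means everyone is equally likely to respond
    (perfectly representative response); lower values mean response
    varies more across sample members."""
    n = len(propensities)
    mean = sum(propensities) / n
    var = sum((p - mean) ** 2 for p in propensities) / n
    return 1 - 2 * math.sqrt(var)
```

The 2·S(ρ) scaling is what maps the result onto the 0-to-1 range the bullet refers to: propensities live in [0, 1], so their standard deviation can be at most 0.5.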

setting up a probability based web panel: lessons learned from the ELIPSS pilot study

  • online panel in france, 1000 people, monthly questionnaires, internet access given to each member [we often wonder about the effect of people being on panels since they get used to and learn how to answer surveys, have we forgotten this happens in probability panels too? especially when they are often very small panels]
  • used different contact modes including letters, phone, face to face
  • underrepresented on youngest, elderly, less educated, offline people
  • reasons for participating, in order – trust in ELIPSS 46%, originality of project 37%, interested in research 32%, free internet access 13%
  • 16% attrition after 30 months (that’s amazing, really low and really good!), response rate generally above 80%
  • automated process – invites on Thursday, systematic reminders by text message, app message and email
  • individual followups by phone calls and letters [wow. well that’s how they get a high response rate]
  • individual followups are highly effective [i’d call them stalking and invasive but that’s just me. i guess when you accept free 4g internet and a tablet, you are asking for that invasiveness]
  • age becomes less representative over time, employment status changes a lot, education changes the most but of course young people gain more education over time
  • need to give feedback to panel members as they keep asking for it
  • want to broaden use of panel to scientific community by expanding panel to 3500 people

https://twitter.com/nicolasbecuwe/status/622009359082647552

https://twitter.com/ialstoop/status/622011086783557632

the pretest of wave 2 of the german health interview and examination survey for children and adolescents as a mixed mode survey, composition of participant groups

  • mixed mode helps to maintain high response, web is preferred by younger people, representativeness could be increased by using multiple modes
  • compared sequential and simultaneous surveys
  • single mode has highest response rate, mixed mode simultaneous was extremely close behind, mixed mode multi-step had the lowest rate
  • paper always gave back the highest proportion of data even when people had the choice of both; 11% to 43% chose paper among the 3 groups
  • sample composition was the same among all four groups, all confidence intervals overlap – age, gender, nationality, immigration, education
  • meta-analysis – overall trend is the same
  • 4% lower response rate in mixed mode – additional mode creates cognitive burden, creates a break in response process, higher breakoffs
  • mixed mode doesn’t increase sample composition nor response rates [that is, giving people multiple options as opposed to just one option, as opposed to multiple groups whereby each groups only knows about one mode of participation.]
  • current study is now a single mode study
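Several of these talks report "all confidence intervals overlap" as their evidence of equivalence. That check boils down to something like the sketch below (a normal-approximation sketch of my own, not the authors’ code). Worth noting: overlapping 95% intervals are a conservative screen, not a formal test — two estimates can overlap and still differ significantly.

```python
import math

def prop_ci(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion
    p estimated from n respondents."""
    half = z * math.sqrt(p * (1 - p) / n)
    return (p - half, p + half)

def intervals_overlap(a, b):
    """True if two (lo, hi) intervals share any common ground."""
    return a[0] <= b[1] and b[0] <= a[1]
```

For example, with n = 100 a 50% estimate carries a roughly ±10-point interval, so two modes would need to differ by quite a lot before their intervals separate at that sample size.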

https://twitter.com/oparnet/status/622015032231075840


 

Comparing probability and nonprobability samples #AAPOR #MRX 

prezzie #1: how different are probability and nonprobability designs

  • nonprobability samples often get the correct results and probability samples are sometimes wrong. maybe they are more similar than we realize
  • nonprobability sampling may have a sample frame but it’s not the same as a census population
  • how do you choose, which factors are important
  • what method does the job that you require, that fits your purpose
  • is the design relevant, does it meet the goals with the resources, does the method give you results in the time you need; accessibility – can you find the people you need; interpretability and reliability; accuracy of estimates with acceptable mean square error; coherence in terms of results matching up with other data points from third parties [of course, who’s to say what the right answer is, everyone could be wrong as we’ve seen in recent elections]
  • nonprobability can be much faster, probability can be more relevant
  • nonprobability can get you right to the people you want to listen to
  • both methods suffer from various types of error, some more than others, must consider total survey error [i certainly hope you’ve been considering TSE since day 1]
  • driver will decide the type of study you end up doing
  • how can nonprob methods help prob methods, because they do offer much good stuff
  • [interesting talk, nice differentiation between prob and nonprob even though I did cringe at a few definitions, e.g. I don’t see that quality is the differentiator between prob and nonprob]

prezzie #2: comparison of surveys based on prob and nonprob

  • limbo – how low can you go with a nonprob sample
  • bandwagon – well everyone else is doing nonprob sample [feelings getting hurt here]
  • statistical adjustment of nonprob samples helps but it is only a partial solution
  • nonprob panel may have an undefined response rate
  • need to look at point estimates and associations in both the samples, does sampling only matter when you need population point estimates
  • psychology research is often done all with college students [been there, done that!]
  • be sure to weight and stratify the data
  • education had a large effect between prob and nonprob sample [as it usually does along with income]
  • point estimates were quite different in cases, but the associations were much closer so if you don’t need a precise point estimate a nonprob sample could do the trick

prezzie #4: sample frame and mode effects

  • used very similar omnibus surveys, included questions where they expected to find differences
  • compared point estimates of the methods as well as to benchmarks of larger census surveys
  • for health estimates, yes, there were differences, but where the benchmark was high so were the point estimates, and similarly for low or moderate point estimates; total raw differences maxed out around ten points
  • there was no clear winner for any of the question types, though all highs were high and lows were low
  • no one design is consistently superior