Tag Archives: sampling

How the best research panel in the world accurately predicts every election result #polling #MRX 

Forget for a moment the debate about whether the MBTI is a valid and reliable personality measurement tool. (I did my Bachelor’s thesis on it, and I studied psychometric theory as part of my PhD in experimental psychology, so I can debate forever too.) Let’s focus instead on the MBTI because tests similar to it can be answered online and you can find out your result in a few minutes. It kind of makes sense, and people understand the idea of using it to understand themselves and their reactions to our world. If you’re not so familiar with it, the MBTI divides people into groups based on four continuous personality characteristics: introversion/extroversion, sensing/intuition, thinking/feeling, judging/perception. (I’m an ISTJ for what it’s worth.)

Now, in the market and social research world, we also like to divide people into groups. We focus mainly on objective and easy-to-measure demographic characteristics like gender, age, and region, though sometimes we also include household size, age of children, education, income, religion, and language. We do our best to collect samples of people who look like the census based on these demographic targets and oftentimes, our measurements are quite good. Sometimes, we try to improve our measurements by incorporating a different set of variables like political affiliation, type of home, pets, charitable behaviours, and so forth. 

All of these variables get us closer to building samples that look like the census but they never get us all the way there. We get so close and yet we are always missing the one thing that properly describes each human being. That, of course, is personality. And if you think about it, in many cases, we’re only using demographic characteristics because we don’t have personality data. Personality is really hard to measure and target. We use age and gender and religion and the rest to help inform about personality characteristics. Hence why I bring up the MBTI: the perfect set of research sample targets. 

The MBTI may not be the right test, but there are many thoroughly tested and normed personality measurement scales that are easily available to registered, certified psychologists. They include tests like the 16PF, the Big 5, or the NEO, all of which measure constructs such as social desirability, authoritarianism, extraversion, reasoning, stability, dominance, or perfectionism. These tests take decades to create and are held in veritable locked boxes so as to maintain their integrity. They can take an hour or more for someone to complete and they cost a bundle to use. (Make it YOUR entire life’s work to build one test and see if you give it away for free.) Which means these tests will not and cannot ever be used for the purpose I describe here. 

However, it is absolutely possible for a psychologist or psychological researcher to build a new, proprietary personality scale which mirrors standardized tests, albeit in a shorter format, and performs the same function. The process is simple. Every person who joins a panel answers ten or twenty personality questions. When they answer a client questionnaire, they get ten more personality questions, and so on, and so on, until every person on the panel has taken the entire test and been assigned to a personality group. We all know how profiling and reprofiling works and this is no different. And now we know which people are more or less susceptible to social desirability. And which people like authoritarianism. And which people are rule bound. Sound interesting given the US federal election? I thought so. 
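
To make that concrete, here is a minimal sketch of the kind of incremental profiling described above, assuming a hypothetical short item bank and a toy scoring rule; the item names, block size, and group labels are invented for illustration, not an actual panel implementation:

```python
# Hypothetical sketch: feed each panelist a small block of personality items per
# survey until the bank is complete, then assign them to a personality group.
PERSONALITY_BANK = [f"item_{i:02d}" for i in range(1, 41)]  # e.g. a 40-item short scale
ITEMS_PER_SURVEY = 10

def next_items(profile):
    """Return the next block of unanswered personality items for this panelist."""
    unanswered = [item for item in PERSONALITY_BANK if item not in profile]
    return unanswered[:ITEMS_PER_SURVEY]

def assign_group(profile):
    """Once the bank is complete, collapse item scores into a coarse group label
    (a real scale would use normed scoring, not a simple mean)."""
    if len(profile) < len(PERSONALITY_BANK):
        return None  # still being profiled
    mean_score = sum(profile.values()) / len(profile)
    return "high_social_desirability" if mean_score >= 3.5 else "low_social_desirability"

# Usage: merge each survey's answers into the stored profile, then try to assign.
profile = {f"item_{i:02d}": 3 for i in range(1, 31)}   # 30 items answered so far
print(next_items(profile))                             # the last 10 items to ask next
profile.update({f"item_{i:02d}": 4 for i in range(31, 41)})
print(assign_group(profile))                            # now a group can be assigned
```

The point is simply that personality group membership becomes available the same way any other profiling variable does, a few questions at a time.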

So, which company does this? Which company targets people based on personality characteristics? Which company fills quotas based on personality? Actually, I don’t know. I’ve never heard of one that does. But the first panel company to successfully implement this method will be vastly ahead of every other sample provider. I’d love to help you do it. It would be really fun. 🙂

New Math For Nonprobability Samples #AAPOR 

Moderator: Hanyu Sun, Westat

Next Steps Towards a New Math for Nonprobability Sample Surveys; Mansour Fahimi, GfK Custom Research; Frances M. Barlas, GfK Custom Research; Randall K. Thomas, GfK Custom Research; Nicole R. Buttermore, GfK Custom Research

  • Neyman paradigm requires complete sampling frames and complete response rates
  • Non-prob is important because those assumptions are not met, sampling frames are incomplete, response rates are low, budget and time crunches
  • We could ignore that we are dealing with nonprobability samples, find new math to handle this, try more weighting methods [speaker said commercial research ignores the issue – that is absolutely not true. We are VERY aware of it and work within appropriate guidelines]
  • In practice, there are incomplete sampling frames so samples aren’t random, respondents choose not to respond, weighting has to be more creative, and uncertainty around inferences is increasing
  • There is fuzz all over, relationship is nonlinear and complicated 
  • Geodemographic weighting is inadequate; weighted estimates to benchmarks show huge significant differences [this assumes the benchmarks were actually valid truth but we know there is error around those numbers too]
  • Calibration 1.0 – correct for higher agreement propensity with early adopters – try new products first, like variety of new brands, shop for new, first among my friends, tell others about new brands; this is in addition to geography
  • But this is only a univariate adjustment, one theme; sometimes it’s insufficient
  • Sought a multivariate adjustment
  • Calibration 2.0 – social engagement, self importance, shopping habits, happiness, security, politics, community, altruism, survey participation, Internet and social media
  • But these dozens of questions would burden respondents, and weighting becomes an issue
  • What is the right subset of questions for the biggest effect
  • Number of surveys per month, hours on Internet for personal use, trying new products before others, time spent watching TV, using coupons, number of relocations in past 5 years
  • Tested against external benchmarks, election, BRFSS questions, NSDUH, CPS/ACS questions
  • Nonprobability samples based on geodemographics are the worst of the set, adding calibration improves them, nonprobability plus calibration is even better, probability panel was the best [pseudo probability]
  • Calibration 3.0 is hours on Internet, time watching TV, trying new products, frequency expressing opinions online (a calibration weighting sketch follows this list)
  • Remember Total Research Error, there is more error than just sampling error
  • Combining nonprobability and probability samples, use stratification methods so you have resemblance of target population, gives you better sample size for weighting adjustments
  • Because there are so many errors everywhere, even nonprobability samples can be improved
  • Evading calibration is wishful thinking and misleading 
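
To make the calibration adjustments above concrete, here is a minimal sketch of iterative proportional fitting (raking), a common way weights are adjusted so the sample's margins match benchmark targets; the variables and target proportions below are placeholders, not the presenters' actual calibration model:

```python
def rake(respondents, targets, iterations=25):
    """Iterative proportional fitting: cycle through the calibration variables,
    scaling each respondent's weight so weighted margins match the targets."""
    weights = {r_id: 1.0 for r_id in respondents}
    for _ in range(iterations):
        for var, target_props in targets.items():
            totals = {cat: 0.0 for cat in target_props}      # current weighted margins
            for r_id, attrs in respondents.items():
                totals[attrs[var]] += weights[r_id]
            grand = sum(totals.values())
            for r_id, attrs in respondents.items():           # rescale toward targets
                cat = attrs[var]
                weights[r_id] *= (target_props[cat] * grand) / totals[cat]
    return weights

# Illustrative use: calibrate on one geodemographic and one attitudinal variable.
respondents = {
    1: {"region": "south", "early_adopter": "yes"},
    2: {"region": "north", "early_adopter": "no"},
    3: {"region": "south", "early_adopter": "no"},
    4: {"region": "north", "early_adopter": "no"},
}
targets = {
    "region": {"south": 0.4, "north": 0.6},
    "early_adopter": {"yes": 0.3, "no": 0.7},
}
print(rake(respondents, targets))
```

Raking only fixes the margins you feed it, which is why the choice of calibration variables (1.0, 2.0, 3.0 above) does the heavy lifting.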

Quota Controls in Survey Research: A Test of Accuracy and Inter-source Reliability in Online Samples; Steven H. Gittelman, MKTG, INC.; Randall K. Thomas, GfK Custom Research; Paul J. Lavrakas, Independent Consultant; Victor Lange, Consultant

  • A moment of silence for a probabilistic frame 🙂
  • FoQ 2 – do quota controls help with effectiveness of sample selections, what about propensity weight, matching models
  • 17 panels gave 3000 interviews via three sampling methods each; panels remain anonymous, 2012-2013; plus telephone sample including cell phone; English only; telephone was 23 minutes 
  • A – nested region, sex, age
  • B – added non-nested ethnicity quotas
  • C – added non-nested education quotas
  • D – company’s proprietary method
  • 27 benchmark variables across six government and academic studies; 3 questions were deleted because of social desirability bias
  • Doing more than A did not result in a reduction of bias; nested age and sex within region was sufficient (see the quota-cell sketch after this list); race had no effect and neither did C, and those made the method more difficult; but this is overall only and not looking at subsamples
  • None of the proprietary methods provided any improvement to accuracy, on average they weren’t powerful and they were a ton of work with tons of sample
  • A, B, and C were essentially identical; one proprietary method did worse; phone was not all that much better
  • Even phone – 33% of differences were statistically significant [makes me think that benchmarks aren’t really gold standard but simply another sample with its own error bars]
  • The proprietary methods weren’t necessarily better than phone
  • [shout out to Reg Baker 🙂 ]
  • Some benchmarks performed better than others, some questions were more of a problem than others. If you’re studying Top 16 you’re in trouble
  • Demo only was better than the advanced models, advanced models were much worse or no better than demo only models
  • An advanced model could be better or worse on any benchmark but you can’t predict which benchmark
  • Advanced models show promise but we don’t know which is best for which topic
  • Need to be careful to not create circular predictions, covariates overly correlated, if you balance a study on bananas you’re going to get bananas
  • Icarus syndrome – covariates too highly correlated
  • It’s okay to test privately but clients need to know what the modeling questions are, you don’t want to end up with weighting models using the study variables
  • [why do we think that gold standard benchmarks have zero errors?]
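
For readers unfamiliar with the distinction tested above, here is a minimal sketch of nested (interlocking) quota cells versus separate marginal quotas; the category lists and proportions are placeholders, not the study's actual frame:

```python
from itertools import product

REGIONS = ["northeast", "midwest", "south", "west"]
SEXES = ["male", "female"]
AGE_GROUPS = ["18-34", "35-54", "55+"]

def nested_quotas(total_completes, joint_props):
    """Design A: one quota per region x sex x age cell, sized from that cell's
    joint population proportion (interlocking quotas)."""
    return {cell: round(total_completes * joint_props[cell])
            for cell in product(REGIONS, SEXES, AGE_GROUPS)}

def marginal_quotas(total_completes, marginal_props):
    """Non-nested alternative: a separate quota on each variable's margin only,
    leaving the joint distribution within the sample uncontrolled."""
    return {(var, cat): round(total_completes * share)
            for var, margins in marginal_props.items()
            for cat, share in margins.items()}

# Illustrative joint proportions (uniform placeholders, not census figures).
joint_props = {cell: 1 / 24 for cell in product(REGIONS, SEXES, AGE_GROUPS)}
print(nested_quotas(3000, joint_props)[("south", "female", "18-34")])  # 125 completes
```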

Capitalizing on Passive Data in Online Surveys; Tobias B. Konitzer, Stanford University David Rothschild, Microsoft Research 

  • Most of our data is nonprobability to some extent
  • Can use any variable for modeling, demos, survey frequency, time to complete surveys
  • Define target population from these variables, marginal percent is insufficient, this constrains variables to only those where you know that information 
  • Pollfish is embedded in phones, mobile based, has extra data beyond online samples, maybe it’s a different mode, it’s cheaper and faster than face to face and telephone, more flexible than face to face though perhaps less so than online, efficient incentives
  • 14 questions, education, race, age, location, news consumption, news knowledge, income, party ID, also passive data for research purposes – geolocation, apps, device info
  • Geo is more specific than IP address, frequency at that location, can get FIPS information from it which leads to race data, with lat and long you can reduce the number of questions on the survey
  • Need to assign demographics based on FIPS data in an appropriate way; modal assignment wouldn’t work, need to use probabilities, e.g., if 60% of a FIPS area is white, then give the person a 60% chance of being white (see the sketch after this list)
  • Use app data to improve group assignments
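
A minimal sketch of that probabilistic assignment, assuming race proportions are already available per FIPS code; the FIPS code and proportions below are made up for illustration:

```python
import random

# Hypothetical race composition by FIPS code (illustrative numbers only).
FIPS_RACE_PROPS = {
    "06037": {"white": 0.60, "black": 0.15, "asian": 0.15, "other": 0.10},
}

def impute_race(fips_code, rng):
    """Draw a race category in proportion to the FIPS area's composition,
    rather than assigning everyone the modal category."""
    props = FIPS_RACE_PROPS[fips_code]
    categories, shares = zip(*props.items())
    return rng.choices(categories, weights=shares, k=1)[0]

rng = random.Random(42)
print([impute_race("06037", rng) for _ in range(5)])
```

Over many respondents in the same area, the imputed distribution converges to the area's actual composition, which a modal assignment never would.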

Improvements to survey modes #PAPOR #MRX 

What Are They Thinking? How IVR Captures Public Opinion For a Democracy, Mary McDougall, Survox

  • many choices, online is cheapest followed by IVR followed by phone interview
  • many still do not have internet – seniors, non-white, low income, no high school degree
  • phone can help you reach those people, can still do specific targeting
  • good idea to include multiple modes to test for any mode effects
  • technology is no longer a barrier for choosing a data collection strategy
  • ignoring cell phones is poor sampling
  • use labor strategically to allow IVR
  • tested IVR on political polling, 300 completes in 2.5 hours, met the quotas, once a survey was started it was generally completed

The Promising Role of Fax in Surveys of Clinical Establishments: Observations from a Multi-mode Survey of Ambulatory Surgery Centers, Natalie Teixeira, Anne Herleth, and Vasudha Narayanan, Westat; Kelsey O’Yong, Los Angeles Department of Public Health

  • we often want responses from an organization, not an individual
  • 500 medical facilities, 60 questions about staffing and infection control practices
  • used multimode – telephone, postal, web, and fax
  • many people requested the survey by fax and many people did convert modes
  • because fax was so successful, reminder calls were combined with fax automatically and saw successful conversions to this method
  • this does not follow the current trend
  • fax is immediate and keeps gatekeepers engaged, maybe it was seen as a novelty
  • [“innovative fax methodology” so funny to hear that phrase. I have never ever ever considered fax as a methodology. And yet, it CAN be effective. 🙂 ]
  • options to use “mass” faxing exist

The Pros and Cons of Persistence During Telephone Recruitment for an Establishment Survey, Paul Weinfurter and Vasudha Narayanan, Westat

  • half of restaurant issues are employees coming to work ill, new law was coming into effect regarding sick pay
  • recruit 300 restaurants in order to recruit 1 manager, 1 owner, and a couple of food preparers
  • telephone recruitment and in person interviews, english, spanish, mandarin, 15 minutes, $20 gift card
  • most of the time they couldn’t get a manager on the phone and they received double the original sample of restaurants to contact
  • it was assumed that restaurants would participate because the sponsor was the health inspectors, but it was not mandatory and they couldn’t be told it was mandatory; there were many scams related to this so people just declined, and the health inspectors weren’t even all aware of the study
  • 73% were unreachable after 3 calls, hard to get a person of authority during open hours
  • increased call attempts to five times, but continued on when they thought recruitment was likely
  • recruited 77 more from people who were called more than 5 times
  • as a result, data were not limited to a quicker to reach sample
  • people called up to ten times remained noncommittal and never were interviewed
  • there wasn’t an ideal number of calls to get maximum recruits and minimum costs
  • but the method wasn’t really objective, the focus was on restaurants that seemed like they might be reachable
  • possibly more representation than if they had stopped all their recruitment at five calls
  • [would love to see results crossed by number of attempts]

Analysis, design, and sampling methods #PAPOR #MRX 

Live blogged at #PAPOR in San Francisco. Any errors or bad jokes are my own.

Enhancing the use of Qualitative Research to Understand Public Opinion, Paul J. Lavrakas, Independent Consultant; Margaret R. Roller, Roller Research

  • thinks research has become too quantitative because qual is typically not as rigorous but this should and can change
  • public opinion is not a number generated from polls, polls are imperfect and limited
  • aapor has lost sight of this [you’re a brave person to say this! very glad to hear it at a conference]
  • we need more balance, we aren’t a survey research organization, we are a public opinion organization, our conference programs are extremely biased toward the quantitative
  • there should be criteria to judge the trustworthiness of research – was it fit for purpose
  • credibility, transferability, dependability, confirmability
  • all qual research should be credible, analyzable, transparent, useful
  • credible – sample representation and data collection
  • do qual researchers seriously consider non-response bias?
  • credibility – scope deals with coverage design and nonresponse, data gathering – information obtained, researcher effects, participant effects
  • analyzability – intercoder reliability, transcription quality
  • transparency – thick descriptions of details in final documents

Comparisons of Fully Balanced, Minimally Balanced, and Unbalanced Rating Scales, Mingnan Liu, Sarah Cho, and Noble Kuriakose, SurveyMonkey

  • there are many ways to ask the same question
  • is it a good time or a bad time? – fully balanced
  • is it a good time or not? – minimally balanced
  • do you or do you not think it is getting better?
  • are things headed in the right direction?
  • [my preference – avoid introducing any balancing in the question, only put it in the answer. For instance: What do you think about buying  a house? Good time, Bad time]
  • results – effect sizes are very small, no differences between the groups
  • in many different questions tested, there was no difference in the formats

Conflicting Thoughts: The Effect of Information on Support for an Increase in the Federal Minimum Wage Level, Joshua Cooper & Alejandra Gimenez, Brigham Young University, First Place Student Paper Competition Winner

  • Used paper surveys for the experiment, 13000 respondents, 25 forms
  • Would you favor or oppose raising the minimum wage?
  • Some were told how many people would increase their income, some were told how many jobs would be lost, some were told both
  • Those given negative info opposed a wage increase, those given positive info favored a wage increase, and people who were told both opposed a wage increase
  • independents were more likely to say don’t know
  • negative info strongly outweighs the positive across all types of respondents regardless of gender, income, religion, party ID
  • jobs matter, more than anything

Training for survey research: who, where, how #AAPOR #MRX 

moderated by Frauke Kreuter

Prezzie #1: training needs in survey methods

  • started a program in the 1970s with 4 courses, 2 in statistics and 2 in sampling, that was pretty good at the time, it covered the basics well
  • in 1993, 3 courses in data collection, 3 in sampling, 2 practicums, 4 statistics, 3 design classes, 1 on federal statistical system
  • many journals have started since then – survey methodology, journal of official statistics, POQ, and the aapor publication journal of survey statistics and methodology – plus international conferences, and now an entire conference on total survey error
  • statisticians need to know basic theory, sampling theory of complex designs and weighting and imputation, small area estimation, disclosure control, record linkage, paradata, responsive design, panel survey methods, survey management, ethics; it’s impossible to know about and training for everything now
  • in early days, treating humans nicely was never mentioned, it wasn’t important; now we realize it’s important [yet we still don’t treat people as nicely as we ought to. isn’t a long, poorly designed, poorly written survey disrespectful?]
  • a masters degree can no longer cover everything we need to know as survey researchers, can run summer programs for training, can do online training, can do advanced certificate programs
  • the world is changing so fast so how can training keep up with everything, punch cards are history and whatever we’re doing now will be history soon enough
  • we need to train more people but undergrads don’t know about our field

Prezzie #2: training for modern survey statisticians

  • survey practice journal special issue – February 2015
  • might be 147 federal positions per year advertising for statisticians, we are training only about a quarter of what’s needed
  • we need core statistical skills but also communication and presentation skills
  • training gap right now is most grad courses only have one course in sampling
  • most courses use R (55%)
  • only 40% of courses are taught by faculty who work specifically in statistics
  • weighting is a major gap, don’t talk about non-response adjustments
  • big training gap in design trade-offs – discrete parameters, continuous parameters, split sample randomization
  • most training happens on the job
  • [this session is so popular I can’t put my feet up on the chair in front of me! The room is full!]

Prezzie #3:  assessing the present

  • our science draws on many other disciplines
  • trained in the methods and how to evaluate those methods, trained in qual and quant like ethnography and psychometric analysis
  • there are five university based programs, mostly at the graduate level, plus professional conferences, short courses and seminars
  • current programs do the core well, increasing focus on hybrid training, trainers are also practitioners which is invaluable
  • training gap on full survey life cycle experience, not enough practical experience in the training, not enough multi-cultural training and the population has a large and growing non-native english speaking base
  • quant dominates most survey programs [well of course, a survey program is surveys, why not develop a RESEARCH program]
  • you can have a productive career with little statistical knowledge, you can be a qual researcher [well that’s just offensive! why shouldn’t qual researchers also know statistics?]
  • ideal program still needs the core classes but it also needs more qual and user experience, more specialized courses, more practicums, more internships, more major projects like a thesis

Prezzie #4:  on the job training

  • she did interviews with people for her talk – she’s qualitative 🙂
  • the workplace is interdisciplinary with many skill sets and various roles
  • know your role – are you a jack of all trades or filling a niche
  • in private business, everyone knows a bit about everything
  • at the census bureau, it’s massively specialized – she works on pre-testing of non-english surveys
  • you need to create opportunities for yourself – request stretch tasks, seek mentors, volunteer to help with new projects, shadow experienced people – screen sharing is a wonderful thing
  • take short courses, pursue graduate degrees, read and present – you are responsible for your future growth
  • as management you can – promote learning by doing, share the big picture, encourage networking, establish a culture of ongoing learning
  • you can learn on the job without money

Prezzie #5: future of training

  • we are in the midst of a paradigmatic shift in our industry
  • survey informatics – computer science, math and stats, and cognitive and social psychology – this is the new reality
  • resistance to online surveys is the same as the emergence of the telephone survey – skeptical, resistant
  • the speaker was a heretic when he first started talking about online surveys
  • we need technology and computers for higher quality data, increase quantitative in data collection
  • we now have paradata and metadata and auxiliary data – page level data, question level data, personal level data, day level data
  • data is no longer just the answers to the questions, methodologists need to understand all these types of data
  • concerned we’re not keeping up and preparing the next generation
  • [discussion of how panels can be good, like people have never heard of panels, sadly some people do need to hear this]
  • computer science must be our new partner [go home and learn R and Python right now]
  • we won’t have to ask are you watching TV, the TV will know who’s in the room, who’s looking at the TV, who left to get a snack
  • the least powerful, low-level professors are the ones who know the new tech, but they have no power to do anything about it and no funding

Evaluating polling accuracy #AAPOR #MRX 

moderated by Mary McDougall, CfMC Survox Solutions

prezzie 1: midterm election polling in Georgia

  • georgia has generally been getting more media attention because it is bluer than expected, change may be fast enough to outpace polls, population has changed a lot particularly around atlanta, georgia has become less white
  • telephone survey using voter registration info, tested three weights – voter data, party targets, education weights
  • registered voting weight was much better, education weighting was worse
  • voter weight improved estimates in georgia but you need voter information
  • [why do presenters keep talking about needing more research for reliability purposes, isn’t that the default?]

prezzie #2: error in the 2014 preelection polls

  • house effects – difference between one poll and every other poll, difference from industry average
  • they aren’t what they used to be, used to be interview method and weight practices
  • regression model is better than difference of means tests
  • could it be whether the pollster is extremely active or if they only do it once in a while
  • results show the more you poll the more accurate you are, if you poll more in risky areas you are less accurate – but overall these results were kind of null
  • second model using just many pollsters was much better – arkansas had a lot more error, it had the most pollsters
  • in the end, can’t really explain it

prezzie #3: north carolina senate elections

  • to use RDD or registration based sampling; will turnout be high or low; a small university has limited resources with highly talented competition
  • chose RBS and did three polls, worked saturday to thursday, used live interviewers, screen for certain or probably will vote
  • RBS worked well here, there were demographic gaps, big race gap, big party gaps

prezzie #4: opinion polls in referendums

  • [seriously presenters, what’s with these slides that are paragraphs of text?]
  • most polls are private and not often released, questions are all different, there is no incumbent being measured
  • data here are 15 tobacco control elections and 126 questions in total, courts forced the polls to be public, find them on legacy library website
  • five types of questions – uninformed heads up questions where you’re asked whether you agree or strongly agree [i.e., leading, biased, unethical questions. annie not happy!]
  • predictions are better closer to the election, spending is a good predictor, city size is a good predictor
  • using the word ‘strongly’ in the question doesn’t improve accuracy
  • asking the question exactly as the ballot doesn’t improve the accuracy
  • asking more questions from your side of the opinion doesn’t improve the accuracy 
  • polls often overestimate the winner’s percentage
  • [these polls are great examples of abusing survey best practices research]
  • post election surveys are accurate and useful for other purposes
  • [big slam against appor for not promoting revealing of survey sponsors]

prezzie #5: comparing measures of accuracy

  • big issue is opt-in surveys versus random sample [assuming random sampling of humans is possible!]
  • accuracy affected by probability sampling, days to election, sample sizes, number of fielding days
  • used elections in sweden, which has eight parties in parliament; many traditional methods are inappropriate with multi-candidate elections
  • sample size was a good predictor, fielding days was not predictive, opt-in sample was worse but overall r square was very small

prezzie #6: polling third party candidates

  • why do we care about these? don’t want to waste space on candidates who only get 1% of the votes
  • 1500 data points, 121 organizations, 94 third party candidates – thank you to HuffPollster and DailyKos
  • aggregate accuracy was good, most were overstatements, but there was systematic bias
  • using the candidates names makes a difference, but if you name one candidate, you should name them all – i know i’m not voting for the top two candidates so i’m probably voting for this third party person you listed
  • accuracy gets better closer to the date, sometimes you don’t know who the third party candidate is till close to the date
  • live phone and IVR underestimated, internet overestimated
  • there were important house effects – CBS/YouGov underestimates; PPP overestimates; on average FOX news is fairly accurate with third party candidates

Comparing probability and nonprobability samples #AAPOR #MRX 

prezzie #1: how different are probability and nonprobability designs

  • nonprobability samples often get the correct results and probability samples are sometimes wrong. maybe they are more similar than we realize
  • nonprobability sampling may have a sample frame but it’s not the same as a census population
  • how do you choose, which factors are important
  • what method does the job that you require, that fits your purpose
  • is the design relevant, does it meet the goals with the resources, does the method give you results in the time you need, accessibility, can you find the people you need, interpretability and reliability, accuracy of estimates with acceptable mean square error, coherence in terms of results matching up with other data points from third parties [of course, who’s to say what the right answer is, everyone could be wrong as we’ve seen in recent elections]
  • nonprobability can be much faster, probability can be more relevant
  • nonprobability can get you right to the people you want to listen to
  • both methods suffer from various types of error, some more than others, must consider total survey error [i certainly hope you’ve been considering TSE since day 1]
  • driver will decide the type of study you end up doing
  • how can nonprob methods help prob methods, because they do offer much good stuff
  • [interesting talk, nice differentiation between prob and nonprob even though I did cringe at a few definitions, e.g. I don’t see that quality is the differentiator between prob and nonprob]

prezzie #2: comparison of surveys based on prob and nonprob

  • limbo – how low can you go with a nonprob sample
  • bandwagon – well everyone else is doing nonprob sample [feelings getting hurt here]
  • statistical adjustment of nonprob samples helps but it is only a partial solution
  • nonprob panel may have an undefined response rate
  • need to look at point estimates and associations in both the samples, does sampling only matter when you need population point estimates
  • psychology research is often done all with college students [been there, done that!]
  • be sure to weight and stratify the data
  • education had a large effect between prob and nonprob sample [as it usually does along with income]
  • point estimates were quite different in cases, but the associations were much closer so if you don’t need a precise point estimate a nonprob sample could do the trick

prezzie #4: sample frame and mode effects

  • used very similar omnibus surveys, included questions where they expected to find differences
  • compared point estimates of the methods as well as to benchmarks of larger census surveys
  • for health estimates, yes, there were differences, but where the benchmark was high so were the point estimates, and similarly for low or moderate point estimates; total raw differences maxed out around ten points
  • there was no clear winner for any of the question types though all highs were highs and lows were lows
  • no one design is consistently superior

Taking multiple surveys in one session by Mark Kinnucan and Inna Burdein #CASRO #MRX

Live blogged from Nashville. Any errors or bad jokes are my own.

– We want surveys short and simple to avoid straightlining and satisficing, reduce breakoffs, and reduce dropping off the panel.
– but companies are ok with panelists taking multiples surveys in a row
– are multiple short surveys better than one long survey? we assume it lets people handle fatigue better, and that if they do take another survey, that survey will be better quality. is any of this true?
– who takes multiple surveys, what are their completion rates, how good is the data, how does it affect attrition
– defined a chain as all the surveys taken within 1.25 hours (see the grouping sketch after this list)
– 40% of surveys are completed in chains
– younger people make more use of chains
– moderate chaining is the norm. most people average 1.5 to 3 surveys per session. about 10% average more than 3 surveys per chain.
– completion rates increase with each survey in the chain. people who want to drop already dropped out.
– buying rate is unaffected by chaining. for people who take five surveys, buying rate increases with each survey.
– why is this? panelists will take more surveys if they did not exhaust themselves in the previous survey. or maybe those with lots of buying behaviours pace their reporting. or those people are truly different. [read the paper. it’s getting too detailed for me to blog on]
– poor responders are more likely to chain, but not massively more likely
– for younger panelists, heavy chainers have greater longevity. for oldest panelists, it results in burnout.
– people who agree to chain do it because they are ready to do so. if they were exhausted in a previous survey, they don’t continue. a small minority abuse the process
– chaining helps younger panelists stay engaged
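
A minimal sketch of one plausible reading of that chain definition, grouping a panelist's survey start times into sessions; the 1.25-hour window comes from the talk, but the gap-based grouping rule, field names, and timestamps are assumptions for illustration:

```python
from datetime import datetime, timedelta

SESSION_WINDOW = timedelta(hours=1.25)

def group_into_chains(start_times):
    """Group survey start times into chains: a new chain begins whenever the gap
    since the previous survey exceeds the session window."""
    chains, current = [], []
    for t in sorted(start_times):
        if current and t - current[-1] > SESSION_WINDOW:
            chains.append(current)
            current = []
        current.append(t)
    if current:
        chains.append(current)
    return chains

# One panelist: three surveys in a single sitting, then one later that evening.
times = [datetime(2015, 3, 1, 9, 0), datetime(2015, 3, 1, 9, 20),
         datetime(2015, 3, 1, 10, 5), datetime(2015, 3, 1, 18, 0)]
print([len(chain) for chain in group_into_chains(times)])  # -> [3, 1]
```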

Do we need to control for non-quota variables? by Deb Santus and Frank Kelly #CASRO #MRX

Live blogged from Nashville. Any errors or bad jokes are my own.

Third author is Peter Kwok

we moved many offline sampling techniques to online sampling. now we have river and dynamic sourcing and routers.

– should we use one, the other, or both of outgo quotas and return quotas?
– balancing quotas are set from sampling frames. usually region, age, gender, household size, often based on US census.
– survey quotas are determined by respondent profiles or subject category.
– some populations are really hard to find. not everyone is simply looking for genpop
– sample frames may not reflect the target populations
– females can respond at rates 20 or more points higher than men
– with river or dynamic sampling, you don’t even know the demos that you’re getting

– router selection is efficient use of respondents but there’s not as much quota control compared to traditional sampling that uses outgo and return balancing
– traditional sampling focused on a specific person for a specific study

carried out a study using various sampling techniques. used interlocking age and gender, plus region.
– 10 minutes, grocery shopping habits, census quotas
cell 1 – 4 balancing variables including income, quotas for outgo
cell 2 – only used age, gender, region quotas on outgo

then weighted to census
– better weights on cell 1 – better weight efficiency, better minimum and maximum weights (see the weight-efficiency sketch after this list)
– every type of sample has skews [yer darn right! why do people forget this?]
– controlling for age, gender, region just wasn’t enough
– income and household size did not represent well when they weren’t initially balanced for, marital status also didn’t work well
– some of the profiling questions showed differences as well – belonging to a warehouse club showed differences, using a smartphone to help with shopping showed differences
– quotas do not guarantee a representative sample. additional controls are necessary on outgo. with more controls, weighting can even be unnecessary
– repetition is good. repetition is good. repetition is good (i.e., test-retest reliability is good!)
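
Since weight efficiency comes up in the cell comparison above, here is a minimal sketch of the usual calculation (Kish's effective sample size as a share of the actual sample size); the example weights are invented:

```python
def weighting_efficiency(weights):
    """Kish weighting efficiency: (sum w)^2 / (n * sum w^2).
    Equal weights give 1.0; extreme weights push it toward 0."""
    n = len(weights)
    return sum(weights) ** 2 / (n * sum(w * w for w in weights))

print(weighting_efficiency([1.0, 1.0, 1.0, 1.0]))   # 1.0 – no loss from weighting
print(weighting_efficiency([0.4, 0.7, 1.2, 3.5]))   # about 0.59 – heavy weighting loss
```

The better the outgo controls, the closer the weights stay to 1.0 and the less efficiency you give up, which is the argument for controlling on outgo rather than fixing everything with weighting afterward.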

we need to retain our sample expertise. be smart. learn about sampling and do it well. keep the good things about the traditional ways.

[please please control on the outgo and returns if you can. weighting as a strategy is not the way to think about this. get the sample you need and fuss with it as little as possible through weighting]

Stop Asking for Margin of Error in Polling Research

Originally published on Huffington Post. Also published on Linkedin, Quora, and anywhere else I have an account.

Just a few days ago, I moderated a webinar with four leading researchers and statisticians to discuss the use of margin of error with non-probability samples. To a lot of people, that sounds like a pretty boring topic. Really, who wants to listen to 45 minutes of people arguing about the appropriateness of a statistic?

Who, you ask? Well, more than 600 marketing researchers, social researchers, and pollsters registered for that webinar. That’s as many people as would attend a large conference about far more exciting things like using Oculus Rift and the Apple Watch for marketing research purposes. What this tells me is that there is a lot of quiet grumbling going on.

I didn’t realize how contentious the issue was until I started looking for panelists. My goal was to include 4 or 5 very senior level statisticians with extensive experience using margin of error on either the academic or business side. As I approached great candidate after great candidate, a theme quickly arose among those who weren’t already booked for the same time-slot – the issue was too contentious to discuss in such a public forum. Clearly, this was a topic that had to be brought out into the open.

The margin of error was designed to be used when generalizing results from probability samples to the population. The point of contention is that a large proportion of marketing research, and even polling research, is not conducted with probability samples. Probability samples are theoretical – it is generally impossible to create a sampling frame that includes every single member of a population, and it is impossible to force every randomly selected person to participate. Beyond that, the volume of non-sampling errors that are guaranteed to enter the process, from poorly designed questions to overly lengthy, complicated surveys to poorly trained interviewers, means that non-sampling errors could have an even greater negative impact than sampling errors do.

Any reasonably competent statistician can calculate the margin of error with numerous decimal places and attach it to any study. But that doesn’t make it right. That doesn’t make the study more valid. That doesn’t eliminate the potentially misleading effects of leading questions and skip logic errors. The margin of error, a single number, has erroneously come to embody the entire system and processes related to the quality of a study. Which it cannot do.
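
For readers who want the calculation itself, here is a minimal sketch of the textbook 95% margin of error for a proportion from a simple random sample; note that it quantifies sampling variability only and says nothing about question wording, coverage, or nonresponse:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Textbook 95% margin of error for a simple random sample proportion:
    z * sqrt(p * (1 - p) / n). Sampling error only."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 respondents at a 50/50 split: roughly +/- 3.1 points.
print(round(margin_of_error(0.5, 1000) * 100, 1))
```

The formula's assumptions – a probability sample with full coverage and full response – are exactly what the paragraphs above argue rarely hold.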

In spite of these issues, the media continue to demand that Margin of Error be reported. Even when it’s inappropriate and even when it’s insufficient. So to the media, I make this simple request.

Stop insisting that polling and marketing research results include the margin of error.

Sometimes, the best measure of the quality of research is how transparent your vendor is when they describe their research methodology, and the strengths and weaknesses associated with it.

 
