Just a few days ago, I moderated a webinar with four leading researchers and statisticians to discuss the use of margin of error with non-probability samples. To a lot of people, that sounds like a pretty boring topic. Really, who wants to listen to 45 minutes of people arguing about the appropriateness of a statistic?
Who, you ask? Well, more than 600 marketing researchers, social researchers, and pollsters registered for that webinar. That’s as many people as would attend a large conference about far more exciting things like using Oculus Rift and the Apple Watch for marketing research purposes. What this tells me is that there is a lot of quiet grumbling going on.
I didn’t realize how contentious the issue was until I started looking for panelists. My goal was to include 4 or 5 very senior level statisticians with extensive experience using margin of error on either the academic or business side. As I approached great candidate after great candidate, a theme quickly arose among those who weren’t already booked for the same time-slot – the issue was too contentious to discuss in such a public forum. Clearly, this was a topic that had to be brought out into the open.
The margin of error was designed to be used when generalizing results from probability samples to the population. The point of contention is that a large proportion of marketing research, and even polling research, is not conducted with probability samples. Probability samples are theoretical – it is generally impossible to create a sampling frame that includes every single member of a population and it is impossible to force every randomly selected person to participate. Beyond that, the volume of non-sampling errors that are guaranteed to enter the process, from poorly designed questions to overly lengthy, complicated surveys to poorly trained interviewers, means that non-sampling errors could have an even greater negative impact than sampling errors do.
Any reasonably competent statistician can calculate the margin of error with numerous decimal places and attach it to any study. But that doesn’t make it right. That doesn’t make the study more valid. That doesn’t eliminate the potentially misleading effects of leading questions and skip logic errors. The margin of error, a single number, has erroneously come to embody the entire system and processes related to the quality of a study. Which it cannot do.
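For readers who have never seen the calculation, here is a minimal sketch of the textbook margin of error for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the function name and the example numbers are illustrative only and do not come from the webinar.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    Only meaningful under simple random sampling; it says nothing about
    question wording, skip logic, or any other non-sampling error.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical example: 50% of 1,000 respondents agree with a statement.
moe = margin_of_error(p=0.5, n=1000)
print(f"+/- {moe * 100:.1f} percentage points")
```

Notice that the only inputs are the observed proportion and the sample size. Nothing in the formula knows how the sample was drawn or how the questions were asked.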
In spite of these issues, the media continue to demand that margin of error be reported, even when it’s inappropriate and even when it’s insufficient. So to the media, I make this simple request.
Stop insisting that polling and marketing research results include the margin of error.
Sometimes, the best measure of the quality of research is how transparent your vendor is when they describe their research methodology, and the strengths and weaknesses associated with it.
- create a sampling plan based on the required characteristics for the project
- go into field and gather data from research participants such that the returns look similar to the sampling plan
- if the difference between the sampling plan and your returns is large, go back to step 2
- if the difference is small, weight the returns to match your sampling plan
- if upon analysis of the results you determine that your weights were calculated incorrectly, reweight the data
Did you catch that? You can’t reweight data unless you’ve already weighted it at least once before. So, if you catch yourself using the word reweight in your next report, ask yourself if you really did reweight the data. You probably didn’t. So don’t say it.
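For anyone who wants to see the weighting step from the list above in concrete terms, here is a minimal sketch of simple cell weighting. It assumes the sampling plan is expressed as target proportions per cell; the function name, the cells, and the counts are all made up for illustration.

```python
from collections import Counter


def cell_weights(returns, plan):
    """Weight each cell so the weighted returns match the sampling plan.

    returns: list of cell labels, one per completed respondent.
    plan: dict mapping cell label -> target proportion (summing to 1).
    """
    n = len(returns)
    counts = Counter(returns)
    weights = {}
    for cell, target in plan.items():
        if counts[cell] == 0:
            # Weighting cannot fix an empty cell; more fieldwork is needed.
            raise ValueError(f"no returns in cell {cell!r}")
        weights[cell] = target / (counts[cell] / n)
    return weights


# Hypothetical plan and returns: the plan called for a 50/50 split.
plan = {"women": 0.5, "men": 0.5}
returns = ["women"] * 60 + ["men"] * 40
weights = cell_weights(returns, plan)
print(weights)
```

Run it once and you have weighted the data. Only if you run it a second time, with corrected targets, have you reweighted anything.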
Today’s grammar and statistics rant is brought to you by the number pink and the letter seventeen.
Emerging Innovations in Sampling Technologies Moderated by Tim Macer, meaning limited
Hear from industry leaders about how they are employing technology to engage and manage respondents.
- Patrick Comer, CEO, Federated Sample
- Bob Fawson, Chief Access, Supply, & Engagement Officer, SSI
- Kurt Knapton, President & CEO, Research Now
- Mark Simon, Managing Director, North America, Toluna
- Patrick Comer: we think about people who answer our surveys as completes, traffic, monetization when they are people. need to think about the community. have created community managers which may help with retention. have been doing a lot of app testing and some members are testing their own apps. Also have a TV detection tool as an app, only small group of people are testing it, it figures out what show they’re watching. Is it possible to determine which kinds of surveys people abandon and then avoid sending those kinds of surveys to that person in the future?
- Kurt Knapton: Mobile is not the future. It is now. Big rewards and pitfalls to participate and to be so slow to change. In 2013, mobile surpassed desktop as the way to go online. Half of all emails are read on mobile. People reach for their phones 150 times per day. [yes, i confess] Surveys can take longer on mobile because it takes more time to type on a phone. There is only so much time people will spend answering a survey on the go. Are we changing fast enough? Online to mobile is as big as the shift from phone to online 15 years ago but we only have about a fourth of the time to make that transition. Solutions MUST embrace mobile, you could be at risk if you are not. Change creates opportunity, new metrics, new measurements. Take it out of subjective ratings. Evaluate every survey on how well it will work on mobile and aim for mobile friendly. Why not rebate clients who have mobile optimized surveys? [yeah!] Consider NOT sending mobile unfriendly surveys to responders, they won’t like it and they won’t answer it. Live sniffing of the device will help adjust a survey to suit the device being used, better grid formats, better question formats. Would YOU want to answer the survey? Would YOU abandon the survey? [be honest with yourself!]
- Bob Fawson: we castigate ourselves for being slow but there is a quiet technology revolution going on. collecting big data, applying crm techniques to our data, processing and understanding our data more quickly. how do we process data to treat people as individuals and move more quickly on our processes
- Patrick Comer: we are learning quickly from our peer industries. different regulatory environments. folding marketing data into research data, how do we manage that regulatory issue. changing fast enough? – maybe. be careful what you wish for.
- [i’ll stop naming names now, sorry in advance]
- there are more respondents available in databases that are not typically called panel.
- constraint is data quality and data consistency but all marketers are dealing with this.
- responders have fragmented attention even if we think it’s interesting, particularly when you consider what else they could be doing instead
- behavioural data is also important
- routers + relationships can be valuable
- notion of failing fast is critical for our industry
- we burned through responders with online surveys, should we plan to NOT do that this time?
- we need to crack the problem of re-using data among datasets
- if you want someone to download an app, or link to you, or scrape their data, you need a relationship of trust
- the data that’s easy to get online is not particularly good, the lag in updating easy data is shocking
- revolution is in how we store and use data
- there are many quasi research tools with differing levels of quality
- scale is increasingly important and gives you flexibility to solve the optimization problem
- DIY solutions have leveled the playing field [yes, there are DIY sampling companies]
- River is barely talked about today because we have so many premium options, great respondents, who’ve never seen the horrid long surveys
- need to speed up our processes to stay competitive
- automated distribution, buying, and selling of sample needs to improve
- survey pricing can be dynamic, change during fielding depending on what it’s able to attract
- people can now test different prices of sample and then decide which price gets them what they want
- survey panels don’t use inferred data so the quality of demographics can be wonderful.
- people don’t always say what they do or do what they say and whoever can match data together is going to win
- a lot of the “new” technologies are now normal – river, routing, etc
- opt-in permission opens a world of opportunity
- clients don’t talk about responders or panelists, they talk about consumers [i prefer to talk about people, it’s a new term for me, strange isn’t it!]
- people are more open to sharing their entire profiles with companies now
- public sentiment can change very quickly if you aren’t permission based
Respondent Identity Verification with Non-Panel, Real-time Samples: Is There Cause for Concern by Nancy Brigham and James Karr #CASRO #MRX
“Respondent Identity Verification with Non-Panel, Real-time Samples: Is There Cause for Concern?”
As the research industry evolves toward non-panel sample sourcing and real-time sampling, questions have arisen about the quality of these respondents, especially in the area of respondent identity verification. This research addresses two key questions: Are fraudulent identities a concern with non-panel samples, and what are the research implications of assessing identity validation with these samples? This work examines identity verification and survey outcomes among five different levels of Personally Identifiable Information (PII) collection. In addition to the presenters, this paper was authored by Jason Fuller (Ipsos Interactive Services, Ipsos).
- Are people whose validity cannot be confirmed providing bad data? Should we be concerned?
- What do we know about non-panel people? Maybe they don’t want to give PII to just take one survey. Will they abandon surveys if we ask for PII? [I don’t see answering “none” as a garbage question. It’s a question of trust and people realizing you do NOT need my name to ask me my opinions.]
- Is it viable to assess identity validation with non-panel sources?
- In the study, PII was asked at the beginning of the survey [would be great to test end of survey after people have invested all that time in their responses]
- Five conditions asking for combination of name, email, address
- Used a third party validator to check PII
- 25% of people abandoned at this point
- Only 4 out of 2640 respondents gave garbage information at this point, 12 tried to bypass without filling it out and then abandoned. It’s so few people that this is hard to trust. [Hey people, let’s replicate]
- Name and address caused 6% of abandonment, name and email caused only 3% abandonment
- Did people get mad that we asked this? can we see anger in concept test? no.
- didn’t lead to poor quality survey behaviours – used a 13 minute survey
- when given a choice, people prefer to give less information – most people will choose to give name and email, though some people will give all information
- Simply collecting PII didn’t appear to influence other aspects
- Did their non-panel source give lower quality data? no. 82% passed the validation test across all conditions. Those who provide the most comprehensive data validate better but that’s likely because it’s more possible to validate them.
- Real-time sample gives just as good data quality, same pass rates, no data differences
- Conclude the screening question is necessary, heads up that PII question will be coming
- Younger ages abandoned more across all test conditions
- This study only looked at the general population, not hard to reach groups like Hispanics, or different modes like mobile browsers, or in-app respondents
“Are There Perils in Changing the Way We Sample our Respondents?”
Sample and panel providers are always looking to increase their active sample size. In recent years this has taken many companies out of email lists into real time sampling via ad banners or social networks. Research has revealed that panelists recruited by such methods are substantially different than the panelists that opt into online panels. This study addresses the various methods panels implement to generate additional sample, and the tradeoffs these methods require. While there is a clear short term gain of added panelists, there may be long term loss of data stability and panel tenure.
- Inna Burdein, Director of Analytics, The NPD Group, Inc.
- Are there differences between people who take several surveys in a row and people who take surveys off your website?
- Tested data from website survey, email survey, and follow-up survey
- 1400 completes per group
- Website takers are younger and newer.
- Website takers express more interest in surveys and incentives [or they just like clicking a lot]
- Website takers are more online, google a lot, lots of free time
- Completion rates are higher for website takers, and then follow-on surveys. Email takers are last.
- Website takers are more satisfied – easy, reasonable, interesting
- Website takers have more inconsistencies and are less likely to follow instructions. Follow-ons are more likely to straightline and opt out of responding.
- Website panelists report more purchases, more store visits, more browsing stores, more online purchases, make home improvements, redecorate, go on vacation, invest in stock market
- [More likely to report purchases does not mean more likely to purchase]
- One follow on is kind of normal, but two follow-ons is where the differences happen, more unhappiness, more non-purchase, more straightlining, more use of none of the above
- Significant differences do emerge [but I wonder how many are truly meaningful, would you run your business differently if you got number A vs number B]
- Are there perils in changing the way you sample? It depends. Need enthusiastic responders and more representativeness. Tell people to answer on the website. Possibly balance on channel
- Follow-ons may hurt sample quality if no limit is set – time spent, number of surveys, what is the right rule?
“A Model-Based Approach for Achieving a Representative Sample”
Although the enterprise of online research (with non-probability samples) has witnessed remarkable growth worldwide since its inception about 15 years ago, the US public opinion research community has not yet embraced it, partly because of concerns over data reliability and validity. The aim of this project is to rely on data from a recent, large-scale ARF study to develop an optimal model for achieving a representative sample. By that, we mean one that reduces or eliminates the bias associated with non-probability sampling. In addition to the presenter, this paper was authored by John Bremer (Toluna) and Carol Haney (Toluna).
- George Terhanian, Group Chief Strategy & Products Officer, Toluna
- The key is representativeness. This topic is not new, we talked about it 15 years ago. Criticisms are not new – Warren Mitofsky said the willingness to discard sampling frames and feeble attempts at manipulating the resulting bias undermines the credibility of the research process
- SLOP – Self, selected, opinion, panel. [Funny!]
- Growth of online research remains strong as it has since the beginning.
- 2011 – AAPOR needs to promote flexibility not dogmatism, established a task force on non-probability methods. Identified sample matching as most promising non-probability approach. Did not offer next steps or an agenda.
- Study with 17 different companies in FOQ study
- Researchers should use the ARF’s FOQ2 data to test non-probability sampling and representativeness
- Used a multi-directional search algorithm (MSA)
- Bias is difference between what respondents report and what we know to be true. e.g., Do you smoke? Benchmark vs panel scores.
- Reduce bias through 1) respondent selection or sampling and 2) post hoc adjustment or weighting [I always prefer sampling]
- FOQ2 suggests weighting needs to include additional variables such as demographics, secondary demographics (household characteristics), behaviours, attitudes
- [If you read my previous post on the four types of conference presenters, this one is definitely a content guru 🙂 ]
- Using only optimal demographics, panel and river sample were reasonably good, reduced bias by 20 to 25%. Time spent online helps to reduce bias and is a proxy for availability in terms of how often they take surveys
- Ten key variables are age, gender, region, time spent online, race, education [sorry, missed the rest]
- Other variables like feeling hopeful and concern about privacy of online information were top variables [sorry, missed again, you really need to get the slides!]
- Need to sample on all of these but don’t need to weight on all of them
- [I’m wondering if they used a hold-back sample and whether these results are replicable, the fun of step-wise work is that random chance makes weird things happen]
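To make the definition of bias used in this talk concrete, here is a minimal sketch that compares survey estimates with benchmarks and reports how much an adjustment reduced the average gap. The measures and every number below are invented for illustration; they are not FOQ2 results, and the sketch is not the multi-directional search the presenters used.

```python
def mean_absolute_bias(estimates, benchmarks):
    """Average absolute gap, in percentage points, between survey
    estimates and known benchmark values for the same measures."""
    shared = estimates.keys() & benchmarks.keys()
    if not shared:
        raise ValueError("no measures in common")
    return sum(abs(estimates[k] - benchmarks[k]) for k in shared) / len(shared)


# Hypothetical benchmark values and survey estimates (percent saying yes).
benchmarks = {"smokes": 18.0, "has_passport": 40.0}
unadjusted = {"smokes": 24.0, "has_passport": 48.0}
adjusted = {"smokes": 22.0, "has_passport": 46.0}

before = mean_absolute_bias(unadjusted, benchmarks)
after = mean_absolute_bias(adjusted, benchmarks)
reduction = (before - after) / before
print(f"bias before: {before:.1f}, after: {after:.1f}, reduced by {reduction:.0%}")
```

The same function works whether the adjustment came from respondent selection or from post hoc weighting, which makes it easy to compare the two routes on equal terms.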
Probability and Non-Probability Samples in Internet Surveys
Moderator: Brad Larson
Understanding Bias in Probability and Non-Probability Samples of a Rare Population John Boyle, ICF International
- If everything was equal, we would choose a probability sample. But everything is not always equal. Cost and speed are completely different. This can be critical to the objective of the survey.
- Did an influenza vaccination study with pregnant women. Would have required 1200 women if you wanted to look at minority samples. Not happening. Influenza data isn’t available at a moment’s notice and women aren’t pregnant at your convenience. Non-probability sample is pretty much the only alternative.
- Most telephone surveys are landline only for cost reasons. RDD has coverage issues. It’s a probability sample but it still has issues.
- Unweighted survey looked quite similar to census data. Looked good when crossed by age as well. Landline are more likely to be older and cell phone only are more likely to be younger. Landline more likely to be married, own a home, be employed, higher income, have insurance from employer.
- Landline vs cell only – no difference on tetanus shot, having a fever. Big differences by flu vaccination though.
- There are no gold standards for this measure, there are mode effects
- Want probability samples but can’t always achieve them
A Comparison of Results from Dual Frame RDD Telephone Surveys and Google Consumer Surveys
- Pew and Google partnered on this study; 2-question survey
- Consider fit for purpose – can you use it for trends over time, quick reactions, pretesting questions, open-end testing, question format tests
- Not always interested in point estimates but better understanding
- RDD vs Google surveys – average difference 6.5 percentage points, distribution closer to zero but there were a number that were quite different
- Demographics were quite similar, Google samples were a bit more male, Google had fewer younger people, Google was much better educated
- Correlations of age and “i always vote” was very high, good correlation of age and “prefer smaller government”
- Political partisanship was very similar, similar for a number of generic opinions – earth is warming, same sex marriage, always vote, school teaching subjects
- Difficult to predict when point estimates will line up to telephone surveys
A Comparison of a Mailed-in Probability Sample Survey and a Non-Probability Internet Panel Survey for Assessing Self-Reported Influenza Vaccination Levels Among Pregnant Women
- Panel survey via email invite, weighted data by census, region, age groups
- Mail survey used a sampling frame of birth certificates, weighted on nonresponse, non-coverage
- Tested demographics and flu behaviours of the two methods
- age distributions were similar [they don’t present margin of error on panel data]
- panel survey had more older people, more education
- Estimates differed on flu vaccine rates, some very small, some larger
- Two methods are generally comparable, no stat testing due to non-prob sample
- Trends of the two methods were similar
- Panel survey is good for timely results
Probability vs. Non-Probability Samples: A Comparison of Five Surveys
- [what is a probability panel? i have a really hard time believing this]
- Novus and TNS Sifo considered probability
- YouGov and Cint considered non-probability
- Response rates range from 24% to 59%
- SOM institute (mail), Detector (phone), LORe (web) – random population sample, rates from 8% to 53%
- Data from Sweden
- On average, three methods differ from census results by 4% to 7%, web was worst; demos similar except education where higher educated were over-represented, driving licence over-rep
- Non-prob samples were more accurate on demographics compared to prob samples; when they are weighted they are all the same on demographics but education is still a problem
- The five data sources were very similar on a number of different measures, whether prob or non-prob
- demographic accuracy of non-prob panels was better. also closer to political attitudes. No evidence that self recruited panels are worse.
- Need to test more indicators, retest
Modeling a Probability Sample? An Evaluation of Sample Matching for an Internet Measurement Panel
- “construct” a panel that best matches the characteristics of a probability sample
- Select – Match – Measure
- Matched on age, gender, education, race, time online, also looked at income, employment, ethnicity
- Got good correlations and estimates from prob and non-prob.
- Sample matching works quite well [BOX PLOTS!!! i love box plots, so good in so many ways!]
- Non-prob panel has more heavy internet users
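The Select – Match – Measure idea can be sketched in a few lines. This is a minimal illustration only: it assumes categorical matching variables and a greedy nearest-match rule that counts mismatched attributes, which is one simple choice and not necessarily what the presenters did. The function name and the example records are hypothetical.

```python
def match_sample(target, pool, keys):
    """For each person in the probability (target) sample, pick the
    closest unused member of the non-probability pool.

    Distance is the number of matching variables on which they differ.
    """
    available = list(pool)
    matched = []
    for person in target:
        if not available:
            break  # pool exhausted before every target case was matched
        best = min(available, key=lambda c: sum(c[k] != person[k] for k in keys))
        matched.append(best)
        available.remove(best)
    return matched


# Hypothetical records; real matching used more variables (education,
# race, time online, and so on).
target = [
    {"age": "18-34", "gender": "F"},
    {"age": "55+", "gender": "M"},
]
pool = [
    {"id": 1, "age": "55+", "gender": "M"},
    {"id": 2, "age": "18-34", "gender": "F"},
    {"id": 3, "age": "35-54", "gender": "F"},
]
matched = match_sample(target, pool, keys=["age", "gender"])
print([m["id"] for m in matched])
```

Greedy matching is easy to read but depends on the order of the target sample; with a big enough pool that matters less, which is one reason scale keeps coming up in these sessions.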
When Google announced their survey capabilities, the market research space was abuzz with anticipation. Oh, the possibilities! Clients, of course, were eager to learn about a new option that might be better and cheaper than what market research organizations have to offer. On the other hand, market researchers wondered if they ought to be fearful of the competition. Whichever side of the fence you’re on, it was clear that when Google spoke at MRMW, the room would be full.
Paul McDonald, the Google rep, shared lots of great information about the tool and the audience was genuinely impressed. How could you not be impressed with the smooth and clean design and the quick responses.
But we’re market researchers. We know (or we should know) about statistics and probability sampling and what makes good quality data. So it puzzled me when I saw margin of error reported on their survey results. Margin of error shouldn’t be reported on non-probability samples.
During the break, I asked the person manning the Google demo table about the reason for reporting margin of error. But alas, no answer for me.
However, Google is monitoring the MRMW tweets for they provided this answer to me.
Unfortunately, “stratified sampling according to census rep” has nothing to do with probability sampling. Margin of error can only be reported on probability samples, in which every member of the population has a known, non-zero chance of being selected for inclusion. So, if Google wants to report margin of error, then they must insist that their research results only be generalized to people who use Google, people who use the websites on which Google displays the surveys, and people who don’t use ad-block (I’m guessing). There are probably some other conditions in there but I’m obviously not familiar with the technicalities of how Google does their research. Regardless, as soon as you stray from the very basic conditions, you have fallen into convenience sampling territory and margin of error is no longer appropriate to display.
Google has kindly prepared a white paper (Comparing Google Consumer Surveys to Existing Probability and Non-Probability Based Internet Surveys) for those of us interested in the details of their product. I enjoyed reading all the criteria that explained why Google surveys don’t use probability sampling. Do read the white paper as you’ll probably be impressed with the results regardless. And keep in mind that survey panels can’t provide probability samples. Even though someone claimed that’s what they gave Google.
But really, who CARES if it’s a probability sample? 99.9%(a) of all market research does not use probability samples and we get along pretty well. Market researchers understand the issues of not using probability sampling, they understand how to interpret and analyze non-probability results, they know how to create clear and unbiased market research, etc. It’s not that we want probability samples. It’s that we want the smarts to tell us when our non-probability samples aren’t good enough.
I’ll let you know if Google follows up…
Postscript: A Google rep and I are in the midst of emails about what type of data warrants use of the margin of error. I’ve been sent this link. If you’re statistically inclined, do have a read. ftp://ftp.eia.doe.gov/electricity/mbsii.pdf
(a) I totally made up that number. I have no clue what percentage of market research uses probability sampling. But since most of us use survey panels, focus groups, mall intercepts, mobile surveys etc you get my point.
Creating a representative sample is a wonderful thing. From hundreds of thousands of people, you randomly pick and choose people until all the necessary cells have the correct number of people.
We need to find men and women, younger and older people, more and less educated people, urban and rural people, small and large households, low and high income people. When you start creating that balanced sample, the job is pretty easy. Slot that young woman into the cross between women and young. Slot someone into young and male. Slot someone else into older and lower income. On and on you go until you need to fill just the last few slots.
There are a few openings left in the male side of things. There are a bunch of slots in the high income section. There are a bunch of slots in the low education section. So, really all you need are a few good 18 year old men who didn’t get a high school diploma and who earn $150k per year. Bill Gates is too old as is Mark Zuckerberg. The only people left to fill that slot are…. I have to say it…. gamers – people gaming the survey system to qualify for surveys and earn incentives.
But there are solutions.
- First of all, it’s quite simple to build algorithms that detect unlikely demographic patterns. Research them. Test them. Apply them. Get rid of people who are clearly gaming the system and providing false data.
- Second, instead of desperately trying to find exactly 500 to complete a survey, allow yourself to select 550 so as to avoid the selection of people whose demographics are so odd that they don’t truly fit the niche you are looking for. You’re not looking for people whose personal behaviours are so odd that their demographic characteristics are out of whack. You are looking for a representative sample. Don’t let technicalities get in the way of thinking.
You’ll get pushback on this but try it. See what happens.
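As a starting point for the first solution, here is a minimal sketch of a rule-based check for unlikely demographic patterns. The rules, thresholds, and field names are assumptions made up for illustration; a real check would be researched and tested against your own panel data before anyone is removed.

```python
# Each rule is a (label, test) pair. The thresholds are illustrative guesses.
RULES = [
    ("teen with very high income",
     lambda r: r["age"] < 20 and r["income"] >= 150_000),
    ("more children than the household can hold",
     lambda r: r["children"] >= r["household_size"]),
]


def flag_respondent(respondent):
    """Return the labels of every implausible pattern this respondent hits."""
    return [label for label, test in RULES if test(respondent)]


# Hypothetical respondents.
suspicious = {"age": 18, "income": 150_000, "children": 0, "household_size": 2}
ordinary = {"age": 42, "income": 65_000, "children": 2, "household_size": 4}

print(flag_respondent(suspicious))
print(flag_respondent(ordinary))
```

Keeping the rules in a plain list makes it easy to test each one on its own and to see exactly why a person was flagged before you decide to drop them.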