Data Quality Issues For Online Surveys #AAPOR


Moderator: Doug Currivan, RTI International. Location: Meeting Room 410, Fourth Floor
Impact of ‘Don’t Know’ Options on Attitudinal and Demographic Questions; Larry Osborn, GfK Custom Research; Nicole R. Buttermore, GfK Custom Research; Frances M. Barlas, GfK Custom Research; Abigail Giles, GfK Custom Research

  • Telephone and in-person surveys rarely offer a don’t know option, though interviewers will record it if volunteered; offering it explicitly doesn’t improve the data
  • That may not be the case with online surveys
  • They offered a prompt following item nonresponse to see how it changed results
  • Got 4,000 completes
  • Tested attitudinal items three ways: with a don’t know option, without one, and with don’t know offered only in a prompt after nonresponse
  • Don’t know responses were reduced when offered only after a prompt; people did choose an opinion, so the prompt was effective and didn’t affect data validity
  • Tested it on a factual item as well, income, which is often missing for up to 25% of respondents
  • Branching the income question often helps to minimize nonresponse (e.g., start with three income groups and then split each group into three more groups); see the sketch after this list
  • 1,900 completes for this question – first branch was $35k or below, more than $35k, or DK, and then each break was branched further; DK was only offered to people who skipped the question
  • Checked validity by correlations with income related variables (e.g., education, employment)
  • Lower rates of missing data when DK is offered only after nonresponse; it seems most missing data is satisficing
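
A minimal sketch of the branched income flow described above, with “don’t know” shown only after an initial skip. The bracket boundaries, prompt wording, and the `ask` helper are illustrative assumptions, not the study’s actual instrument.

```python
# Sketch: branched income question where "don't know" appears only after a skip.
# All wording and bracket values are assumptions for illustration.

def ask(prompt, options):
    """Stand-in for a survey engine call; returns the respondent's choice or None if skipped."""
    print(prompt)
    for i, opt in enumerate(options, 1):
        print(f"  {i}. {opt}")
    raw = input("> ").strip()
    return options[int(raw) - 1] if raw else None  # blank entry = skipped

def branched_income():
    top = ask("Is your household income $35,000 or more?",
              ["Under $35,000", "$35,000 or more"])
    if top is None:
        # Only respondents who skipped the first attempt see an explicit "don't know".
        top = ask("Your best guess is fine. Is it $35,000 or more?",
                  ["Under $35,000", "$35,000 or more", "Don't know"])
    if top == "Under $35,000":
        return ask("Which range is closest?",
                   ["Under $10,000", "$10,000-$24,999", "$25,000-$34,999"])
    if top == "$35,000 or more":
        return ask("Which range is closest?",
                   ["$35,000-$74,999", "$75,000-$149,999", "$150,000 or more"])
    return "Don't know"

if __name__ == "__main__":
    print("Income bracket:", branched_income())
```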

Assessing Changes in Coverage Bias of Web Surveys as Internet Access Increases in the United States; David Sterrett, NORC at the University of Chicago; Dan Malato, NORC at the University of Chicago; Jennifer Benz, NORC at the University of Chicago; Trevor Tompson, NORC at the University of Chicago; Ned English, NORC at the University of Chicago

  • Many people still don’t have Internet access, but coverage is much better these days; can we feel safe with a web-only survey?
  • Is coverage bias (from people who don’t have access to the Internet) minimal enough not to worry about?
  • Coverage bias can vary question by question, not just for the survey overall
  • Coverage bias is a problem if there are major differences between those with coverage and those without; if they are the same kinds of people, it matters less
  • Even when you weight the data, it might not be a representative sample; weights don’t fix everything
  • ABS (address-based sampling) design, as opposed to sampling from telephone numbers or email addresses
  • The General Social Survey records whether people have Internet access and covers many health, social, political, and economic issues, so you can see where coverage error happens
  • As predicted, income, education, ethnicity, and age are major predictors of Internet access
  • The question is which issues still show bias beyond what weighting on demographics can fix
  • For many issues, coverage error was less than 1 percentage point (see the sketch after this list for how the comparison is computed)
  • For health, same-sex marriage, and education items, the differences were up to 7 percentage points
  • Over time, bias decreased for voting and support for assistance to Blacks, but error increased for spending on welfare, marijuana, and getting ahead in life
  • Saw many differences when they looked into subgroups
  • [So many tests are happening here; we definitely need replication to rule out which differences are random error]
  • As the people who don’t have Internet access become more different from the people who do, we need to be cognizant of how that skews which subset of results
  • You can’t know in advance whether the topic you are researching is a safe one or not
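
A rough sketch of the coverage-bias comparison described above: for each outcome, compare the weighted estimate among respondents with Internet access to the estimate for the full sample. The column names (`has_internet`, `weight`) and the toy data are assumptions for illustration, not GSS variable names.

```python
import pandas as pd

def coverage_bias(df, outcome, weight="weight", internet="has_internet"):
    """Difference between the web-covered estimate and the full-sample estimate."""
    full = (df[outcome] * df[weight]).sum() / df[weight].sum()
    web = df[df[internet] == 1]
    covered = (web[outcome] * web[weight]).sum() / web[weight].sum()
    return covered - full  # positive = a web-only sample would overstate the outcome

# Toy example with made-up respondents
df = pd.DataFrame({
    "supports_policy": [1, 0, 1, 1, 0, 1],
    "has_internet":    [1, 1, 1, 0, 0, 1],
    "weight":          [1.0, 0.8, 1.2, 1.5, 1.1, 0.9],
})
print(coverage_bias(df, "supports_policy"))
```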

Squeaky Clean: Data Cleaning and Bias Reduction; Frances M. Barlas, GfK Custom Research; Randall K. Thomas, GfK Custom Research; Mansour Fahimi, GfK Custom Research; Nicole R. Buttermore, GfK Custom Research

  • Do you need to clean your data? [Yes, because you don’t know whether errors sit within a specific group of people; you need to at least be aware of the quality]
  • Some errors are intentional, others accidental, and some happen because respondents couldn’t find the best answer
  • Results did not change if no more than 5% of the data was removed
  • Is there such a thing as too much data cleaning?
  • They cleaned out incremental percentages of the data and then weighted to the Census, using Census data as the benchmark (see the sketch after this list)
  • Saw no effect from cleaning up to 50% of the data with one of the panels; the second panel was similar, with almost no effect of cleaning
  • [Given that different demographic groups have different data quality, it could matter by subsample]
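
A small sketch of the “how much cleaning is too much” exercise: drop increasing shares of the lowest-quality respondents and compare each resulting estimate to a benchmark. The quality score, outcome, and benchmark here are simulated placeholders; the paper weighted the cleaned data to Census figures, which this sketch only notes in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
quality = rng.random(n)             # stand-in for a per-respondent data-quality score
outcome = rng.binomial(1, 0.4, n)   # stand-in for a survey estimate of interest
benchmark = 0.40                    # stand-in for a Census or other external figure

for share in [0.0, 0.05, 0.10, 0.25, 0.50]:
    keep = quality >= np.quantile(quality, share)  # drop the worst `share` of respondents
    est = outcome[keep].mean()                     # a real analysis would reweight here
    print(f"cleaned {share:>4.0%}: estimate = {est:.3f}, "
          f"bias vs benchmark = {est - benchmark:+.3f}")
```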

Trap Questions in Online Surveys; Laura Wronski, SurveyMonkey; Mingnan Liu, SurveyMonkey

  • Tested a variety of trap questions (easy or hard, at the beginning or end of the survey), using the format of asking respondents to select an answer the researchers specify
  • 80% were caught by the hard question
  • [It saddens me that we talk about ‘trapping’ respondents. They are volunteering their time for us; we must treat them respectfully. Trap questions tell respondents we don’t trust them.]
  • Tested follow-ups and a CAPTCHA
  • Announcements didn’t result in any differences; the picture-verification question caught about 10% of people
  • The CAPTCHA caught about 1% [probably they couldn’t read it]
  • They preferred the picture trap
  • [I personally prefer using many questions, because everyone makes errors somewhere. Someone who makes MANY errors is the problem, not someone who misses one question; see the sketch after this list.]
  • At the end of the survey, they asked people if they remembered the data-quality question; many people didn’t notice it
  • Their conclusion: one trap question is sufficient [wow, I disagree so much with this conclusion]
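
A tiny sketch of the bracketed suggestion above: score respondents across several quality checks and flag only those who fail more than a threshold, rather than excluding anyone who misses a single trap. The check names and threshold are illustrative assumptions, not part of the presented study.

```python
def flag_low_quality(check_results, max_failures=1):
    """check_results: dict mapping check name -> True if the respondent passed it."""
    failures = sum(1 for passed in check_results.values() if not passed)
    return failures > max_failures  # flag only respondents who fail several checks

# Toy respondent who missed one of three hypothetical checks
respondent = {"instructed_item": True, "picture_check": False, "speeding": True}
print(flag_low_quality(respondent))  # False: a single miss alone isn't flagged
```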

Identifying Psychosocial Correlates of Response in Panel Research: Evidence from the Health and Retirement Study; Colleen McClain, University of Michigan – MAPOR student paper winner

  • People who are more agreeable are less likely to participate (Big Five traits)
  • More conscientious people are more likely to participate
  • More agreeable people took longer to respond to the survey
  • Conscientious people responded more quickly
  • More distrustful people are less likely to check their records
  • Effects were very small
  • We need to consider more than demographics when it comes to data quality