Advances in designing questions in brief #AAPOR #MRX 


Concurrent Session A, Moderator Carl Ramirez, US Government Accountability Office, 9 papers!

prezzie 1: using item response theory modeling

  • useful for generating shorter question lists [assuming you are writing scales, plan to reuse scales many times, and don’t require data on every question you’ve written] (a sketch of the idea follows this list)
  • [know what i love about aapor? EVERYONE can present regardless of presentation skill. content comes first. and on a tangent, I’ve already eaten all the candies i found in the dish]
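
A minimal sketch of the idea, assuming a two-parameter logistic (2PL) model with made-up item parameters (not the presenter's actual model or data), just to show how ranking items by information lets you keep a short form:

```python
# Sketch: choose a short form by expected item information under a 2PL IRT model.
# The discrimination (a) and difficulty (b) values are invented for illustration;
# in practice you would estimate them from pilot data with an IRT package.
import numpy as np

a = np.array([1.8, 0.6, 1.2, 2.1, 0.9, 1.5])    # hypothetical discrimination parameters
b = np.array([-0.5, 1.0, 0.0, 0.3, -1.2, 0.8])  # hypothetical difficulty parameters

theta = np.linspace(-3, 3, 61)                  # grid over the latent trait
weights = np.exp(-theta**2 / 2)                 # roughly N(0,1) weighting over the trait
weights /= weights.sum()

# 2PL response probability and Fisher information for each item at each theta
p = 1 / (1 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
info = a[:, None] ** 2 * p * (1 - p)

expected_info = (info * weights).sum(axis=1)       # average information per item
short_form = np.argsort(expected_info)[::-1][:3]   # keep the 3 most informative items
print("items kept for the short form:", short_form)
```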

prezzie 2: measurements of adiposity

  • prevalence rate of obesity is 36% in the USA, varies by state but every state is at least 20% [this is embarrassing in a world where millions of people starve to death]
  • we most often use self-reported height and weight to calculate BMI (formula sketched after this list), which is how the CDC measures it nationally, but those self-reports are not reliable
  • the correlation between BMI and body fat is less than 40%, so we build a proxy on top of an unreliable measure
  • underwater weighing is a better measure but there are obviously many drawbacks to that
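
The BMI formula itself is trivial, which is kind of the point: the weak link is the self-reported numbers that go into it, not the math. A quick sketch with made-up values:

```python
# BMI = weight (kg) / height (m) squared; the standard formula the proxy is built on.
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2

# Example with hypothetical self-reported values: 80 kg at 1.75 m
print(round(bmi(80, 1.75), 1))  # 26.1, which the usual cutoffs label "overweight"
```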

prezzie 3: asking sensitive GLBT questions

  • respondents categorize things differently than researchers do, and instructions do affect answers, but does placement of those instructions matter? [hm, never really thought of that before]
  • tested long instructions before vs after the question
  • examined means and nonresponse
  • data collection incomplete so can’t report results

prezzie 4: response order effects related to global warming

  • most americans believe climate change is real but one third do not
  • primacy and recency effects can affect results; primacy shows up more often in self-administered surveys, recency more often in interviewer-administered ones
  • reverse-ordered the five response options for the two groups; the 5 attitudes were arranged on a scale from belief to disbelief (see the sketch after this list)
  • more people said they believed in global warming when the belief end was presented first; the effect size was small, around 5%
  • it affected the first and last items, not the middle opinions
  • less educated people were more affected by response order, as were people who weren’t interested in the topic
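
A minimal sketch of the split-ballot idea, with hypothetical option wording (not the presenter's actual instrument): half the sample sees the options running from belief to disbelief, the other half sees the reverse.

```python
# Split-ballot sketch: randomly assign each respondent to the original or
# reversed response order. The option wording here is hypothetical.
import random

OPTIONS = [
    "Global warming is definitely happening",
    "Global warming is probably happening",
    "Not sure",
    "Global warming is probably not happening",
    "Global warming is definitely not happening",
]

def assign_options(respondent_id: int) -> list[str]:
    """Return the option list in original or reversed order, split 50/50."""
    rng = random.Random(respondent_id)  # seed by ID so the assignment is reproducible
    return OPTIONS if rng.random() < 0.5 else list(reversed(OPTIONS))

print(assign_options(42))
```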

prezzie 5: varying administration of sensitive questions to reduce nonresponse 

  • higher reported rates of LGB identification are assumed to be more accurate
  • 18 minute survey on work and job benefits
  • tried assigning numbers versus words to the answers (are you 1: gay… vs are you gay) [interesting idea!]
  • [LOVE sample sizes of over 2300]
  • nonresponse differences were significant but the effect size was just 1% or less
  • it did show higher rates of LGB; they recommend trying this in a telephone survey

prezzie 6: questionnaire length and response rates

  • CATI used a $10 incentive, web was $1, and mail was $1 to $4 [confound number 1]
  • short survey was 30% shorter but still well over 200 questions
  • no significant difference in response rate; the completion rate was about 3 points better for the short version
  • no effect on web, significant effect on mail

prezzie 8: follow-up short surveys to increase response rates

  • based on a taxpayer burden survey n=20000
  • 6-stage invite and reminder process; respondents could receive up to 3 survey packages; it generates a 40% response rate
  • the short form is 4 pages about the time and money spent to comply, and takes 10 minutes to complete
  • many of the questions are simply priming questions so that people answer the time and money questions more accurately
  • at stage 6, divided into long and short form
  • there was no significant difference in response rate overall
  • no differences by difficulty or by method of filing 
  • maybe people didn’t realize the envelope held a shorter survey; they may have chucked it without knowing
  • will try a different envelope next time, as well as saying overtly that it’s a shorter survey

prezzie 9: unbalanced vs balanced scales to reduce measurement error

  • attempted census of Peace Corps volunteers, 92% response rate
  • the before version used a 0 to 5 scale, the after version a balanced -2 to +2 scale (one way to compare the two is sketched after this list)
  • they assume most variation will be in the middle, as very happy and very unhappy people will go to the extremes anyway
  • only 33 people in the test, 206 items
  • endpoint results were consistent but the means were very slightly different [needs numerous replications to get over the sample size and margin of error issue]
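
One way to compare means across the two versions is to put them on a common metric first; the linear rescaling below is my own assumption for illustration, not necessarily what the Peace Corps team did:

```python
# Rescale old 0-5 responses onto the new -2..+2 range before comparing means.
# The mapping and the sample responses are invented for illustration.
def rescale_0_to_5(x: float) -> float:
    """Map a 0..5 response linearly onto -2..+2."""
    return x * 4 / 5 - 2

old_scores = [0, 3, 5, 4, 2]    # hypothetical responses on the old 0-5 scale
new_scores = [-2, 1, 2, 0, -1]  # hypothetical responses on the new -2..+2 scale

old_mean = sum(rescale_0_to_5(x) for x in old_scores) / len(old_scores)
new_mean = sum(new_scores) / len(new_scores)
print(round(old_mean, 2), round(new_mean, 2))
```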