The impact of questionnaire design on measurements in surveys #4 #ESRA15 #MRX 


Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Well, last night I managed to stay up until midnight. The lights at the church went on, lighting up the tower and the very top in an unusual way. They were quite pretty! The rest of the town enjoyed mood lighting as it didn’t really get dark at all. Tourists were still wandering in the streets since there’s no point going to bed in a delightful foreign city if you can still see where you’re going. And if you weren’t a fan of the mood lighting, have no fear! The sun ‘rose’ again just four hours later. If you’re scared of the dark, this is a great place to be – in summer!

Today’s program for me includes yet another session on question data quality, polling question design, and my second presentation on how non-native English speakers respond to English surveys. We may like to think that everyone answering our surveys is perfectly fluent but let’s be realistic. About 10% of Americans have difficulty reading/writing in English because it is not their native language. Add to that weakly literate and non-literate people, and there’s potentially big trouble at hand.


the impact of answer format and item order on the quality of measurement

  • compared a 2 point scale and an 11 point scale, with different question orders where questions could even be placed very far apart; looked at perceived prestige of occupations
  • separated two pages of the survey with a music game of guessing the artist and song, purely as a distraction from the survey. the second page was the same questions in a completely different order. did the same thing numerous times, changing the number of response options and question orders each time. whole experiment lasted one hour
  • assumed scale was unidimensional
  • no differences comparing 4 point to 9 point scale, none between 2 point and 9 point scale [so STOP USING HUGE SCALES!!!]
  • prestige does not change depending on order in the survey [but this is to be expected with non-emotional, non-socially desirable items]
  • respondents confessed they tried to answer well but maybe not to the best of their ability, or maybe their answers would change the next time [glad to see people know their answers aren’t perfect. and i wouldn’t expect anything different. why SHOULD they put 100% effort into a silly task with no legitimate outcome for them?]

measuring attitudes towards immigration with direct questions – can we compare 4 answer categories with dichotomous responses

  • when sensitive questions are asked, social desirability affects response distributions
  • different groups are affected in different ways
  • asked questions about racial immigration – asked either as a binary or as a 4 point scale
  • it’s not always clear that slightly is closer to none or that moderately is closer to strongly. can’t just assume the bottom two boxes are the same or the top two boxes are the same
  • education does have an effect, as well as age in some cases
  • expression of opposition to immigration depends on the response scale
  • binary responses lead to 30 to 50% more “allow none” responses than the 4 point scale
  • respondents with lower education have a lower probability of choosing a middle scale point
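
[my own aside, not from the talk: a minimal sketch of why the choice of cut point matters when a 4 point scale is collapsed into a binary. the answer labels and counts below are made up purely for illustration, and `share_restrictive` is a hypothetical helper, not anything the presenters used.]

```python
from collections import Counter

# Made-up 4 point answers to an "allow how many?" item (illustration only).
answers = ["none"] * 10 + ["a few"] * 30 + ["some"] * 40 + ["many"] * 20


def share_restrictive(responses, restrictive_labels):
    """Fraction of responses that fall in the labels treated as 'restrictive'."""
    counts = Counter(responses)
    hits = sum(counts[label] for label in restrictive_labels)
    return hits / len(responses)


# Rule A: only the bottom box counts as "allow none".
bottom_box = share_restrictive(answers, {"none"})
# Rule B: the bottom two boxes are lumped together.
bottom_two = share_restrictive(answers, {"none", "a few"})

print(f"bottom box: {bottom_box:.2f}, bottom two boxes: {bottom_two:.2f}")
```

[same respondents, very different share of "restrictive" answers depending on the rule, which is why the bottom two boxes can’t simply be assumed to match a binary “allow none”.]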

cross cultural differences in the impact of number of response categories on response behaviour and data structure of a short scale for locus of control

  • locus of control scale, 4 items, 2 internal, 2 external
  • tested 5 point vs 9 point scale
  • do the means differ, does the factor structure differ
  • I’m my own boss; if i work hard, i’ll succeed; when at work or in my private life what I do is mainly determined by others; bad luck often gets in the way of my plans
  • labeled from “doesn’t apply at all” to “applies completely”
  • didn’t see important demographic differences
  • saw one interaction but it didn’t really make sense [especially given sample size of 250 and lots of other tests happening]
  • [lots of chatter about significance and non-significance but little discussion of what that meant in real words]
  • there was no effect of item order, # of answer options mattered for external locus but not internal locus of control
  • [i’d say hard to draw any conclusions given the tiny number of items, small sample size. desperately needs a lot of replication]
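
[my own aside, not from the talk: before asking whether means differ between a 5 point and a 9 point version, the two have to be put on a common range. one common convention, assumed here and not necessarily what the presenters did, is a linear rescale to 0..1. the answers below are made up and `rescale` is a hypothetical helper.]

```python
from statistics import mean


def rescale(score, n_points):
    """Map a 1..n_points rating onto 0..1 so different scale lengths line up."""
    if not 1 <= score <= n_points:
        raise ValueError(f"score {score} is outside 1..{n_points}")
    return (score - 1) / (n_points - 1)


# Made-up answers to one item, purely for illustration.
five_point = [1, 3, 4, 5, 2]
nine_point = [1, 5, 7, 9, 3]

# After rescaling, both means live on the same 0..1 range and can be compared.
mean_5 = mean(rescale(s, 5) for s in five_point)
mean_9 = mean(rescale(s, 9) for s in nine_point)
print(mean_5, mean_9)
```

[the rescale only fixes the range. it assumes equal spacing between answer options, which is exactly the kind of thing the immigration talk warned about.]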

the optimal number of categories in item specific scales

  • type of rating scale where the answer options are specific to the item and don’t necessarily apply to every other item – what is your health? excellent, good, poor
  • quality increased with the number of answer options comparing 11, 7, 5, 3 point scales but not comparing 10, 6, 4 point scales
  • [not sure what quality means in this case, other audience members didn’t know either, lacking clear explanation of operationalization]