Tag Archives: ESRA

Representativeness of surveys using internet-based data collection #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Yup, it’s sunny outside. And now I’m back inside for the next session. Fortunately, or unfortunately, this session is once again in a below-ground room with no windows, so I will not be basking in sunlight nor gazing longingly out the window. I guess I’ll be paying full attention to another really great topic.


conditional vs unconditional incentives: comparing the effect on sample composition in the recruitment of the German Internet Panel study (GIP)

  • unconditional incentives tend to perform better than promised incentives
  • include $5 with the advance letter compared to a promised $10 with the thank-you letter; assuming a 50% response rate, the cost of both groups is the same
  • consider nonresponse bias, consider sample demo distribution
  • unconditional incentive had 51% response rate, conditional incentive had 42% response rate
  • didn’t see a nonresponse bias [by demographics I assume, so many speakers are talking about important effects but not specifically saying what those effects are]
  • as a trend, the two sets of data provide very similar research results – yes, there are differences in means, but they are always fairly close together and confidence intervals always overlap
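The cost-parity claim above is simple arithmetic. A quick sketch (the dollar amounts and response rates are the ones from the talk; the 1000-person sample size is my own illustrative assumption):

```python
# Cost comparison of the two incentive designs (illustrative sample size).
# Unconditional: every sampled person gets $5 with the advance letter.
# Conditional: only responders get $10 with the thank-you letter.

def incentive_cost(n_sampled, response_rate, unconditional=5, conditional=10):
    """Return (unconditional_cost, conditional_cost) for one design run."""
    cost_uncond = n_sampled * unconditional               # paid to everyone
    cost_cond = n_sampled * response_rate * conditional   # responders only
    return cost_uncond, cost_cond

# At the assumed 50% response rate the two designs cost the same.
u, c = incentive_cost(1000, 0.50)

# With the observed rates (51% unconditional, 42% conditional), the
# conditional design ends up a little cheaper but buys a lower response rate.
u_obs, _ = incentive_cost(1000, 0.51)
_, c_obs = incentive_cost(1000, 0.42)
```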


evolution of representativeness in an online probability panel

  • LISS panel – probability panel, includes households without internet access, 30 minutes per month, paid for every completed questionnaire
  • is there systematic attrition, are core questionnaires affected by attrition
  • normally sociodemographics only, which is restrictive
  • missing data imputed using MICE
  • strongest loss in panel of sociodemographic properties
  • there are seasonal drops in attrition, for instance in June, which has lots of holidays
  • attrition has more effect on survey attitudes and health traits, less so on political and personality traits, which are quite stable even with attrition
  • try to decrease attrition through refreshment based on targets
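For readers unfamiliar with MICE: it fills in missing values by iteratively regressing each incomplete variable on the others. A toy, pure-Python sketch of a single chained-equations pass (the speakers used the real MICE package, which iterates this across many variables and adds posterior draws; the data and single-pass simplification here are mine):

```python
# Toy sketch of one chained-equations (MICE-style) imputation pass:
# regress the incomplete variable on an observed one, then fill the
# gaps with predictions. Real MICE iterates this over all variables.
import random

random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y_true = [0.8 * xi + random.gauss(0, 0.5) for xi in x]

# Knock out roughly 20% of y completely at random.
y = [yi if random.random() > 0.2 else None for yi in y_true]

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept on complete pairs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

obs = [(a, b) for a, b in zip(x, y) if b is not None]
slope, intercept = fit_line([a for a, _ in obs], [b for _, b in obs])
y_imputed = [b if b is not None else slope * a + intercept
             for a, b in zip(x, y)]
```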


moderators of survey representativeness – a meta-analysis

  • measured single mode vs multimode surveys
  • R-indicators – a single measure from 0 to 1 for sample representativeness, based on logistic regression models for response propensity
  • hypothesize mixed mode surveys are more representative than single mode surveys
  • hypothesize cross-sectional surveys are more representative than longitudinal surveys
  • heterogeneity not really explained by moderators
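The R-indicator mentioned above is usually written R = 1 − 2·S(ρ̂), where S(ρ̂) is the standard deviation of estimated response propensities. A minimal sketch, with made-up propensities standing in for the logistic-regression estimates:

```python
# Minimal sketch of the R-indicator: R = 1 - 2 * S(rho), where S(rho) is the
# standard deviation of response propensities, normally estimated with a
# logistic regression of response on auxiliary variables. The propensities
# below are invented for illustration.
from statistics import pstdev

def r_indicator(propensities):
    """1.0 means fully representative (equal propensities); lower is worse."""
    return 1 - 2 * pstdev(propensities)

# Everyone equally likely to respond: perfectly representative response.
flat = r_indicator([0.5] * 100)

# Two groups with very different propensities drag the indicator down:
# S = 0.3, so R = 1 - 0.6 = 0.4.
uneven = r_indicator([0.2] * 50 + [0.8] * 50)
```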

setting up a probability based web panel. lessons learned from the ELIPSS pilot study

  • online panel in France, 1000 people, monthly questionnaires, internet access given to each member [we often wonder about the effect of people being on panels since they get used to and learn how to answer surveys, have we forgotten this happens in probability panels too? especially when they are often very small panels]
  • used different contact modes including letters, phone, face to face
  • underrepresented on the youngest, the elderly, the less educated, and offline people
  • reasons for participating, in order – trust in ELIPSS 46%, originality of project 37%, interested in research 32%, free internet access 13%
  • 16% attrition after 30 months (that’s amazing, really low and really good!), response rate generally above 80%
  • automated process – invites on Thursday, systematic reminders by text message, app message and email
  • individual followups by phone calls and letters [wow. well that’s how they get a high response rate]
  • individual followups are highly effective [i’d call them stalking and invasive but that’s just me. i guess when you accept free 4g internet and a tablet, you are asking for that invasiveness]
  • age becomes less representative over time, employment status changes a lot, education changes the most but of course young people gain more education over time
  • need to give feedback to panel members as they keep asking for it
  • want to broaden use of panel to scientific community by expanding panel to 3500 people



the pretest of wave 2 of the german health interview and examination survey for children and adolescents as a mixed mode survey, composition of participant groups

  • mixed mode helps to maintain high response, web is preferred by younger people, representativeness could be increased by using multiple modes
  • compared sequential and simultaneous surveys
  • single mode has highest response rate, mixed mode simultaneous was extremely close behind, mixed mode multi-step had the lowest rate
  • paper always gave back the highest proportion of data even when people had the choice of both; 11% to 43% chose paper among the 3 groups
  • sample composition was the same among all four groups, all confidence intervals overlap – age, gender, nationality, immigration, education
  • metaanalysis – overall trend is the same
  • 4% lower response rate in mixed mode – additional mode creates cognitive burden, creates a break in response process, higher breakoffs
  • mixed mode doesn’t increase sample composition nor response rates [that is, giving people multiple options as opposed to just one option, as opposed to multiple groups whereby each groups only knows about one mode of participation.]
  • current study is now a single mode study



Sample composition in online studies #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I’ve been pulling out every ounce of bravery I have here in Iceland and I went to the pool again last night (see previous posts on public nakedness!). I could have also broken my rule about not traveling after dark in strange cities but since it never gets dark here, I didn’t have to worry about that! The pool was much busier this time. I guess kiddies are more likely to be out and about after dinner on a weekday rather than Sunday morning at 9am. All it meant was that I had a lot more people watching to do. All in all, good fun to see little babies and toddlers enjoying a good splash and float!

This morning, the sun was very much up and the clouds very much gone. I’ll be dreaming of breaktime all morning! Until then however, I’ve got five sessions on sample composition in online surveys, and representativeness of online studies to pay attention to. It’s going to be tough but a morning chock full of learning will get me a reward of more pool time!

what is the gain in a probability based online panel to provide internet access to sampling units that did not have access before

  • Germany has GIP, France has ELIPSS, the Netherlands has LISS as probability panels
  • weighting might not be enough to account for bias of people who do not have internet access
  • but representativeness is still a problem because people may not want to participate even if they are given access, recruitment rates are much lower among non-internet households
  • probability panels still have problems, you won’t answer every survey you are sent, attrition
  • do we lose much without a representative panel? is it worth the extra cost
  • in Elipss panel, everyone is provided a tablet, not just people without access. the 3G tablet is the incentive you get to keep as long as you are on the panel. so everyone uses the same device to participate in the research
  • what does it mean to not have Internet access – used to be computer + modem. Now there are internet cafes, free wifi is everywhere. hard to define someone as no internet access now. We mean access to complete a survey so tiny smartphones don’t count.
  • 14.5% of adults in France were classified as not having internet. turned out to be 76 people in the end, which is a bit small for analytics purposes. But 31 of them still connected every day.
  • non-internet access people always participated less than people who did have internet.
  • people without internet always differ on demographics [proof is chi-square, can’t see data]
  • populations are closer on nationality, being in a relationship, and education – including non-internet helps with these variables, improves representativity
  • access does not equal usage does not equal using it to answer surveys
  • maybe consider a probability based panel without providing access to people who don’t have computer/tablet/home access

parallel phone and web-based interviews: comparability and validity

  • phones are relied on for research and assumed to be good enough for representativeness, however most people don’t answer phone calls when they don’t recognize the number, can’t use an autodialler in the USA for research
  • online surveys can generate better quality due to programming validation and ability to only be able to choose allowable answers
  • phone and online have differences in presentation mode, presence of human interviewer, can read and reread responses if you wish, social desirability and self-presentation issues – why should online and offline be the same
  • caution about combining data from different modes should be exercised [actually, i would want to combine everything i possibly can. more people contributing in more modes seems to be more representative than excluding people because they aren’t identical]
  • how different is online nonprobability from telephone probability  [and for me, a true probability panel cannot technically exist. its theoretically possible but practically impossible]
  • harris did many years of these studies side by side using very specific methodologies
  • measured variety of topics – opinions of nurses, big business trust, happiness with health, ratings of president
  • across all questions, average correlation between methods was .92 for unweighted means and .893 for weighted means – more bias with weighted version
  • is it better for scales with many response categories – correlations go up to .95
  • online means of attitudinal items were on average 0.05 lower on scale from 0 to 1. online was systematically biased lower
  • correlations in many areas were consistently extremely high, means were consistently very slightly lower for online data; also nearly identical rank order of items
  • for political polling, the two methods were again massively similar, highly comparable results; mean values were generally very slightly lower – thought to be ability to see the scale online as well as social desirability in telephone method, positivity bias especially for items that are good/bad as opposed to importance 
  • [wow, given this is a study over ten years of results, it really calls into question whether probability samples are worth the time and effort]
  • [audience member said most differences were due to the presence of the interviewer and nothing to do with the mode, the online version was found to be truer]
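The phone-vs-web comparison above boils down to correlating per-item means across the two modes and measuring the average downward shift online. A sketch with invented item means (the study itself reported average correlations near .92 and online means about 0.05 lower on a 0-to-1 scale):

```python
# Sketch of the mode comparison: correlate per-item means from the two
# samples and compute the mean phone-minus-online shift. Numbers invented.
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

phone_means = [0.62, 0.71, 0.55, 0.80, 0.47]
online_means = [0.58, 0.66, 0.51, 0.74, 0.43]   # systematically a bit lower

r = pearson(phone_means, online_means)
shift = sum(p - o for p, o in zip(phone_means, online_means)) / len(phone_means)
```

A high correlation with a small constant shift is exactly the pattern the speaker described: near-identical rank order, slightly lower online means.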

representative web survey

  • only a sample without bias can generalize, the correct answer should be just as often a little bit higher or a little bit lower than reality
  • in their sample, they underrepresented 18-34s, elementary school education, and the lowest and highest income people
  • [yes, there are demographic differences in panels compared to census and that is dependent completely on your recruitment method. the issue is how you deal with those differences]
  • online panel showed a socially positive picture of population
  • can you correct bias through targeted sampling and weighting, ethnicity and employment are still biased but income is better [that’s why invites based on returns not outgo are better]
  • need to select on more than gender, age, and region
  • [i love how some speakers still have non-english sections in their presentation – parts they forgot to translate or that weren’t translatable. now THIS is learning from peers around the world!]

measuring subjective wellbeing: does the use of websurveys bias the results? evidence from the 2013 GEM data from luxembourg

  • almost everyone is completely reachable by internet
  • web surveys are cool – convenient for respondents, less social desirability bias, can use multimedia, less expensive, fewer coding errors; but there are sampling issues and bias from the mode
  • measures of subjective well being – i am satisfied with my life, i have obtained all the important things i want in my life, the conditions of my life are excellent, my life is close to my ideal [all positive keyed]
  • online survey gave very slightly lower satisfaction
  • the results are robust to three econometric techniques
  • results from happiness equations using differing modes are compatible
  • web surveys are reliable for collecting information on wellbeing

Assessing and addressing measurement equivalence in cross-cultural surveys #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Today’s lunch included vanilla Skyr. Made with actual vanilla beans. Beat that, yoghurt of home! Once again, I cannot choose a favourite among coconut, pear, banana, and vanilla other than to say it completely beats yoghurt. I even have a favourite brand although since I don’t have the container in front of me right now, I can’t tell you the brand. It still counts very much as brand loyalty though because I know exactly what the container looks like once I get in the store.

I have to say I remain really impressed with the sessions. They are very detail oriented and most people provide sufficient data for me to judge for myself whether I agree with their conclusions. There’s no grandstanding, essentially no sales pitches, and I am getting take-aways in one form or another from nearly every paper. I’m feeling a lot less presentation pressure here simply because it doesn’t seem competitive. If you’ve never been to an ESRA conference, I highly recommend it. Just be prepared to pack your own lunch every day. And that works just great for me.

cross cultural equivalence of survey response latencies

  • how long does it take for a respondent to provide their answer, easy to capture with computer assisted interviewing, uninfluenced by self reports
  • longer latencies seem to represent more processing time for cognitive operations, also represents presence and accessibility of attitudes and strength of those attitudes
  • longer latencies correlated with age, alcohol use, and poorly designed and ambiguous questions, perhaps there is a relationship with ethnic status
  • does latency differ by race/ethnicity; do they vary by language of interview
  • n=600 laboratory interview, 4 race groups, 300 questions taking 77 minutes all about health, order of sections rotated
  • required interviewer to hit a button when they stopped talking and hit a button when the respondent started talking; also recorded whether there were interruptions in the response process; only looked at perfect responses [which are abnormal, right?]
  • reviewed all types of question – dichotomous, categorical, bipolar scales, etc
  • Hispanic, black, Korean indeed took longer to answer compared to white people on the English survey in the USA
  • more educated took slightly less time to answer
  • numeric responses took much longer, yes/no took the least, uni-polar was second least
  • trend was about the same by ethnicity
  • language was an important indicator

comparing survey data quality from native and nonnative English speakers

  • me!
  • conclusion – using all of our standard data quality measures may eliminate people based on their language skills, not their data quality. But certain data quality measures are more likely to predict language rather than data quality. We should focus more on straightlining and overclicking and ignore underclicking as a major error.
  • ask me for the paper 🙂

trust in physicians or trust in physician – testing measurement invariance of trust in physicians in different health care cultures

  • trust reduces social complexity, solves problems of risk, makes interactions possible
  • we lack knowledge of various professions – lawyers, doctors, etc, we don’t understand diagnosis, treatments
  • we must rely on certificates, clothing such as the doctor’s whites, location such as a hospital
  • is there generalized trust in doctors
  • different health care systems produce different kinds of trust, ditto cultural contexts, political and values systems
  • compared three countries with health care coverage and similar doctors per person measurements
  • [sorry, didn’t get the main conclusion from the statement “results were significant”]

Advancements of survey design in election polls and surveys #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I decided to take the plunge and choose a session in a different building this time. The bravery isn’t much to be noted as I’ve realized that the campus and buildings and rooms at the University of Iceland are far tinier than what I am used to. Where I’d expect neighboring buildings to be a ten minute walk from one end to the other, here it is a 30 second walk. It must be fabulous to attend this university where everything and everyone is so close!

I’m quite loving the facilities. For the most part, the chairs are comfortable. Where it looks like you just have a chair, there is usually a table hiding in the seat in front of you. There is instantly connecting and always on wifi no matter which building you’re in. There are computers in the hallways, and multiple plugs at all the very comfy public seating areas. They make it very easy to be a student here! Perhaps I need another degree?

Designing effective likely voter models in pre-election surveys

  • voter intention and turnout can be extremely different. 80% say they will vote but 10% to 50% is often the number that actually votes
  • democratic vote share is often over represented [social desirability?]
  • education has a lot of error – 5% error rate, worst demographic variable
  • what voter model reduces these inaccuracies
  • behavioural models (intent to vote, have you voted – dichotomous variables) and resource based models
  • vote intention does predict turnout – 86% are accurate, also reduces demographic errors
  • there’s not a lot of room to improve except when the polls look really close
  • Gallup tested a two item measure of voting intention – how much have you thought about this election, how likely are you to vote
  • 2 item scale performed far better than the 7 item scale, error rate of 4% vs 1.4%
  • [just shown a histogram with four bars. all four bars look essentially the same. zero attempt to create a non-existent difference. THAT’S how you use a chart 🙂 ]
  • Gallup approach didn’t work well, the probability approach performed better
  • best measure of voting intention = thought about election + likelihood of voting + education + voted before + strength of partisan identity
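A likely-voter model of the kind described is typically a logistic combination of exactly those predictors. A sketch with invented coefficients (the paper’s actual estimates were not shown in my notes, so every number here is a placeholder):

```python
# Sketch of a likely-voter score: a logistic combination of
# thought-about-election, stated likelihood of voting, education, past
# voting, and strength of partisan identity. Coefficients are invented.
import math

def likely_voter_probability(thought, likelihood, education, voted_before,
                             partisan_strength):
    """All inputs scaled 0..1; returns an estimated turnout probability."""
    intercept = -3.0                      # hypothetical coefficients
    coefs = [1.5, 2.5, 0.5, 1.5, 1.0]
    values = [thought, likelihood, education, voted_before, partisan_strength]
    z = intercept + sum(c * v for c, v in zip(coefs, values))
    return 1 / (1 + math.exp(-z))

# An engaged habitual voter scores high; a disengaged non-voter scores low.
high = likely_voter_probability(1.0, 1.0, 0.8, 1.0, 0.9)
low = likely_voter_probability(0.1, 0.2, 0.3, 0.0, 0.1)
```

In practice the cutoff (or the probabilities themselves, in the probability approach the speaker preferred) would be calibrated against validated turnout.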

polls on national independence: the scottish case in a comparative perspective

  • [Claire Durand from the University of Montreal speaks now. Go Canada! 🙂 ]
  • what happened in Quebec in 1995? referendum on independence
  • Quebec and Scotland are nationalist in a British type system, proportion of non-nationals is similar
  • referendums are won by 50% + 1
  • but polls have many errors, is there an anti-incumbent effect
  • “no” is always underestimated – whatever the no is
  • are referendums on national independence different – ethnic divide, feeling of exclusion, emotional debate, ideological divide
  • No side has to bring together enemies and don’t have a unified strategy
  • how do you assign non-disclosure?
  • don’t know doesn’t always mean don’t know
  • don’t distribute non-disclosures proportionally, they aren’t random
  • asking how people would vote TODAY resulted in 5 points less nondisclosure
  • corrections need to be applied after the referendum as well
  • people may agree with the general demands of the national parties but not with the solution they propose. maintaining the threat allows them to maintain pressure for change.
  • the Quebec newspapers reported the raw data plus the proportional response so people could judge for themselves

how good are surveys at measuring past electoral behaviour? lessons from an experiment in a french online panel study

  • study bias in individual vote recall
  • sample size of 6000
  • over-reporting of popular party, under-reporting of less popular party
  • 30% of voter recall was inconsistent
  • inconsistent respondents change their recall – changed parties, memory problems, concealment, said they didn’t vote, said they voted and then said they didn’t or vice versa
  • could be any number of interviewer issues
  • older people found it more difficult to remember but perhaps they have more voter loyalty
  • when available, use the real vote from the pre-election survey
  • using vote recall from the post-election survey underestimates voter transfers
  • caution in using vote recall to weight samples

methodological issues in measuring vote recall – an analysis of the individual consistency of vote recall in two election longitudinal surveys

  • popularity = weighted average % of electorate represented
  • universality = weighted frequency of representing a majority
  • used four versions of non/weighting including google hits
  • measured 38 questions related to political issues
  • voters are driven by political tradition even if outdated, or by personal images of politicians not based on party manifestos
  • voters are irrational, the political landscape has shifted even though people see the parties the same way they were decades ago
  • coalition formation aggravates the situation even more
  • discrepancy between the electorate and the government elected

The impact of questionnaire design on measurements in surveys #4 #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Well, last night i managed to stay up until midnight. The lights at the church went on, lighting up the tower and the very top in an unusual way. They were quite pretty! The rest of the town enjoyed mood lighting as it didn’t really get dark at all. Tourists were still wandering in the streets since there’s no point going to bed in a delightful foreign city if you can still see where you’re going. And if you weren’t a fan of the mood lighting, have no fear! The sun ‘rose’ again just four hours later. If you’re scared of the dark, this is a great place to be – in summer!

Today’s program for me includes yet another session on question data quality, polling question design, and my second presentation on how non-native English speakers respond to English surveys. We may like to think that everyone answering our surveys is perfectly fluent but let’s be realistic. About 10% of Americans have difficulty reading/writing in English because it is not their native language. Add to that weakly literate and non-literate people, and there’s potential big trouble at hand.

the impact of answer format and item order on the quality of measurement

  • compared 2 point and 11 point scales, different orders of questions – questions could even be placed very widely apart; looked at perceived prestige of occupations
  • separated two pages of the survey with a music game of guessing the artist and song, purely as a distraction from the survey. the second page was the same questions in a completely different order; did the same thing numerous times, changing the number of response options and question orders each time. the whole experiment lasted one hour
  • assumed scale was uni-dimensional
  • no differences comparing 4 point to 9 point scale, none between 2 point and 9 point scale [so STOP USING HUGE SCALES!!!]
  •  prestige does not change depending on order in the survey [but this is to be expected with non-emotional, non-socially desirable items]
  • respondents confessed they tried to answer well but maybe not the best of their ability or maybe their answers would change the next time [glad to see people know their answers aren’t perfect. and i wouldn’t expect anything different. why SHOULD they put 100% effort into a silly task with no legitimate outcome for them.]

measuring attitudes towards immigration with direct questions – can we compare 4 answer categories with dichotomous responses

  • when sensitive questions are asked, social desirability affects response distributions
  • different groups are affected in different ways
  • asked questions about immigration of different racial groups – asked binary or as a 4 point scale
  • it’s not always clear that slightly is closer to none or that moderately is closer to strongly. can’t just assume the bottom two boxes are the same or the top two boxes are the same
  • education does have an effect, as well as age in some cases
  • expression of opposition for immigration depends on the response scale
  • binary responses lead to 30 to 50% more “allow none” responses than the 4 point scale
  • respondents with lower education have a lower probability of choosing the middle scale point

cross cultural differences in the impact of number of response categories on response behaviour and data structure of a short scale for locus of control

  • locus of control scale, 4 items, 2 internal, 2 external
  • tested 5 point vs 9 point scale
  • do the means differ, does the factor structure differ
  • I’m my own boss; if I work hard, I’ll succeed; when at work or in my private life, what I do is mainly determined by others; bad luck often gets in the way of my plans
  • labeled from “doesn’t apply at all” to “applies completely”
  • didn’t see important demographic differences
  • saw one interaction but it didn’t really make sense [especially given sample size of 250 and lots of other tests happening]
  • [lots of chatter about significance and non-significance but little discussion of what that meant in real words]
  • there was no effect of item order, # of answer options mattered for external locus but not internal locus of control
  • [i’d say hard to draw any conclusions given the tiny number of items, small sample size. desperately needs a lot of replication]

the optimal number of categories in item specific scales

  • type of rating scale where the answer is specific to the scale and doesn’t necessarily apply to every other item – what is your health? excellent, good, poor
  • quality increased with the number of answer options comparing 11,7,5,3 point scales but not comparing 10,6,4 point scales
  • [not sure what quality means in this case, other audience members didn’t know either, lacking clear explanation of operationalization]

The impact of questionnaire design on measurements in surveys #2 #ESRA15 #MRX  

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Breaktime treated us to fruit and croissants this morning. I was hoping for another unique-to-Iceland treat but perhaps that was a sign to stop eating. No, just kidding! Apparently you’re not allowed to bring food or drink into the classrooms. The signs say so. The signs also say no Facebook in the classrooms. Shhhh…. I was on Facebook in the classroom!

The sun is out again and I took a quick walk outside. I am thankful my hotel is at the foot of the famous church. No matter where I am in this city, I can always, easily, and instantly find my hotel. No map needed when the church is several times higher than the next highest building!

I’ve noticed that the questions at this conference are far more nit-picky and critical than I’m used to. I suspect that is because the audience includes many academics whose entire job is focused on these topics. They know every minute detail because they’ve done similar studies themselves. It makes for great comments and questions, though it does seem to put the speaker on the spot every time!

smart respondents: let’s keep it short.

  • do we really need scale instructions in the question stem? they add length, mobile screens have limited space, and respondents skip the instructions if the response scale is already labeled [isn’t this just an artifact of old fashioned face to face surveys, telephone surveys]
  • they tested instructions that matched and did not match what was actually in the scale [i can imagine some panelists emailing the company to complain that the survey had errors!]
  • used a probability survey [this is one case where a nonprobability sample would have been well served, easier cheaper to obtain with no need to generalize precisely to a population]
  • answer frequencies looked very similar for correct and incorrect instructions, no significant differences, she’s happy to have nonsignificant results, unaffected by mobile device or age
  • [more regression results shown, once again, speaker did not apologize and the audience did not have a heart attack]
  • it seems like respondents ignore instructions in the question; they rely on the words in the answer options, e.g., grid headers
  • you can omit instructions if the labeling is provided in the answer options
  • works better for experienced survey takers [hm, i doubt that. anyone seeing the answer options will understand. at least, that’s my opinion.]

from web to paper: evaluation from data providers and data analysts. The case of annual survey finances of enterprises

  • we send out questionnaires, something happens, we get data back – we don’t know what happens 🙂
  • wanted to keep question codes in the survey, which seemed unnecessary to respondents; had really long instructions for some questions that didn’t fit on the page, so they put them in a pdf
  • 64% of people evaluated the codes on the online questionnaire positively, 12% rated the codes negatively. people liked that they could communicate with statistics netherlands by using the codes
  • 74% negative responses to explanations of questions, which were intended to reduce calls from statistics netherlands; only 11% were positive
  • only 25% of people consulted the pdf with instructions
  • most people wanted to receive a printed version of the questionnaire they filled out; people really wanted to print it, and they screen capped it; people liked being able to return later; they could easily get an english version
  • data editors liked that they didn’t have to do data entry but now they needed more time to read and understand what was being said
  • they liked having the email address because they got more direct and precise answers, responses came back faster, they didn’t notice any changes in the time series data

is variation in perception of inequality and redistribution of earnings actual or artifactual. effects of wording, order, and number of items

  • opinions differ when you ask how much people should make vs how much the top quintile of people make
  • they asked people how much a number of occupations should earn; they also varied how specific the title was, e.g., teacher vs math teacher in a public high school
  • estimates for specific descriptions were higher, high status jobs got much higher estimates
  • adding more occupations to the list makes reliability in earnings decrease

exploring a new way to avoid errors in attitude measurements due to complexity of scientific terms: an example with the term biodiversity

  • how do people talk about complicated terms, their own words often differ from scientific definitions
  • “what comes to mind when you think of biodiversity?” – used text analysis for word frequencies, co-occurrences, correspondence analysis, used the results to design items for the second study
  • found five classes of items – standard common definition, associated with human actions to protect it, human environment relationship, global actions and consequences, scientific definition
  • turned each of the five types of definitions into a common word definition
  • people gave more positive opinions about biodiversity when they were asked immediately after the definition
  • items based on representations of biodiversity were valid and reliable
  • [quite like this methodology, could be really useful in politics]
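The first analysis stage described above, word frequencies and co-occurrences from open-ended answers, is easy to sketch. The responses and stopword list here are invented examples, not the study’s data:

```python
# Sketch of the first text-analysis stage: document frequencies and
# within-response co-occurrences from open-ended answers about biodiversity.
from collections import Counter
from itertools import combinations

responses = [
    "variety of species and ecosystems",
    "protecting species and nature",
    "nature and the variety of life on earth",
]

stopwords = {"of", "and", "the", "on"}

freq = Counter()        # in how many responses each word appears
cooc = Counter()        # how often two words appear in the same response
for text in responses:
    words = sorted({w for w in text.lower().split() if w not in stopwords})
    freq.update(words)
    cooc.update(combinations(words, 2))

# Frequent words and pairs hint at the classes of definitions the speakers
# later turned into questionnaire items.
top = freq.most_common(3)
```

The study then ran correspondence analysis on structures like these to derive its five classes of definitions; that step is beyond this sketch.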

[if any of these papers interest you, I recommend finding the author on the ESRA program and asking for an official summary. Global speakers and weak microphones make note-taking more challenging. 🙂 ]

The impact of questionnaire design on measurements in surveys #1 #ESRA15  #MRX  

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I tried to stay up until midnight last night but ended up going to bed around 10:30pm. Naturally, it was still daylight outside. I woke up this morning at 6am in broad daylight again. I’m pretty sure it never gets dark here no matter what they say. I began my morning routine as usual. Banged my head on the slanted ceiling, stared out the window at the amazing church, made myself waffles in the kitchen, and then walked past the pond teeming with baby ducks. Does it get any better? I think no. Except of course knowing I had another day of great content-rich sessions ahead of me!

designs and developments of the income measures in the european social surveys

  • tested different income questions. allowed people to use a weekly, monthly, or annual income scale as they wished. there was also no example response, and no example of what constitutes income. Provided about 30 answer options to choose from, shown in three columns. Provided same result as a very specific question in some countries but not others.
  • also tested every country getting the same number breaks, groups weren’t arranged to reflect each country’s distribution. this resulted in some empty breaks [but that’s not necessarily a problem if the other breaks are all well and evenly used]
  • when countries are asked to set up number breaks in well defined deciles, high incomes are chosen more often – affected because people had different ideas of what is and isn’t taxable income
  • [apologies for incomplete notes, i couldn’t quite catch all the details, we did get a “buy the book” comment.]

item non-response and readability of survey questionnaire

  • any non-substantive outcome – missing values, refusals, don’t knows all count
  • non response can lower validity of survey results
  • semantic complexity measured by familiarity of words, length of words, abstract words that can’t be visualized, structural complexity
  • Measured – characters in an item, length of words, percent of abstract words, percent of lesser known words, percent of long words 12 or more characters
  • used the European Social Survey which is a highly standardized international survey, compared English and Estonian, it is conducted face to face, 350 questions, 2,422 UK respondents
  • less known and abstract words create more non-response
  • long words increase nonresponse in Estonian but not in English, perhaps because English words are shorter anyway
  • percent of long words in English created more nonresponse
  • total length of an item didn’t affect nonresponse
  • [they used a list of uncommon words for measurement, such a book/list does exist in English. I used it in school to choose a list of swear words that had the same frequency levels as regular words.]
  • [audience comment – some languages join many words together which means their words are longer but then there are fewer words, makes comparisons more difficult]
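The length-based measures above are straightforward to compute; a quick sketch (my own illustration — the "lesser known words" measure would additionally need a word-frequency list like the one mentioned in my aside, which isn't included here):

```python
def complexity_profile(text, long_word_len=12):
    """Crude complexity indicators of the kind described in the talk:
    item length, mean word length, and share of long words (12+ characters)."""
    words = text.split()
    long_words = [w for w in words if len(w.strip(".,?!")) >= long_word_len]
    return {
        "characters": len(text),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "pct_long_words": 100 * len(long_words) / len(words),
    }

# A made-up survey item with two long words out of five.
profile = complexity_profile("How trustworthy are intergovernmental organisations?")
print(profile["pct_long_words"])  # 40.0
```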

helping respondents provide good answers in web surveys

  • some tasks are inherently difficult in surveys, often because people have to write in an answer, coding is expensive and error prone
  • this study focused on prescription drugs which are difficult to spell, many variations of the same thing, level of detail is unclear, but we have full lists of all these drugs available to us
  • tested text box, drop box to select from list, javascript (type ahead look up)
  • examined breakoff rates, missing data, response times, and codability of responses
  • asked people if they are taking drugs, tell us about three
  • study 1 – breakoffs higher from dropbox and javascript; median response times longer, but codability was better. Lists didn’t work well at all.
  • study 2 – cleaned up the list, made all the capitalization the same. break off rates were now all the same. response times lower but still higher than the textbox version. codability still better for list versions.
  • study 3 – if they couldn’t find a drug in the list, they were allowed to type it out. unlike previous studies which proceeded with the missing data. dropbox had highest missing data. javascript had lowest missing data. median times highest for drop box. trends for more and more drugs as expected, effect is more but not as much more.
  • older browsers had trouble with dropdowns and javascript and had to be routed to the textbox options
  • if goal is to get codable answers, use a text box. if goal is to create skip patterns then javascript is the way to go.
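The type-ahead lookup they tested is just prefix matching against the drug list as the respondent types. A minimal sketch of the matching logic (the drug names here are my own examples; real implementations run client-side in JavaScript, but the idea is the same):

```python
# Hypothetical excerpt of a prescription-drug lookup list.
DRUG_LIST = ["Atorvastatin", "Amoxicillin", "Amlodipine", "Metformin", "Metoprolol"]

def type_ahead(fragment, drugs=DRUG_LIST, limit=10):
    """Return up to `limit` drugs whose names start with the typed fragment,
    case-insensitively, preserving list order."""
    f = fragment.strip().lower()
    return [d for d in drugs if d.lower().startswith(f)][:limit]

print(type_ahead("Am"))  # ['Amoxicillin', 'Amlodipine']
```

Matching on a cleaned, consistently capitalized list matters, as study 2 above found: inconsistent capitalization in the source list drove break-offs up.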

rating scale labelling in web surveys – are numeric labels an advantage

  • you can use all words to label scales or just words on the end with numbers in between
  • research says there is less satisficing with verbal scales, they are more natural than numbers and there is no inherent meaning of numbers
  • means of the scales were different
  • less time to complete for the end-labeled groups
  • people paid more attention to the five point labeled scale, and least to the end-point labeled scale
  • mean opinions did differ by scale, more positive on fully labeled scale
  • high cognitive burden to map responses of the numeric scales
  • lower reliability for the numeric labels

Surveying sensitive issues – challenges and solutions #ESRA15 #MRX  

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own. Break time brought some delightful donuts. I personally only ate one; however, on behalf of my friend, Seda, I ate several more just for her. By the way, since donuts are in each area, you can just breeze from one area to the next grabbing another donut each time. Just saying…

surveying sensitive questions – prevalence estimates of self-reported delinquency using the crosswise model

  • crime rates differ by country but rates of individuals reporting their own criminal behaviour show the opposite pattern: countries with high crime rates have lower rates of self-report. Social desirability seems to be the cause. Is this true?
  • Need to add random noise to the model so the respondent can hide themselves. Needs no randomization device.
  • ask a non-sensitive question and a sensitive question; the respondent indicates only whether the answers to the two are the same or different. You only need to know the distribution of the first question (e.g., is your mom’s birthday in January? well, 1/12 are in January).
  • crosswise model generates vastly higher self-reported criminal rates in countries where you’d expect
  • also asked people in the survey whether they answered carefully – 15% admitted they did not
  • crosswise results in much higher prevalence rates, causal models of delinquent behaviour could be very different
  • satisficing respondents gives less bias than expected
  • estimates of the crosswise model are conservative
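Under the crosswise design, the observed share answering "same" is λ = πp + (1 − π)(1 − p), where π is the sensitive-trait prevalence and p is the known probability of "yes" on the non-sensitive question; solving gives π = (λ + p − 1)/(2p − 1). A quick sketch of the estimator (the numbers below are made up for illustration, not from the talk):

```python
def crosswise_estimate(same_rate, p):
    """Prevalence estimate under the crosswise model.

    same_rate: observed share answering 'both the same'
    p: known probability of 'yes' on the non-sensitive question
       (e.g., 1/12 for 'is your mom's birthday in January?')
    Derived from P(same) = pi*p + (1-pi)*(1-p).
    """
    assert p != 0.5, "p = 0.5 makes the design uninformative"
    return (same_rate + p - 1) / (2 * p - 1)

# Illustrative: 80% 'same' answers with a January-birthday auxiliary question.
print(round(crosswise_estimate(0.80, 1 / 12), 2))  # 0.14
```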

pouring water into the wine – the advantages of the crosswise model for asking sensitive questions revisited

  • it’s easier to implement in self-administered surveys, no extra randomization device necessary, cognitive burden is lower, no self-protection answering strategies
  • blood donation rates – direct question says 12% but crosswise says 18%
  • crosswise model had a much higher answering time, even after dropping extraordinarily slow people
  • model has some weaknesses, the less the better approach is good to determine if the crosswise model works
  • do people understand the instructions and do they specifically follow those instructions

effects of survey sponsorship and mode of administration on respondents answers about their racial attitudes

  • used a number of prejudice scales both blatant and subtle
  • no difference in racial measures on condition of interviewer administration
  • blatant prejudice scale showed a significant interaction for type of sponsor
  • matters more when there is an interviewer and therefore insufficient privacy
  • sponsor effect is likely the result of social desirability
  • response bias is in opposite direction for academic and market research groups
  • does it depend which department does the study – law department, sociology department

impact of survey mode (mail vs telephone) and asking about future intentions 

  • evidence suggests that asking about intent to get screened before asking about screening may minimize over reporting of cancer screening. removes the social pressure to over report.
  • people report behaviors more truthfully in self-administered forms than interviews
  • purchased real estate on an omnibus survey
  • no main effect for mode
  • in mail mode, asking about intent first was more reflective of reality of screening rates
  • 30% false positive said they had a test but it wasn’t in their medical record
  • little evidence that the intention item affected screening accuracy
  • mailed surveys may have positively affected accuracy – but the mail survey was one topic whereas the telephone was omnibus

effect of socio-demographic (mis)match between interviewers and respondents on the data quality of answers to sensitive questions

  • theory of liking, some say matching improves chances of participation, may also improve disclosure and reporting, especially gender matching
  • current study matched within about five years of age as opposed to arbitrary cut-off points
  • also matched on education
  • male interviewer to female interviewee had lowest response rate
  • older interviewer had lower response rate
  • no effects for education
  • income had the most missing data, parent’s education was next highest missing data likely because education from 50 years ago was different and you’d have to translate, political party had high missing rate
  • if female subject refuses a male interviewer, send a female to try to convince them
  • it’s easier to refuse a person who is the same age as you [maybe it’s a feeling of superiority/inferiority – you’re no better than me, i don’t have to answer to you]
  • men together generate the least item non-response
  • women together might get too comfortable together, too chatty, more non-response, role-boundary issue
  • age matching is less item non-response
  • same education is less item non-response, why do interviewers allow more item non-response when their respondent has a lower education


Direction of response scales #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes in the notes are my own.

I discovered that all the buildings are linked indoors. Let it rain, let it rain, I don’t care how much it rains….  [Feel free to sing that as loud as you can.] Lunch was Skyr, oat cookies and some weird beet drink. Yup. I packed it myself. I always try to like yogurt and never really do. Skyr works for me. So far, coconut is my favourite. I’ve forgotten to take pictures of speakers today so let’s see if I can keep the trend going! Lots of folks in this session so @MelCourtright and I are not the only scale geeks out there. 🙂

Response scales: Effects of scale length and direction on reported political attitudes

  • instruments are not neutral, they are a form of communication
  • cross national projects use different scales for the same question so how do you compare the results
  • trust in parliament is a fairly standard question for researchers and so makes a good example
  • 4 point scale is most popular but it is used up to 11 points, traditional format is very positive to very negative
  • included a don’t know in the answer options
  • transformed all scales into a 0 to 1 scale and evenly distributed all scores in between
  • means highest with 7 point scale traditional direction and lowest with 4 point and 11 point traditional direction
  • reverse direction had much fewer mean differences, essentially all the same
  • four point scales show differences in direction, 7 and 11 point show fewer differences in direction
  • [regression results shown on the screen – no one fainted or died, the speaker did not apologize or say she didn’t understand them. interesting difference compared to MRX events.]
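The 0-to-1 transformation mentioned above is a simple linear rescaling that spaces a k-point scale's values evenly. A minimal sketch of the usual convention (the talk didn't show its exact formula, so this is an assumption):

```python
def rescale(score, points):
    """Map a 1..k point response onto [0, 1] with evenly spaced values."""
    return (score - 1) / (points - 1)

# A '6' on an 11-point scale and a '3' on a 5-point scale both land at the midpoint.
assert rescale(6, 11) == rescale(3, 5) == 0.5
```

This is what makes means from 4-, 7-, and 11-point versions of the trust question directly comparable.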

Does satisficing drive scale direction effects

  • research shows answers shift towards the start of the scale but this is not consistent
  • anchoring and adjustment effects whereby people use the first answer option as the anchor, interpretative heuristics suggest people choose an early response to express their agreement with the questions, primacy effects due to satisficing decrease cognitive load
  • scores were more positive when the scale started positive, differences were huge across all the brands
  • the pattern is the same but the differences are noticeable
  • speeding measured as 300 milliseconds per word
  • speeders more likely to choose early answer option
  • answers are pushed to the start of the scale, limited evidence that it is caused by satisficing
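The 300 ms/word speeding threshold translates into a simple per-question flag. A sketch of how that check might look (question text and timings below are my own examples, not from the study):

```python
def is_speeder(response_seconds, question_text, ms_per_word=300):
    """Flag a response as speeding if it arrived faster than the
    per-word reading-time threshold used in the talk (300 ms/word)."""
    threshold = len(question_text.split()) * ms_per_word / 1000  # seconds
    return response_seconds < threshold

q = "How satisfied are you with this brand overall"  # 8 words -> 2.4 s threshold
print(is_speeder(1.5, q), is_speeder(3.0, q))  # True False
```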

Ordering your attention: response order effects in web-based surveys

  • primacy happens more often visually and recency more often orally
  • scales have an inherent order. if you know the first answer option, you know the remainder of the options
  • sample size over 100 000, random assigned to scale order, also tested labeling, orientation, and number of response categories from 2 to 11
  • the order effect was always a primacy effect, differences were significant though small; significant more due to sample size [then why mention the results if you know they aren’t important?]
  • order effects occurred more with fully labeled scales, end labeled scales did not see response order effects
  • second study also supported the primacy effect with half of questions showing the effect
  • a much stronger response effect seen with unipolar scales
  • vertical scales show a much stronger response effect as well
  • largest effect seen for horizontal unipolar scale
  • need to run the same tests with grids, don’t know which response is more valid, need to know what they will be and when

Impact of response scale direction on survey responses in web and mobile web surveys

  • why does this effect happen?
  • tested agreement scales and frequency scales
  • shorter scale decreases primacy effect
  • scale length has a significant moderating effect – stronger effect for 7 point scales compared to 5 point scales
  • labeling has significant moderating effects – stronger effect for fully labeled
  • question location matters – stronger effect on earlier questions
  • labeled behavioural scale shows the largest impact, end labeled attitudinal scale has the smallest effect
  • scale direction affects responses – more endorsement at start of scale
  • 7 point fully labeled frequency scale is most affected
  • we must use shorter scales and end labeling to reduce scale direction effects in web surveys

Importance of scale direction between different modes

  • term used is forward/reverse scale [as opposed to ascending/descending or positive/negative keyed]
  • in the forward version of the scale, the web creates more agreement; but face to face it’s very weak. face to face shows recency effect
  • effect is the same for general scales (all scales are agreement) and item specific scales (each scale reflects the specific question), more cognitive effort in the item specific scale so maybe less effort is invested in the response
  • item specific scale affected more by the web
  • randomizing scale matters more in online surveys

Assessing the quality of survey data (Good session!) #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes in the notes are my own. As you can see, I managed to find the next building of the six buildings the conference is using. From here on, it’s smooth sailing! Except for the drizzle. Which makes wandering between buildings from session to session a little less fun and a little more like going to a pool. Without the nakedness. 

Session #1 – Data quality in repeated surveys: evidence from a quasi-experimental design by multiple professors from university of Rome

  • respondents can refuse to participate in the study resulting in a series of missing data but their study had very little missing data, only about 5% this time [that’s what student respondents do for you, would like to see a study with much larger missing rates]
  • questions had an i do not know option, and there was only one correct answer
  • 19% of gender/birthday/socioeconomic status changed from survey to survey [but we now understand that gender can change, researchers need to be open to this. And of course, economic status can change in a second]
  • Session #2 – me!  Lots of great questions, thank you everyone!

Session #3 – Processing errors in the cross national surveys

  • we don’t consider process errors very often as part of total survey error
  • found 154 processing errors in the series of studies – illegitimate variable values such as education that makes little sense or age over 100, misleading variable values, contradictory values, value discrepancies, lack of value labels, maybe you’re expecting a range but you get a specific value, what if 2 is coded as yes in the software but no in the survey
  • age and education were most problematic, followed by schooling
  • lack of labels was the worst problem, followed by illegitimate values, and misleading values
  • is 22% discrepancies out of all variables checked good or bad?
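Checks for these processing errors are easy to automate. A hedged sketch of the kinds of rules described above — illegitimate values and unlabelled codes; the variable names and cut-offs here are my own illustration, not the paper's:

```python
def check_record(record, labels):
    """Return a list of processing-error flags for one data record.
    `labels` maps variable names to their defined value labels."""
    errors = []
    if not 0 <= record.get("age", 0) <= 100:  # illegitimate value, e.g. age over 100
        errors.append("illegitimate age")
    for var, value in record.items():
        if var in labels and value not in labels[var]:  # value with no label defined
            errors.append(f"unlabelled value for {var}")
    return errors

labels = {"employed": {1: "yes", 2: "no"}}
print(check_record({"age": 104, "employed": 3}, labels))
# ['illegitimate age', 'unlabelled value for employed']
```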

Session #4 – how does household composition derived from census data describe or misrepresent different family types

  • strength of census data is its exhaustiveness, how does census data differ from a smaller survey
  • census counts household members, family survey describes families and explores people outside the household such as those living apart, they describe different universes. a boarder may not be measured in the family survey but is mentioned in the census survey
  • in 10% of cases, more people are counted in the census, 87% have the same number of people on both surveys
  • census is an accounting tool, not a tool for understanding social life, people do not organize their lives to be measured and captured at one point and one place in time
  • census only has a family with at least one adult and at least one child
  • isolated adult in a household with other people is 5% of adults in the census, not classified the same in both surveys
  • there is a problem attributing children to the right people – problem with single parent families; single adults are often ‘assigned’ a child from the household
  • a household can include one or two families at the most – complicated when adult children are married and maybe have a kid. A child may be assigned to a grandparent, which is in error.
  • isolated adults may live with a partner in the dwelling, some live with their parents, some live with a child (but children move from one household to another), 44% of ‘isolated’ adults live with family members, they aren’t isolated at all
  • previously couples had to be heterosexual, even though they report as a union the rules split them into isolated adults [that’s depressing. thank you for changing this rule.]
  • census is more imperfect than the survey, it doesn’t catch subtle transformations in societal life. calls into question definitions of marginal groups
  • also a problem for young adults who leave home but still have strong ties to the parents home – they may claim their own home and their parents may also still claim them as living together
  • [very interesting talk. never really thought about it]

Session #5 – Unexpectedly high number of duplicates in survey data

  • simulated duplicates created greater bias of the regression coefficient when up to 50% of cases were duplicated 2 to 5 times
  • birthday paradox – how many persons are needed in order to find two having an identical birthday – 23. A single duplicate in a dataset is likely.
  • New method – the Hamming diagram – diversity of data for a survey – it looks like a normal curve with some outliers so I’m thinking Hamming is simply a score like Mahalanobis is for outliers
  • found duplicates in 10% of surveys, 14 surveys comprised 80% of total duplicates with one survey at 33%
  • which case do you delete? which one is right if indeed one is right. always screen your data before starting a substantial analysis.
  • [i’m thinking that ESRA and AAPOR are great places to do your first conference presentation. there are LOTS of newcomers and presentation skills aren’t fabulous. so you won’t feel the same pressure as at other conferences. Of course, you must have really great content because here, content truly is king]
  • [for my first ESRA conference, i’m quite happy with the quality of the content. now let’s hope for a little sun over the lunch hour while I enjoy Skyr, my new favourite food!]
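Two of the ideas in this talk are small enough to sketch: the Hamming distance between two respondents' answer vectors (near-zero distances between distinct IDs suggest duplication), and the birthday-paradox calculation behind "a single duplicate is likely". My own illustration, not the authors' code:

```python
from math import prod

def hamming(a, b):
    """Number of positions where two answer vectors differ; near-zero
    distances between different respondents suggest duplicates."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def birthday_collision_prob(n, days=365):
    """Chance that at least two of n people share a birthday --
    the intuition for why one duplicate in a dataset is likely."""
    return 1 - prod((days - i) / days for i in range(n))

print(hamming([1, 2, 3, 4], [1, 2, 3, 5]))    # 1
print(birthday_collision_prob(23) > 0.5)      # True
```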
