Representativeness of surveys using internet-based data collection #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Yup, it’s sunny outside. And now I’m back inside for the next session. Fortunately, or unfortunately, this session is once again in a below ground room with no windows so I will not be basking in sunlight nor gazing longingly out the window. I guess I’ll be paying full attention to another really great topic.

 

conditional vs unconditional incentives: comparing the effect on sample composition in the recruitment of the german internet panel study GIP

  • unconditional incentives tend to perform better than promised incentives
  • include $5 with advance letter compared to promised $10 with thank you letter; assuming 50% response rate, cost of both groups is the same [a quick cost check follows this list]
  • consider nonresponse bias, consider sample demo distribution
  • unconditional incentive had 51% response rate, conditional incentive had 42% response rate
  • didn’t see a nonresponse bias [by demographics I assume, so many speakers are talking about important effects but not specifically saying what those effects are]
  • as a trend, the two sets of data provide very similar research results; yes, there are differences in means, but they are always fairly close together and the confidence intervals always overlap
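A quick back-of-the-envelope check of that cost-parity claim. The $5/$10 amounts and the response rates are from the talk; the 1,000-invitation figure is my own illustration.

```python
# Cost parity of a prepaid vs. promised incentive (illustrative figures).
invites = 1_000

prepaid_cost = invites * 5            # $5 mailed to everyone with the advance letter
promised_cost = invites * 0.50 * 10   # $10 paid only to the ~50% expected to respond
print(prepaid_cost, promised_cost)    # 5000 5000.0 -> same budget at a 50% response rate

# At the observed response rates, cost per completed interview:
print(round(invites * 5 / (invites * 0.51), 2))   # prepaid: ~$9.80 per complete (51% responded)
print(10.0)                                       # promised: $10.00 per complete by definition
```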

https://twitter.com/ialstoop/status/622001573481312256

evolution of representativeness in an online probability panel

  • LISS panel – probability panel, includes households without internet access, 30 minutes per month, paid for every completed questionnaire
  • is there systematic attrition, are core questionnaires affected by attrition
  • normally sociodemographics only which is restrictive
  • missing data imputed using MICE (multiple imputation by chained equations) [a rough sketch follows this list]
  • strongest loss in the panel is on sociodemographic properties
  • there are seasonal drops in attrition, for instance in June, which has lots of holidays
  • attrition has more effect on survey attitudes and health traits, less so on political and personality traits, which are quite stable even with attrition
  • try to decrease attrition through refreshment based on targets
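The imputation reportedly used MICE (typically the R mice package). A rough Python analogue of the chained-equations idea, using scikit-learn's IterativeImputer on invented data, purely to show what the procedure does:

```python
# Chained-equations imputation, analogous in spirit to MICE: each variable with
# missing values is modelled from the others, iterating until stable.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the estimator)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Fake panel data: age, income, health score, with ~15% of values missing at random.
X = rng.normal(loc=[45, 30_000, 3.5], scale=[15, 10_000, 1.0], size=(500, 3))
X_missing = np.where(rng.random(X.shape) < 0.15, np.nan, X)

imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
X_imputed = imputer.fit_transform(X_missing)

print(np.isnan(X_imputed).sum())   # 0 -> all gaps filled before the attrition analysis
```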

https://twitter.com/ialstoop/status/622004420314812417

moderators of survey representativeness – a meta analysis

  • measured single mode vs multimode surveys
  • R-indicators – a single measure from 0 to 1 for sample representativeness, based on logistic regression models for response propensity [a minimal sketch follows this list]
  • hypothesize mixed mode surveys are more representative than single mode surveys
  • hypothesize cross-sectional surveys are more representative than longitudinal surveys
  • heterogeneity not really explained by moderators
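For reference, the R-indicator is commonly defined as one minus twice the standard deviation of the estimated response propensities, so values near 1 indicate a more balanced (representative) response. A minimal sketch with invented covariates, not the authors' model:

```python
# R-indicator sketch: fit a response-propensity model, then R = 1 - 2 * sd(propensities).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Fake sample frame: age and education bands, plus a 0/1 response indicator
# where older people respond somewhat more often.
age = rng.integers(18, 85, size=2_000)
educ = rng.integers(1, 4, size=2_000)
response = rng.binomial(1, 0.3 + 0.004 * (age - 18))

X = np.column_stack([age, educ])
propensity = LogisticRegression(max_iter=1_000).fit(X, response).predict_proba(X)[:, 1]

r_indicator = 1 - 2 * propensity.std(ddof=1)
print(round(r_indicator, 3))   # closer to 1 = response propensities more uniform
```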

setting up a probability based web panel. lessons learned from the ELIPSS pilot study

  • online panel in france, 1000 people, monthly questionnaires, internet access given to each member [we often wonder about the effect of people being on panels since they get used to and learn how to answer surveys, have we forgotten this happens in probability panels too? especially when they are often very small panels]
  • used different contact modes including letters, phone, face to face
  • underrepresented on youngest, elderly, less educated, offline people
  • reason for participating in order – trust in ELIPSS 46%, originality of project 37%, interested in research 32%, free internet access 13%
  • 16% attrition after 30 months (that’s amazing, really low and really good!), response rate generally above 80%
  • automated process – invites on thursday, systematic reminders, by text message, app message and email
  • individual followups by phone calls and letters [wow. well that’s how they get a high response rate]
  • individual followups are highly effective [i’d call them stalking and invasive but that’s just me. i guess when you accept free 4g internet and a tablet, you are asking for that invasiveness]
  • age becomes less representative over time, employment status changes a lot, education changes the most but of course young people gain more education over time
  • need to give feedback to panel members as they keep asking for it
  • want to broaden use of panel to scientific community by expanding panel to 3500 people

https://twitter.com/nicolasbecuwe/status/622009359082647552

https://twitter.com/ialstoop/status/622011086783557632

the pretest of wave 2 of the german health interview and examination survey for children and adolescents as a mixed mode survey, composition of participant groups

  • mixed mode helps to maintain high response, web is preferred by younger people, representativeness could be increased by using multiple modes
  • compared sequential and simultaneous surveys
  • single mode has highest response rate, mixed mode simultaneous was extremely close behind, mixed mode multi-step had the lowest rate
  • paper always gave back the highest proportion of data even when people had the choice of both, 11% to 43% chose the paper among 3 groups
  • sample composition was the same among all four groups, all confidence intervals overlap – age, gender, nationality, immigration, education
  • meta-analysis – overall trend is the same
  • 4% lower response rate in mixed mode – additional mode creates cognitive burden, creates a break in response process, higher breakoffs
  • mixed mode doesn’t increase sample composition nor response rates [that is, giving people multiple options as opposed to just one option, as opposed to multiple groups whereby each groups only knows about one mode of participation.]
  • current study is now a single mode study

https://twitter.com/oparnet/status/622015032231075840


 

Sample composition in online studies #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I’ve been pulling out every ounce of bravery I have here in Iceland and I went to the pool again last night (see previous posts on public nakedness!). I could have also broken my rule about not traveling after dark in strange cities but since it never gets dark here, I didn’t have to worry about that! The pool was much busier this time. I guess kiddies are more likely to be out and about after dinner on a weekday rather than sunday morning at 9am. All it meant is that I had a lot more people watching to do. All in all good fun to see little babies and toddlers enjoying a good splash and float!

This morning, the sun was very much up and the clouds very much gone. I’ll be dreaming of breaktime all morning! Until then however, I’ve got five sessions on sample composition in online surveys, and representativeness of online studies to pay attention to. It’s going to be tough but a morning chock full of learning will get me a reward of more pool time!

what is the gain in a probability based online panel to provide internet access to sampling units that did not have access before

  • germany has GIP, france has ELIPSS, netherlands has LISS as probability panels
  • weighting might not be enough to account for bias of people who do not have internet access [a small weighting sketch follows this list]
  • but representativeness is still a problem because people may not want to participate even if they are given access, recruitment rates are much lower among non-internet households
  • probability panels still have problems, you won’t answer every survey you are sent, attrition
  • do we lose much without a representative panel? is it worth the extra cost
  • in Elipss panel, everyone is provided a tablet, not just people without access. the 3G tablet is the incentive you get to keep as long as you are on the panel. so everyone uses the same device to participate in the research
  • what does it mean to not have Internet access – used to be computer + modem. Now there are internet cafes, free wifi is everywhere. hard to define someone as no internet access now. We mean access to complete a survey so tiny smartphones don’t count.
  • 14.5% of adults in france were classified as not having internet. turned out to be 76 people in the end which is a bit small for analytics purposes. But 31 of them still connected every day.
  • non-internet access people always participated less than people who did have internet.
  • people without internet always differ on demographics [proof is chi-square, can’t see data]
  • populations are closer on nationality, being in a relationship, and education – including non-internet helps with these variables, improves representativity
  • access does not equal usage does not equal using it to answer surveys
  • maybe consider a probability based panel without providing access to people who don’t have computer/tablet/home access
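For context on the weighting point above, a toy post-stratification weight calculation. Only the 14.5% no-internet share comes from the talk; the achieved-sample shares are invented.

```python
# Post-stratification weights: align the achieved sample to known population shares.
# When one group (here, no-internet households) is badly underrepresented, a few
# respondents end up carrying very large weights -- which is why weighting alone
# may not fix the bias.
population_share = {"internet": 0.855, "no_internet": 0.145}   # 14.5% offline, per the talk
sample_share     = {"internet": 0.970, "no_internet": 0.030}   # hypothetical achieved sample

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)   # no-internet respondents get weights near 4.8
```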

parallel phone and web-based interviews: comparability and validity

  • phones are relied on for research and assumed to be good enough for representativeness, however most people don’t answer phone calls when they don’t recognize the number, can’t use an autodialler in the USA for research
  • online surveys can generate better quality due to programming validation and ability to only be able to choose allowable answers
  • phone and online have differences in presentation mode, presence of human interviewer, can read and reread responses if you wish, social desirability and self-presentation issues – why should online and offline be the same
  • caution about combining data from different modes should be exercised [actually, i would want to combine everything i possibly can. more people contributing in more modes seems to be more representative than excluding people because they aren’t identical]
  • how different is online nonprobability from telephone probability  [and for me, a true probability panel cannot technically exist. its theoretically possible but practically impossible]
  • harris did many years of these studies side by side using very specific methodologies
  • measured variety of topics – opinions of nurses, big business trust, happiness with health, ratings of president
  • across all questions, average correlation between methods was .92 for unweighted means and .893 for weighted means – more bias with weighted version
  • is it better for scales with many response categories – correlations go up to .95
  • online means of attitudinal items were on average 0.05 lower on scale from 0 to 1. online was systematically biased lower
  • correlations in many areas were consistently extremely high, means were consistently very slightly lower for online data; also nearly identical rank order of items
  • for political polling, the two methods were again massively similar, highly comparable results; mean values were generally very slightly lower – thought to be ability to see the scale online as well as social desirability in telephone method, positivity bias especially for items that are good/bad as opposed to importance 
  • [wow, given this is a study over ten years of results, it really calls into question whether probability samples are worth the time and effort]
  • [audience member said most differences were due to the presence of the interviewer and nothing to do with the mode, the online version was found to be truer]

representative web survey

  • only a sample without bias can generalize, the correct answer should be just as often a little bit higher or a little bit lower than reality
  • in their sample, they underrepresented 18-34, elementary school education, lowest and highest income people
  • [yes, there are demographic differences in panels compared to census and that is dependent completely on your recruitment method. the issue is how you deal with those differences]
  • online panel showed a socially positive picture of population
  • can you correct bias through targeted sampling and weighting, ethnicity and employment are still biased but income is better [that’s why invites based on returns not outgo are better]
  • need to select on more than gender, age, and region
  • [i love how some speakers still have non-english sections in their presentation – parts they forgot to translate or that weren’t translatable. now THIS is learning from peers around the world!]

measuring subjective wellbeing: does the use of websurveys bias the results? evidence from the 2013 GEM data from luxembourg

  • almost everyone is completely reachable by internet
  • web surveys are cool – convenient for respondents, less social desirability bias, can use multimedia, less expensive, less coding errors; but there are sampling issues and bias from the mode
  • measures of subjective well being – i am satisfied with my life, i have obtained all the important things i want in my life, the condition of my life are excellent, my life is close to my ideal [all positive keyed]
  • online survey gave very slightly lower satisfaction
  • the result is robust to three econometric techniques
  • results from happiness equations using differing modes are compatible
  • web surveys are reliable for collecting information on wellbeing

Assessing and addressing measurement equivalence in cross-cultural surveys #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Today’s lunch included vanilla Skyr. Made with actual vanilla beans. Beat that, yoghurt of home! Once again, I cannot choose a favourite among coconut, pear, banana, and vanilla other than to say it completely beats yoghurt. I even have a favourite brand although since I don’t have the container in front of me right now, I can’t tell you the brand. It still counts very much as brand loyalty though because I know exactly what the container looks like once I get in the store.

I have to say I remain really impressed with the sessions. They are very detail oriented and most people provide sufficient data for me to judge for myself whether I agree with their conclusions. There’s no grandstanding, essentially no sales pitches, and I am getting take-aways in one form or another from nearly every paper. I’m feeling a lot less presentation pressure here simply because it doesn’t seem competitive. If you’ve never been to an ESRA conference, I highly recommend it. Just be prepared to pack your own lunch every day. And that works just great for me.

cross cultural equivalence of survey response latencies

  • how long does it take for a respondent to provide their answer, easy to capture with computer assisted interviewing, uninfluenced by self reports
  • longer latencies seem to represent more processing time for cognitive operations, also represents presence and accessibility of attitudes and strength of those attitudes
  • longer latencies correlated with age, alcohol use, and poorly designed and ambiguous questions, perhaps there is a relationship with ethnic status
  • does latency differ by race/ethnicity; do they vary by language of interview
  • n=600 laboratory interview, 4 race groups, 300 questions taking 77 minutes all about health, order of sections rotated
  • required interviewer to hit a button when they stopped talking and hit a button when the respondent started talking; also recorded whether there were interruptions in the response process; only looked at perfect responses [which are abnormal, right?]
  • reviewed all types of question – dichotomous, categorical, bipolar scales, etc
  • hispanic, black, korean indeed took longer to answer compared to white people on the english survey in the usa
  • more educated took slightly less time to answer
  • numeric responses took much longer, yes/no took the least, unipolar was second least
  • trend was about the same by ethnicity
  • language was an important indicator

comparing survey data quality from native and nonnative english speakers

  • me!
  • conclusion – using all of our standard data quality measures may eliminate people based on their language skills not on their data quality skills. But, certain data quality measures are more likely to predict language rather than data quality. We should focus more on straightlining and overclicking and ignore underclicking as a major error.
  • ask me for the paper :)

trust in physicians or trust in physician – testing measurement invariance of trust in physicians in different health care cultures

  • trust reduces social complexity, solves problems of risk, makes interactions possible
  • we lack knowledge of various professions – lawyers, doctors, etc, we don’t understand diagnosis, treatments
  • we must rely on certificates, clothes such as the doctor’s white coat, location such as a hospital
  • is there generalized trust in doctors
  • different health care systems produce different kinds of trust, ditto cultural contexts, political and values systems
  • compared three countries with health care coverage and similar doctors per person measurements
  • [sorry, didn’t get the main conclusion from the statement “results were significant”]

Advancements of survey design in election polls and surveys #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I decided to take the plunge and choose a session in a different building this time. The bravery isn’t much to be noted as I’ve realized that the campus and buildings and rooms at the University of Iceland are far tinier than what I am used to. Where I’d expect neighboring buildings to be a ten minute walk from one end to the other, here it is a 30 second walk. It must be fabulous to attend this university where everything and everyone is so close!

I’m quite loving the facilities. For the most part, the chairs are comfortable. Where it looks like you just have a chair, there is usually a table hiding in the seat in front of you. There is instantly connecting and always on wifi no matter which building you’re in. There are computers in the hallways, and multiple plugs at all the very comfy public seating areas. They make it very easy to be a student here! Perhaps I need another degree?


Designing effective likely voter models in pre-election surveys

  • voter intention and turnout can be extremely different. 80% say they will vote but 10% to 50% is often the number that actually votes
  • democratic vote share is often over represented [social desirability?]
  • education has a lot of error – 5% error rate, worst demographic variable
  • what voter model reduces these inaccuracies
  • behavioural models (intent to vote, have you voted, dichotomous variables) and resource-based models (…)
  • vote intention does predict turnout – 86% are accurate, also reduces demographic errors
  • there’s not a lot of room to improve except when the polls look really close
  • Gallup tested a two item measure of voting intention – how much have you thought about this election, how likely are you to vote
  • 2 item scale performed far better than the 7 item scale, error rate of 4% vs 1.4%
  • [just shown a histogram with four bars. all four bars look essentially the same. zero attempt to create a non-existent different. THAT’S how you use a chart :) ]
  • gallup approach didn’t work well, probability approach performed better
  • best measure of voting intention = Thought about election + likelihood of voting + education + voted before + strength of partisan identity [a hypothetical sketch of such a model follows this list]
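That composite reads like a predictor set for a turnout model. A hypothetical logistic-regression version, purely to make the idea concrete; this is not the authors' actual model, and every variable, coefficient and threshold below is invented.

```python
# Hypothetical likely-voter model: predict turnout from the predictors listed above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000

thought_about_election = rng.integers(1, 5, n)   # 1-4 scale
likelihood_of_voting = rng.integers(1, 8, n)     # 1-7 scale
education = rng.integers(1, 6, n)
voted_before = rng.integers(0, 2, n)
partisan_strength = rng.integers(1, 4, n)

# Fake "true" turnout generated from the same predictors, just so the sketch runs.
logit = -4 + 0.4 * thought_about_election + 0.5 * likelihood_of_voting + 1.0 * voted_before
turnout = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([thought_about_election, likelihood_of_voting,
                     education, voted_before, partisan_strength])
model = LogisticRegression(max_iter=1_000).fit(X, turnout)

# Predicted probabilities can be used as weights, or cut at a threshold to flag "likely voters".
likely_voter = model.predict_proba(X)[:, 1] > 0.5
print(round(likely_voter.mean(), 3))
```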

polls on national independence: the scottish case in a comparative perspective

  • [Claire Durand from the University of Montreal speaks now. Go Canada! :) ]
  • what happened in quebec in 1995? referendum on independence
  • quebec and scotland are nationalist in a british type system, proportion of nonnationals is similar
  • referenda are 50% + 1 wins
  • but polls have many errors, is there an anti-incumbent effect
  • “no” is always underestimated – whatever the no is
  • are referenda on national independence different – ethnic divide, feeling of exclusion, emotional debate, ideological divide
  • No side has to bring together enemies and don’t have a unified strategy
  • how do you assign non-disclosure?
  • don’t know doesn’t always mean don’t know
  • don’t distribute non-disclosures proportionally, they aren’t random
  • asking how people would vote TODAY resulted in 5 points less nondisclosure
  • corrections need to be applied after the referendum as well
  • people may agree with the general demands of the national parties but not with the solution they propose. maintaining the threat allows them to maintain pressure for change.
  • the quebec newspapers reported the raw data plus the proportional response so people could judge for themselves

how good are surveys at measuring past electoral behaviour? lessons from an experiment in a french online panel study

  • study bias in individual vote recall
  • sample size of 6000
  • overreporting of popular party, underreporting of less popular party
  • 30% of voter recall was inconsistent
  • inconsistent respondents changed their recall: changed parties, memory problems, concealment, said they didn’t vote, or said they voted and then said they didn’t (or vice versa)
  • could be any number of interviewer issues
  • older people found it more difficult to remember but perhaps they have more voter loyalty
  • when available, use vote recall from the pre-election survey
  • using vote recall from the post-election survey underestimates voter transfers
  • caution in using vote recall to weight samples

methodological issues in measuring vote recall – an analysis of the individual consistency of vote recall in two election longitudinal surveys

  • popularity = weighted average % of electorate represented
  • universality = weighted frequency of representing a majority
  • used four versions of non/weighting including google hits
  • measured 38 questions related to political issues
  • voters are driven by political tradition even if outdated, or by personal images of politicians not based on party manifestos
  • voters are irrational, political landscape has shifted even though people see the parties the same way they were decades ago
  • coalition formation aggravate the situation even more
  • discrepancy between the electorate and the government elected

The impact of questionnaire design on measurements in surveys #4 #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Well, last night I managed to stay up until midnight. The lights at the church went on, lighting up the tower and the very top in an unusual way. They were quite pretty! The rest of the town enjoyed mood lighting as it didn’t really get dark at all. Tourists were still wandering in the streets since there’s no point going to bed in a delightful foreign city if you can still see where you’re going. And if you weren’t a fan of the mood lighting, have no fear! The sun ‘rose’ again just four hours later. If you’re scared of the dark, this is a great place to be – in summer!

Today’s program for me includes yet another session on question data quality, polling question design, and my second presentation on how non-native English speakers respond to English surveys. We may like to think that everyone answering our surveys is perfectly fluent but let’s be realistic. About 10% of Americans have difficulty reading/writing in English because it is not their native language. Add to that weakly and non-literate people, and there’s potential big trouble at hand.


the impact of answer format and item order on the quality of measurement

  • compared a 2 point scale and an 11 point scale, different orders of questions (questions could even be very widely separated), looked at perceived prestige of occupations
  • separated two pages of the surveys with a music game of guessing the artist and song, purely as distraction from the survey. the second page was the same questions in a completely different order, did the same thing numerous times changing the number of response options and question orders each time. whole experiment lasted one hour
  • assumed scale was unidimensional
  • no differences comparing 4 point to 9 point scale, none between 2 point and 9 point scale [so STOP USING HUGE SCALES!!!]
  •  prestige does not change depending on order in the survey [but this is to be expected with non-emotional, non-socially desirable items]
  • respondents confessed they tried to answer well but maybe not the best of their ability or maybe their answers would change the next time [glad to see people know their answers aren’t perfect. and i wouldn’t expect anything different. why SHOULD they put 100% effort into a silly task with no legitimate outcome for them.]

measuring attitudes towards immigration with direct questions – can we compare 4 answer categories with dichotomous responses

  • when sensitive questions are asked, social desirability affects response distributions
  • different groups are affected in different ways
  • asked questions about racial immigration – asked binary or as a 4 point scale
  • it’s not always clear that slightly is closer to none or that moderately is closer to strongly. can’t just assume the bottom two boxes are the same or the top two boxes are the same
  • education does have an effect, as well as age in some cases
  • expression of opposition for immigration depends on the response scale
  • binary responses lead to 30 to 50% more “allow none” responses than the 4 point scale
  • respondents with lower education have a lower probability of choosing the middle scale point

cross cultural differences in the impact of number of response categories on response behaviour and data structure of a short scale for locus of control

  • locus of control scale, 4 items, 2 internal, 2 external
  • tested 5 point vs 9 point scale
  • do the means differ, does the factor structure differ
  • I’m my own boss; if I work hard, I’ll succeed; whether at work or in my private life, what I do is mainly determined by others; bad luck often gets in the way of my plans
  • labeled from “doesn’t apply at all” to “applies completely”
  • didn’t see important demographic differences
  • saw one interaction but it didn’t really make sense [especially given sample size of 250 and lots of other tests happening]
  • [lots of chatter about significance and non-significance but little discussion of what that meant in real words]
  • there was no effect of item order, # of answer options mattered for external locus but not internal locus of control
  • [i’d say hard to draw any conclusions given the tiny number of items, small sample size. desperately needs a lot of replication]

the optimal number of categories in item specific scales

  • type of rating scale where the answer is specific to the scale and doesn’t necessarily apply to every other item – what is your health? excellent, good, poor
  • quality increased with the number of answer options comparing 11,7,5,3 point scales but not comparing 10,6,4 point scales
  • [not sure what quality means in this case, other audience members didn’t know either, lacking clear explanation of operationalization]

The impact of questionnaire design on measurements in surveys #3 #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

We had 90 minutes for lunch today which is far too long. Poor me. I had pear skyr today to contrast yesterday’s coconut skyr. I can’t decide which one I like better. Oh, the hard decisions I have to make! I went for a walk which was great since it drizzled all day yesterday. The downtown is tiny compared to my home so it’s quite fun to walk from one end to the other, including dawdling and eating, in less than half an hour. It’s so tiny that you don’t need a map. Just start walking and take any street that catches your fancy. I dare you to get lost. Or feel like you’re in an unsafe neighbourhood. It’s not possible.

I am in complete awe at the bird life here. There are a number of species I’ve never seen before which on its own is fun. It is also baby season so most of the ducks are paired off and escorting 2 to 8 tiny babies. They are utterly adorable as the babies float so well that they can barely swim underwater to eat. I haven’t seen any puffins along the shore line. I’m still hopeful that a random one will accidentally wander across my path.

By the way, exceptional beards really are a thing here. In case you were curious.

the Who: experimental evidence on the effect of respondent selection on collecting individual asset ownership information

  • how do you choose who to interview?
  • “Most knowledgeable person”, random selection, the couple together, each individual adult by themself about themself, by themself about other people
  • research done in uganda so certainly not generalizable to north america
  • ask about dwelling, land, livestock, banking, bequeathing, selling, renting, collateral, investments
  • used CAPI, interviews matched on gender, average interview was 30 minutes
  • challenges included hard to find couples together as one person might be working in the field, hard to explain what assets were
  • asking the couple together shows differences in ownership incidence but the rest is the same
  • [sorry, couldn’t determine what “significant positive results” actually meant. would like to know. :( ]

Portuguese national health examination survey: questionnaire development

  • study includes physical measurements and a survey of health status, health behaviours, medication, income, expenses
  • pre-tested the survey for comprehension and complexity
  • found they were asking for things from decades ago and people couldn’t remember (eg when did you last smoke)
  • some mutually exclusive questions actually were not
  • you can’t just ask about ‘activity’ you have to ask about ‘physical activity that makes you sweat’
  • response cards helped so that people didn’t have to say an embarrassing word
  • had to add instructions that “some questions may not apply to you but answer anyways” because people felt that if you saw them walking you shouldn’t ask whether they can walk
  • gave examples of what sitting on the job, or light activity on the job meant so that desk sitters don’t include walking to the bathroom as activity
  • pretest revealed a number of errors that could be corrected, language and recall problems can be overcome with better questions

an integrated household survey for Wales

  • “no change” is not a realistic option [i wish more people felt that way]
  • duplication among the various surveys, inefficient, survey costs are high
  • opportunity to build more flexibility into a new survey
  • annual sample size of 12000, randomly selected 16+ adults, 45 minutes
  • want to examine effects of offering incentives
  • survey is still in field
  • 40% lower cost compared to previous, significant gains in flexibility

undesired responses to surveys, wrong answers or poorly worded questions? how respondents insist on reporting their situation despite unclear questioning

  • compared census information with family survey information
  • interested in open text answers
  • census has been completed since 1881
  • belle-mère can mean both stepmother and mother-in-law in french
  • can’t tell if grandchildren in the house belong to which adult child in the house
  • ami can mean friend or boyfriend or partner or spouse, some people will also specify childhood friend or unemployed friend or family friend
  • can’t tell if an unknown location of child means they don’t know the address or the child has died
  • do people with an often changing address live in a camper, or travel for work?
  • if you only provide age in years for babies you won’t know if it’s stillborn or actually 1 year old

ask a positive question and get a positive answer: evidence on acquiescence bias from health care centers in nigeria

  • created two pairs of questions where one was positive and one was negative – avoided the word no [but the extremeness of the questions differed, e.g., “Price was reasonable” vs “Price was too expensive” ]
  • some got all positive, all negative, or a random mix
  • pilot test was a disaster, in rural nigeria people weren’t familiar with this type of question
  • instead, started out asking a question about football so people could understand how the question worked. asked agree or disagree, then asked moderately or strongly – two stage likert scale
  • lab fees were reasonable generated very different result than lab fees were unreasonable [so what is reality?]
  • it didn’t matter if negatives were mixed in with positives
  • acquiescence bias affects both positive and negative questions, can’t say if it’s truly satisficing, real answer is probably somewhere in between [makes me wonder, can we develop an equation to tease out truth]
  •  large ceiling effects on default positive framing — clinics are satisfactory despite serious deficiencies
  • can’t increase scores with any intervention but you can easily decrease the scores
  • maybe patient satisfaction is the wrong measure
  • recommend using negative framing to avoid ceiling effects [I wonder if in north america, we’re so good at complaining that this isn’t relevant]

The impact of questionnaire design on measurements in surveys #2 #ESRA15 #MRX  

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Breaktime treated us to fruit and croissants this morning. I was hoping for another unique to iceland treat but perhaps that was a sign to stop eating. No, just kidding! Apparently you’re not allowed to bring food or drink into the classrooms. The signs say so. The signs also say no Facebook in the classrooms. Shhhh…. I was on Facebook in the classroom!

The sun is out again and I took a quick walk outside. I am thankful my hotel is at the foot of the famous church. No matter where I am in this city, I can always, easily, and instantly find my hotel. No map needed when the church is several times higher than the next highest building!

I’ve noticed that the questions at this conference are far more nit-picky and critical than I’m used to. I suspect that is because the audience includes many academics whose entire job is focused on these topics. They know every minute detail because they’ve done similar studies themselves. It makes for great comments and questions, though it does seem to put the speaker on the spot every time!

smart respondents: let’s keep it short.

  • do we really need scale instructions in the question stem? they add length, mobile screens have limited space, and respondents skip the instructions if the response scale is already labeled [isn’t this just an artifact of old fashioned face to face surveys, telephone surveys]
  • they tested instructions that matched and did not match what was actually in the scale [i can imagine some panelists emailing the company to complain that the survey had errors!]
  • used a probability survey [this is one case where a nonprobability sample would have been well served, easier cheaper to obtain with no need to generalize precisely to a population]
  • answer frequencies looked very similar for correct and incorrect instructions, no significant differences, she’s happy to have nonsignificant results, unaffected by mobile device or age
  • [more regression results shown, once again, speaker did not apologize and the audience did not have a heart attack]
  • it seems like respondents ignore instructions in the question, they rely on the words in the answer options, e.g., grid headers
  • you can omit instructions if the labeling is provided in the answer options
  • works better for experienced survey takers [hm, i doubt that. anyone seeing the answer options will understand. at least, that’s my opinion.]

from web to paper: evaluation from data providers and data analysts. The case of annual survey finances of enterprises

  • we send out questionnaires, something happens, we get data back – we don’t know what happens :)
  • wanted to keep question codes in the survey which seemed unnecessary to respondents, had really long instructions for some questions that didn’t fit on the page so they put them on a pdf
  • 64% of people evaluated the codes on the online questionnaire positively, 12% rated the codes negatively. people liked that they could communicate with statistics netherlands by using the codes
  • 74% negative responses to explanations of question which were intended to reduce calls from statistics netherlands, only 11% were positive
  • only 25% of people consulted the pdf with instructions
  • most people wanted to receive a printed version of the questionnaire they filled out, people really wanted to print it and they screen capped it, people liked being able to return later, they could easily get an english version
  • data editors liked that they didn’t have to do data entry but now they needed more time to read and understand what was being said
  • they liked having the email address because they got more direct and precise answers, responses came back faster, they didn’t notice any changes in the time series data

is variation in perception of inequality and redistribution of earnings actual or artifactual. effects of wording, order, and number of items

  • opinions differ when you ask how much should people make vs how much should the top quintile of people make
  • they asked people how much a number of occupations should earn, they also varied how specific the title was e.g., teacher vs math teacher in a public highschool
  • estimates for specific descriptions were higher, high status jobs got much higher estimates
  • adding more occupations to the list makes reliability in earnings decrease

exploring a new way to avoid errors in attitude measurements due to complexity of scientific terms: an example with the term biodiversity

  • how do people talk about complicated terms, their own words often differ from scientific definitions
  • “what comes to mind when you think of biodiversity?” – used text analysis for word frequencies, co-occurrences, correspondence analysis, used the results to design items for the second study
  • found five classes of items – standard common definition, associated with human actions to protect it, human-environment relationship, global actions and consequences, scientific definition
  • turned each of the five types of definitions into a common word definition
  • people gave more positive opinions about biodiversity when they were asked immediately after the definition
  • items based on representations of biodiversity were valid and reliable
  • [quite like this methodology, could be really useful in politics]

[if any of these papers interest you, I recommend finding the author on the ESRA program and asking for an official summary. Global speakers and weak microphones make note taking more challenging. :) ]



The impact of questionnaire design on measurements in surveys #1 #ESRA15  #MRX  

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I tried to stay up until midnight last night but ended up going to bed around 10:30pm. Naturally, it was still daylight outside. I woke up this morning at 6am in broad daylight again. I’m pretty sure it never gets dark here no matter what they say. I began my morning routine as usual. Banged my head on the slanted ceiling, stared out the window at the amazing church, made myself waffles in the kitchen, and then walked past the pond teeming with baby ducks. Does it get any better? I think no. Except of course knowing I had another day of great content rich sessions ahead of me!


designs and developments of the income measures in the european social surveys

  • tested different income questions. allowed people to use a weekly, monthly, or annual income scale as they wished. there was also no example response, and no example of what constitutes income. Provided about 30 answer options to choose from, shown in three columns. Provided same result as a very specific question in some countries but not others.
  • also tested every country getting the same number breaks, groups weren’t arranged to reflect each countries distribution. this resulted in some empty breaks [but that’s not necessarily a problem if the other breaks are all well and evenly used]
  • when countries are asked to set up number breaks in well defined deciles, high incomes are chosen more often – affected because people had different ideas of what is and isn’t taxable income
  • [apologies for incomplete notes, i couldn’t quite catch all the details, we did get a “buy the book” comment.]

item non-response and readability of survey questionnaire

  • any non-substantive outcome – missing values, refusals, don’t knows all count
  • non response can lower validity of survey results
  • semantic complexity measured by familiarity of words, length of words, abstract words that can’t be visualized, structural complexity
  • Measured – characters in an item, length of words, percent of abstract words, percent of lesser known words, percent of long words (12 or more characters) [a toy version of these measures follows this list]
  • used the european social survey which is a highly standardized international survey, compared english and estonian, it is conducted face to face, 350 questions, 2422 uk respondents
  • less known and abstract words create more non-response
  • long words increase nonresponse in estonian but not in english, perhaps because english words are shorter anyways
  • percent of long words in english created more nonresponse
  • total length of an item didn’t affect nonresponse
  • [they used a list of uncommon words for measurement, such a book/list does exist in english. I used it in school to choose a list of swear words that had the same frequency levels as regular words.]
  • [audience comment – some languages join many words together which means their words are longer but then there are fewer words, makes comparisons more difficult]
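A toy version of the item-level measures described above. The 12-character cut-off for long words is from the talk; the example question and the choice to strip punctuation are mine, and a real analysis would also use a published list of uncommon and abstract words.

```python
# Toy readability metrics for one survey item: length, mean word length, share of long words.
def item_readability(item_text: str, long_word_len: int = 12) -> dict:
    words = item_text.split()
    long_words = [w for w in words if len(w.strip(".,?;:!")) >= long_word_len]
    return {
        "characters": len(item_text),
        "mean_word_length": round(sum(len(w) for w in words) / len(words), 1),
        "pct_long_words": round(100 * len(long_words) / len(words), 1),
    }

question = ("To what extent do you agree that intergovernmental organisations "
            "should harmonise unemployment benefit entitlements?")
print(item_readability(question))   # high share of 12+ character words -> expect more item non-response
```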

helping respondents provide good answers in web surveys

  • some tasks are inherently difficult in surveys, often because people have to write in an answer, coding is expensive and error prone
  • this study focused on prescription drugs which are difficult to spell, many variations of the same thing, level of detail is unclear, but we have full lists of all these drugs available to us
  • tested text box, drop box to select from a list, and javascript (type-ahead lookup) [a small sketch of the type-ahead idea follows this list]
  • examined breakoff rates, missing data, response times, and codability of responses
  • asked people if they are taking drugs, tell us about three
  • study 1 – breakoffs higher from dropbox and javascript; median response times longer, but codability was better. Lists didn’t work well at all.
  • study 2 – cleaned up the list, made all the capitalization the same. break off rates were now all the same. response times lower but still higher than the textbox version. codability still better for list versions.
  • study 3 – if they couldn’t find a drug in the list, they were allowed to type it out. unlike previous studies which proceeded with the missing data. dropbox had highest missing data. javascript had lowest missing data. median times highest for drop box. trends for more and more drugs as expected, effect is more but not as much more.
  • older browsers had trouble with dropdowns and javascript and had to be routed to the textbox options
  • if goal is to get codable answers, use a text box. if goal is to create skip patterns then javascript is the way to go.
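A minimal sketch of what the type-ahead condition does, written in Python rather than JavaScript to match the other snippets in these notes; the drug list is a made-up stand-in for a real prescription database.

```python
# Type-ahead lookup: as the respondent types, offer matching entries from a curated
# list, so the recorded answer is already standardized and codable.
DRUG_LIST = ["Atorvastatin", "Amoxicillin", "Amlodipine", "Lisinopril", "Metformin"]

def suggest(prefix: str, choices=DRUG_LIST, limit: int = 5) -> list[str]:
    prefix = prefix.strip().lower()
    return [c for c in choices if c.lower().startswith(prefix)][:limit]

print(suggest("am"))   # ['Amoxicillin', 'Amlodipine']
print(suggest("xy"))   # [] -> fall back to a free-text box, as in study 3
```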

rating scale labelling in web surveys – are numeric labels an advantage

  • you can use all words to label scales or just words on the end with numbers in between
  • research says there is less satisficing with verbal scales, they are more natural than numbers and there is no inherent meaning of numbers
  • means of the scales were different
  • less time to complete for the end-labeled groups
  • people paid more attention to the five point labeled scale, and least to the end point labeled scale
  • mean opinions did differ by scale, more positive on fully labeled scale
  • high cognitive burden to map responses of the numeric scales
  • lower reliability for the numeric labels

Surveying sensitive issues – challenges and solutions #ESRA15 #MRX  

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own. Break time brought some delightful donuts. I personally only ate one; however, on behalf of my friend, Seda, I ate several more just for her. By the way, since donuts are in each area, you can just breeze from one area to the next grabbing another donut each time. Just saying…

surveying sensitive questions – prevalence estimates of self-reported delinquency using the crosswise model

  • crime rates differ by country but rates of individuals reporting their own criminal behaviour shows opposite expectations. Thus countries with high rates have lower rates of self-report. Social desirability seems to be the case. Is this true?
  • Need to add random noise to the model so the respondent can hide themself. Needs no randomization device.
  • ask a non-sensitive question and a sensitive question together and have the respondent consider both. The respondent only indicates whether the answers to the two questions are the same or different. You only need to know the distribution of the first question (e.g., is your mom’s birthday in January? well, 1/12 of birthdays are) [a sketch of the estimator follows this list]
  • crosswise model generates vastly higher self-reported criminal rates in the countries where you’d expect it
  • also asked people in the survey whether they answered carefully – 15% admitted they did not
  • crosswise results in much higher prevalence rates, causal models of delinquent behaviour could be very different
  • satisficing respondents introduce less bias than expected
  • estimates of the crosswise model are conservative
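For readers unfamiliar with the crosswise model, the usual prevalence estimator recovers the sensitive rate from the share of “my answers to both questions are the same” responses and the known distribution of the innocuous question. A small sketch with invented numbers:

```python
# Crosswise estimator: P(same) = pi*p + (1 - pi)*(1 - p), so pi = (lambda + p - 1) / (2p - 1),
# where lambda is the observed share answering "both the same" and p is the known
# probability of a 'yes' on the innocuous question (e.g., mother born in January, p = 1/12).
def crosswise_prevalence(lambda_hat: float, p: float) -> float:
    if abs(2 * p - 1) < 1e-9:
        raise ValueError("p must not be 0.5, otherwise the estimator is undefined")
    return (lambda_hat + p - 1) / (2 * p - 1)

# Invented example: 85% say "same" and the innocuous question has p = 1/12.
print(round(crosswise_prevalence(0.85, 1 / 12), 3))   # ~0.08 estimated prevalence
```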

pouring water into the wine – the advantages of the crosswise model for asking sensitive questions revisited

  • it’s easier to implement in self-administered surveys, no extra randomization device necessary, cognitive burden is lower, no self-protection answering strategies
  • blood donation rates – direct question says 12% but crosswise says 18%
  • crosswise model had a much higher answering time, even after dropping extraordinarily slow people
  • model has some weaknesses, the less the better approach is good to determine if the crosswise model works
  • do people understand the instructions and do they specifically follow those instructions

effects of survey sponsorship and mode of administration on respondents answers about their racial attitudes

  • used a number of prejudice scales both blatant and subtle
  • no difference in racial measures on condition of interviewer administration
  • blatant prejudice scale showed a significant interaction for type of sponsor
  • matters more when there is an interviewer and therefore insufficient privacy
  • sponsor effect is likely the result of social desirability
  • response bias is in opposite direction for academic and market research groups
  • does it depend which department does the study – law department, sociology department

impact of survey mode (mail vs telephone) and asking about future intentions 

  • evidence suggests that asking about intent to get screened before asking about screening may minimize over reporting of cancer screening. removes the social pressure to over report.
  • people report behaviors more truthfully in self-administered forms than interviews
  • purchased real estate on an omnibus survey
  • no main effect for mode
  • in mail mode, asking about intent first was more reflective of reality of screening rates
  • 30% false positive said they had a test but it wasn’t in their medical record
  • little evidence that the intention item affected screening accuracy
  • mailed surveys may positively affected accuracy – but mail survey was one topic whereas the telephone was omnibus

effect of socio-demographic (mis)match between interviewers and respondents on the data quality of answers to sensitive questions

  • theory of liking, some say matching improves chances of participation, may also improve disclosure and reporting, especially gender matching
  • current study matched within about five years of age as opposed to arbitrary cut-off points
  • also matched on education
  • male interviewer to female interviewee had lowest response rate
  • older interviewer had lower response rate
  • no effects for education
  • income had the most missing data, parent’s education was next highest missing data likely because education from 50 years ago was different and you’d have to translate, political party had high missing rate
  • if female subject refuses a male interviewer, send a female to try to convince them
  • it’s easier to refuse a person who is the same age as you [maybe it’s a feeling of superiority/inferiority – you’re no better than me, i don’t have to answer to you]
  • men together generate the least item non-response
  • women together might get too comfortable together, too chatty, more non-response, role-boundary issue
  • age matching is less item non-response
  • same education is less item non-response, why do interviewers allow more item non-response when their respondent has a lower education


Direction of response scales #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes in the notes are my own.

I discovered that all the buildings are linked indoors. Let it rain, let it rain, I don’t care how much it rains…. [Feel free to sing that as loud as you can.] Lunch was Skyr, oat cookies and some weird beet drink. Yup. I packed it myself. I always try to like yogurt and never really do. Skyr works for me. So far, coconut is my favourite. I’ve forgotten to take pictures of speakers today so let’s see if I can keep the trend going! Lots of folks in this session so @MelCourtright and I are not the only scale geeks out there. :)

Response scales: Effects of scale length and direction on reported political attitudes

  • instruments are not neutral, they are a form of communication
  • cross national projects use different scales for the same question so how do you compare the results
  • trust in parliament is a fairly standard question for researchers and so makes a good example
  • 4 point scale is most popular but it is used up to 11 points, traditional format is very positive to very negative
  • included a don’t know in the answer options
  • transformed all scales into a 0 to 1 scale and evenly distributed all scores in between [the usual rescaling formula is sketched after this list]
  • means highest with 7 point scale traditional direction and lowest with 4 point and 11 point traditional direction
  • reverse direction had much fewer mean differences, essentially all the same
  • four point scales show differences in direction, 7 and 11 point show fewer differences in direction
  • [regression results shown on the screen – no one fainted or died, the speaker did not apologize or say she didn’t understand them. interesting difference compared to MRX events.]
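The 0-to-1 transformation mentioned above is presumably the standard linear rescaling; the formula below is my assumption, not taken from the slides.

```python
# Linear rescaling of a k-point response (1..k) onto 0..1, so scales of different
# lengths can be compared on the same footing.
def rescale(score: int, k: int) -> float:
    return (score - 1) / (k - 1)

print(rescale(3, 4), rescale(6, 7), rescale(11, 11))   # 0.666..., 0.833..., 1.0
```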

Does satisficing drive scale direction effects

  • research shows answers shift towards the start of the scale but this is not consistent
  • anchoring and adjustment effects whereby people use the first answer option as the anchor, interpretative heuristics suggest people choose an early response to express their agreement with the questions, primacy effects due to satisficing, which decreases cognitive load
  • scores were more positive when the scale started positive, differences were huge across all the brands
  • the pattern is the same but the differences are noticeable
  • speeding measured as 300 milliseconds per word [a tiny version of this rule is sketched after this list]
  • speeders more likely to choose early answer option
  • answers are pushed to the start of the scale, limited evidence that it is caused by satisficing
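A tiny illustration of the 300-milliseconds-per-word speeding rule mentioned above; the example question and timings are invented.

```python
# Flag a response as "speeding" if it took less than 300 ms per word of question text.
def is_speeder(response_time_ms: float, question_text: str, ms_per_word: float = 300) -> bool:
    threshold_ms = ms_per_word * len(question_text.split())
    return response_time_ms < threshold_ms

question = "How satisfied are you with the way democracy works in your country?"
print(is_speeder(2_500, question))   # 12 words -> 3600 ms threshold -> True (speeder)
print(is_speeder(5_000, question))   # False
```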

Ordering your attention: response order effects in web-based surveys

  • primacy happens more often visually and recency more often orally
  • scales have an inherent order. if you know the first answer option, you know the remainder of the options
  • sample size over 100 000, randomly assigned to scale order, also tested labeling, orientation, and number of response categories from 2 to 11
  • the order effect was always a primacy effect, differences were significant though small; significant more due to sample size [then why mention the results if you know they aren’t important?]
  • order effects occurred more with fully labeled scales, end labeled scales did not see response order effects
  • second study also supported the primacy effect with half of questions showing the effect
  • much stronger response order effect seen with unipolar scales
  • vertical scales show a much stronger effect as well
  • largest effect seen for horizontal unipolar scale
  • need to run the same tests with grids, don’t know which response is more valid, need to know what they will be and when

Impact of response scale direction on survey responses in web and mobile web surveys

  • why does this effect happen?
  • tested agreement scales and frequency scales
  • shorter scale decreases primacy effect
  • scale length has a significant moderating effect – stronger effect for 7 point scales compared to 5 point scales
  • labeling has significant moderating effects – stronger effect for fully labeled
  • question location matters – stronger effect on earlier questions
  • labeled behavioural scale shows the largest impact, end labeled attitudinal scale has the smallest effect
  • scale direction affects responses – more endorsement at start of scale
  • 7 point fully labeled frequency scale is most affected
  • we must use shorter scales and end labeling to reduce scale direction effects in web surveys

Importance of scale direction between different modes

  • term used is forward/reverse scale [as opposed to ascending/descending or positive/negative keyed]
  • in the forward version of the scale, the web creates more agreement; but face to face it’s very weak. face to face shows recency effect
  • effect is the same for general scales (all scales are agreement) and item specific scales (each scale reflects the specific question), more cognitive effort in the item specific scale so maybe less effort is invested in the response
  • item specific scale affected more by the web
  • randomizing scale matters more in online surveys

