Tag Archives: data quality

Mobile devices and modular survey design by Paul Johnson #PAPOR #MRX 

Live blogged at the #PAPOR conference in San Francisco. Any errors or bad jokes are my own.

  • now we can sample by individuals, phone numbers, location, transaction
  • can reach by an application, email, text, IVR but make sure you have permission for the method you use (TCPA)
  • 55+ prefer to dial an 800 number for a survey, younger people prefer an SMS contact method; important to provide as many methods as possible so people can choose the method they prefer
  • mobile devices give you lots of extra data – purchase history, health information, social network information, passive listening – make sure you have permission to collect the information you need; give something back in terms of sharing results or hiding commercials
  • Over 25% of your sample is already taking surveys on a mobile device; you should check what device people are using and skip questions that won’t render well on small screens
  • remove unnecessary graphics, background templates are not helpful
  • keep surveys under 20 minutes [i always advise 10 minutes]
  • use large buttons, minimal scrolling; never scroll left/right
  • avoid using radio buttons, aim for large buttons instead
  • for open ends, put a large box to encourage people to use a lot of words
  • mobile open ends have just as much content although there may be fewer words, more acronyms, more profanity
  • be sure to use a back button if you use auto-next
  • if you include flash or images be sure to ask whether people saw the image
  • consider modularizing your surveys, ensure one module has all the important variables, give everyone a random module, let people answer more modules if they wish
  • How to fill in missing data – data imputation or respondent matching; a rough sketch of module assignment and imputation follows this list [both are artificial data remember! you don’t have a sense of truth. you’re inferring answers to infer results. Why are we SOOOOO against missing data?]
  • most people will actually finish all the modules if you ask politely
  • you will find differences between modular and not but the end conclusions are the same [seriously, in what world do two sets of surveys ever give the same result? why should this be different?]
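
To make the modular idea concrete, here is a rough Python sketch of my own – the module names, the single random extra module, and the mean-imputation fallback are all assumptions for illustration, not anything the speaker showed:

```python
import random
from statistics import mean

# Hypothetical module layout: everyone answers the core module (the
# "important variables"), plus one randomly assigned extra module.
MODULES = {
    "core":  ["q1", "q2", "q3"],
    "mod_a": ["q4", "q5"],
    "mod_b": ["q6", "q7"],
    "mod_c": ["q8", "q9"],
}

def assign_modules(wants_more=False):
    """Core module for everyone, one random extra module, all modules on request."""
    if wants_more:   # "let people answer more modules if they wish"
        return list(MODULES)
    return ["core", random.choice(["mod_a", "mod_b", "mod_c"])]

def impute_mean(records, question):
    """Crude mean imputation for a question a respondent was never shown."""
    observed = [r[question] for r in records if r.get(question) is not None]
    fill = mean(observed)
    for r in records:
        if r.get(question) is None:
            r[question] = fill
    return records
```

Respondent matching (hot-deck imputation) would replace the column mean with the answer of a donor respondent whose core-module answers look most similar.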

It’s a dog eat DIY world at the #AMSRS 2015 National Conference

  What started out as a summary of the conference turned into an entirely different post – DIY surveys. You’ll just have to wait for my summary then!

My understanding is that this was the first time SurveyMonkey spoke at an #AMSRS conference. It prompted a question that the audience seemed to perceive as controversial, and it was asked in an antagonistic way: what does SurveyMonkey intend to do about the quality of surveys prepared by nonprofessionals? This is a question with a multi-faceted answer.

First of all, let me begin by reminding everyone that most surveys prepared by professional, fully-trained survey researchers incorporate at least a couple of bad questions. Positively keyed grids abound, long grids abound, poorly worded and leading questions abound, overly lengthy surveys abound. For all of our concerns about amateurs writing surveys, I sometimes feel as though the pot is calling the kettle black.

But really, this isn’t a SurveyMonkey question at all. This is a DIY question. And it isn’t a controversial question at all. The DIY issue has been raised for a few years at North American conferences. It’s an issue with which every industry must deal. Taxis are dealing with Uber. Hotels are dealing with AirBnB. Electricians, painters, and lawn care services in my neighbourhood are dealing with me. Naturally, my electrical and painting work isn’t up to snuff with the professionals and I’m okay with that. But my lawn care services go above and beyond what the professionals can do. I am better than the so-called experts in this area. Basically, I am the master of my own domain – I decide for myself who will do the jobs I need doing. I won’t tell you who will do the jobs at your home and you won’t tell me who will do my jobs. Let me reassure you, I don’t plan to do any home surgery.

You can look at this from another point of view as well. If the electricians and painters did their job extremely well, extremely conveniently, and at a fair price, I would most certainly hire the pros. And the same goes for survey companies. If we worked within our potential clients’ schedules, with excellent quality, with excellent outcomes, and with excellent prices, potential clients who didn’t have solid research skills wouldn’t bother to do the research themselves. We, survey researchers, have created an environment where potential clients do not see the value in what we do. Perhaps we’ve let them down in the past, perhaps our colleagues have let them down in the past. 

And of course, there’s another aspect to the DIY industry. For every client who does their own research work, no matter how skilled and experienced they are, that’s one less job you will get hired to do. I often wonder how much concern over DIY is simply the fear of lost business. In this sense, I see it as a re-organization of jobs. If research companies lose jobs to companies using DIY, then those DIY companies will need to hire more researchers. The jobs are still there, they’re just in different places. 

But to get back to the heart of the question, what should DIY companies do to protect the quality of the work, to protect their industry, when do-it-yourselfers insist on DIY? Well, DIY companies can offer help in many forms. Webinars, blog posts, and white papers are great ways to share knowledge about survey writing and analysis. Question and survey templates make it really easy for newbies to write better surveys. And why not offer personalized survey advice from a professional? There are many things that DIY companies can do and already do.

Better yet, what should non-DIY companies do? A better job, that’s what. Write awesome surveys, not satisfactory surveys. Write awesome reports, not sufficient reports. Give awesome presentations, not acceptable presentations. Be prompt, quick, and flexible, and don’t drag clients from person to person over days and weeks. When potential clients see the value that professional services provide, DIY won’t even come to mind.

And of course, what should research associations do? Advocate for the industry. Show Joe nonresearcher what they miss out on by not hiring a professional. Create guidelines and standards to which DIY companies can aspire and prove themselves. 

It’s a DIY world out there. Get on board or be very, very worried.

Sample composition in online studies #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I’ve been pulling out every ounce of bravery I have here in Iceland and I went to the pool again last night (see previous posts on public nakedness!). I could have also broken my rule about not traveling after dark in strange cities but since it never gets dark here, I didn’t have to worry about that! The pool was much busier this time. I guess kiddies are more likely to be out and about after dinner on a weekday rather than Sunday morning at 9am. All it meant is that I had a lot more people-watching to do. All in all good fun to see little babies and toddlers enjoying a good splash and float!

This morning, the sun was very much up and the clouds very much gone. I’ll be dreaming of breaktime all morning! Until then however, I’ve got five sessions on sample composition in online surveys and representativeness of online studies to pay attention to. It’s going to be tough but a morning chock full of learning will get me a reward of more pool time!

what is the gain in a probability based online panel from providing internet access to sampling units that did not have access before?

  • Germany has GIP, France has ELIPSS, the Netherlands has LISS as probability panels
  • weighting might not be enough to account for bias of people who do not have internet access
  • but representativeness is still a problem because people may not want to participate even if they are given access, recruitment rates are much lower among non-internet households
  • probability panels still have problems – you won’t answer every survey you are sent, attrition
  • do we lose much without a representative panel? is it worth the extra cost
  • in Elipss panel, everyone is provided a tablet, not just people without access. the 3G tablet is the incentive you get to keep as long as you are on the panel. so everyone uses the same device to participate in the research
  • what does it mean to not have Internet access – used to be computer + modem. Now there are internet cafes, free wifi is everywhere. hard to define someone as no internet access now. We mean access to complete a survey so tiny smartphones don’t count.
  • 14.5% of adults in france were classified as not having internet. turned out to be 76 people in the end which is a bit small for analytics purposes. But 31 of them still connected every day.
  • non-internet access people always participated less than people who did have internet.
  • people without internet always differ on demographics [proof is chi-square, can’t see data]
  • populations are closer on nationality, being in a relationship, and education – including non-internet helps with these variables, improves representativity
  • access does not equal usage does not equal using it to answer surveys
  • maybe consider a probability based panel without providing access to people who don’t have computer/tablet/home access

parallel phone and web-based interviews: comparability and validity

  • phones are relied on for research and assumed to be good enough for representativeness; however, most people don’t answer phone calls when they don’t recognize the number, and you can’t use an autodialler in the USA for research
  • online surveys can generate better quality due to programming validation and ability to only be able to choose allowable answers
  • phone and online have differences in presentation mode, presence of human interviewer, can read and reread responses if you wish, social desirability and self-presentation issues – why should online and offline be the same
  • caution about combining data from different modes should be exercised [actually, i would want to combine everything i possibly can. more people contributing in more modes seems to be more representative than excluding people because they aren’t identical]
  • how different is online nonprobability from telephone probability [and for me, a true probability panel cannot technically exist. it’s theoretically possible but practically impossible]
  • harris did many years of these studies side by side using very specific methodologies
  • measured variety of topics – opinions of nurses, big business trust, happiness with health, ratings of president
  • across all questions, average correlation between methods was .92 for unweighted means and .893 for weighted means – more bias with weighted version
  • is it better for scales with many response categories – correlations go up to .95
  • online means of attitudinal items were on average 0.05 lower on scale from 0 to 1. online was systematically biased lower
  • correlations in many areas were consistently extremely high, means were consistently very slightly lower for online data; also nearly identical rank order of items
  • for political polling, the two methods were again massively similar, highly comparable results; mean values were generally very slightly lower – thought to be ability to see the scale online as well as social desirability in telephone method, positivity bias especially for items that are good/bad as opposed to importance 
  • [wow, given this is a study over ten years of results, it really calls into question whether probability samples are worth the time and effort]
  • [audience member said most differences were due to the presence of the interviewer and nothing to do with the mode, the online version was found to be truer]

representative web survey

  • only a sample without bias can generalize, the correct answer should be just as often a little bit higher or a little bit lower than reality
  • in their sample, they underrepresented 18-34, elementary school education, lowest and highest income people
  • [yes, there are demographic differences in panels compared to census and that is dependent completely on your recruitment method. the issue is how you deal with those differences]
  • online panel showed a socially positive picture of population
  • can you correct bias through targeted sampling and weighting, ethnicity and employment are still biased but income is better [that’s why invites based on returns not outgo are better]
  • need to select on more than gender, age, and region
  • [i love how some speakers still have non-english sections in their presentation – parts they forgot to translate or that weren’t translatable. now THIS is learning from peers around the world!]

measuring subjective wellbeing: does the use of websurveys bias the results? evidence from the 2013 GEM data from luxembourg

  • almost everyone is completely reachable by internet
  • web surveys are cool – convenient for respondents, less social desirability bias, can use multimedia, less expensive, less coding errors; but there are sampling issues and bias from the mode
  • measures of subjective well being – i am satisfied with my life, i have obtained all the important things i want in my life, the conditions of my life are excellent, my life is close to my ideal [all positive keyed]
  • online survey gave very slightly lower satisfaction
  • the result is robust to three econometric techniques
  • results from happiness equations using differing modes are compatible
  • web surveys are reliable for collecting information on wellbeing

Assessing and addressing measurement equivalence in cross-cultural surveys #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Today’s lunch included vanilla Skyr. Made with actual vanilla beans. Beat that yoghurt of home! Once again, i cannot choose a favourite among coconut, pear, banana, and vanilla other than to say it completely beats yoghurt. I even have a favourite brand although since I don’t have the container in front of me right now, I can’t tell you the brand. It still counts very much as brand loyalty though because I know exactly what the container looks like once I get in the store.

I have to say I remain really impressed with the sessions. They are very detail oriented and most people provide sufficient data for me to judge for myself whether I agree with their conclusions. There’s no grandstanding, essentially no sales pitches, and I am getting take-aways in one form or another from nearly every paper. I’m feeling a lot less presentation pressure here simply because it doesn’t seem competitive. If you’ve never been to an ESRA conference, I highly recommend it. Just be prepared to pack your own lunch every day. And that works just great for me.

cross cultural equivalence of survey response latencies

  • how long does it take for a respondent to provide their answer, easy to capture with computer assisted interviewing, uninfluenced by self reports
  • longer latencies seem to represent more processing time for cognitive operations, also represents presence and accessibility of attitudes and strength of those attitudes
  • longer latencies correlated with age, alcohol use, and poorly designed and ambiguous questions, perhaps there is a relationship with ethnic status
  • does latency differ by race/ethnicity; do they vary by language of interview
  • n=600 laboratory interview, 4 race groups, 300 questions taking 77 minutes all about health, order of sections rotated
  • required interviewer to hit a button when they stopped talking and hit a button when the respondent started talking; also recorded whether there were interruptions in the response process; only looked at perfect responses [which are abnormal, right?]
  • reviewed all types of question – dichotomous, categorical, bipolar scales, etc
  • Hispanic, black, Korean indeed took longer to answer compared to white people on the English survey in the USA
  • more educated took slightly less time to answer
  • numeric responses took much longer, yes/no took the least, uni-polar was second least
  • trend was about the same by ethnicity
  • language was an important indicator

comparing survey data quality from native and nonnative English speakers

  • me!
  • conclusion – using all of our standard data quality measures may eliminate people based on their language skills, not their data quality. But certain data quality measures are more likely to predict language rather than data quality. We should focus more on straightlining and overclicking and ignore underclicking as a major error (a rough sketch of these flags follows this list).
  • ask me for the paper 🙂
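
Since I can’t share the full paper here, a minimal sketch of the kinds of flags I mean – the function names and thresholds are mine, chosen only for illustration:

```python
def straightlined(grid_answers, min_items=5):
    """Flag a grid where every item received the identical answer (zero variance)."""
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1

def overclicked(selected, n_options, max_share=0.9):
    """Flag a multi-select where nearly every option was chosen."""
    return len(selected) / n_options >= max_share

def underclicked(selected, min_selections=1):
    """Flag a multi-select with very few selections - per the talk, this flag is
    the one most confounded with language skill, so lean on it less."""
    return len(selected) < min_selections
```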

trust in physicians or trust in physician – testing measurement invariance of trust in physicians in different health care cultures

  • trust reduces social complexity, solves problems of risk, makes interactions possible
  • we lack knowledge of various professions – lawyers, doctors, etc, we don’t understand diagnosis, treatments
  • we must rely on certificates, clothes such as a doctor’s whites, location such as a hospital
  • is there generalized trust in doctors
  • different health care systems produce different kinds of trust, ditto cultural contexts, political and values systems
  • compared three countries with health care coverage and similar doctors per person measurements
  • [sorry, didn’t get the main conclusion from the statement “results were significant”]

The impact of questionnaire design on measurements in surveys #4 #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Well, last night i managed to stay up until midnight. The lights at the church went on, lighting up the tower and the very top in an unusual way. They were quite pretty! The rest of the town enjoyed mood lighting as it didn’t really get dark at all. Tourists were still wandering in the streets since there’s no point going to bed in a delightful foreign city if you can still see where you’re going. And if you weren’t a fan of the mood lighting, have no fear! The sun ‘rose’ again just four hours later. If you’re scared of the dark, this is a great place to be – in summer!

Today’s program for me includes yet another session on question data quality, polling question design, and my second presentation on how non-native English speakers respond to English surveys. We may like to think that everyone answering our surveys is perfectly fluent but let’s be realistic. About 10% of Americans have difficulty reading/writing in English because it is not their native language. Add to that weakly and non-literate people, and there’s potential big trouble at hand.


the impact of answer format and item order on the quality of measurement

  • compared a 2 point scale and an 11 point scale, with different question orders where related questions could even be very widely separated; looked at perceived prestige of occupations
  • separated two pages of the surveys with a music game of guessing the artist and song, purely as distraction from the survey. the second page was the same questions in a completely different order, did the same thing numerous times changing the number of response options and question orders each time. whole experiment lasted one hour
  • assumed scale was uni-dimensional
  • no differences comparing 4 point to 9 point scale, none between 2 point and 9 point scale [so STOP USING HUGE SCALES!!!]
  •  prestige does not change depending on order in the survey [but this is to be expected with non-emotional, non-socially desirable items]
  • respondents confessed they tried to answer well but maybe not the best of their ability or maybe their answers would change the next time [glad to see people know their answers aren’t perfect. and i wouldn’t expect anything different. why SHOULD they put 100% effort into a silly task with no legitimate outcome for them.]

measuring attitudes towards immigration with direct questions – can we compare 4 answer categories with dichotomous responses

  • when sensitive questions are asked, social desirability affects response distributions
  • different groups are affected in different ways
  • asked questions about racial immigration – asked binary or as a 4 point scale
  • it’s not always clear that slightly is closer to none or that moderately is closer to strongly. can’t just assume the bottom two boxes are the same or the top two boxes are the same
  • education does have an effect, as well as age in some cases
  • expression of opposition for immigration depends on the response scale
  • binary responses lead to 30 to 50% more “allow none” responses than the 4 point scale
  • respondents with lower education have lower probability to choose middle scale point

cross cultural differences in the impact of number of response categories on response behaviour and data structure of a short scale for locus of control

  • locus of control scale, 4 items, 2 internal, 2 external
  • tested 5 point vs 9 point scale
  • do the means differ, does the factor structure differ
  • I’m my own boss; if I work hard, I’ll succeed; when at work or in my private life, what I do is mainly determined by others; bad luck often gets in the way of my plans
  • labeled from “doesn’t apply at all” to “applies completely”
  • didn’t see important demographic differences
  • saw one interaction but it didn’t really make sense [especially given sample size of 250 and lots of other tests happening]
  • [lots of chatter about significance and non-significance but little discussion of what that meant in real words]
  • there was no effect of item order, # of answer options mattered for external locus but not internal locus of control
  • [i’d say hard to draw any conclusions given the tiny number of items, small sample size. desperately needs a lot of replication]

the optimal number of categories in item specific scales

  • type of rating scale where the answer is specific to the scale and doesn’t necessarily apply to every other item – what is your health? excellent, good, poor
  • quality increased with the number of answer options comparing 11,7,5,3 point scales but not comparing 10,6,4 point scales
  • [not sure what quality means in this case, other audience members didn’t know either, lacking clear explanation of operationalization]

The impact of questionnaire design on measurements in surveys #3 #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

We had 90 minutes for lunch today which is far too long. Poor me.  I had pear skyr today to contrast yesterday’s coconut skyr. I can’t decide which one i like better. Oh, the hard decisions I have to make!  I went for a walk which was great since it drizzled all day yesterday. The downtown is tiny compared to my home so it’s quite fun to walk from one end to the other, including dawdling and eating, in less than half an hour. It’s so tiny that you don’t need a map. Just start walking and take any street that catches your fancy. I dare you to get lost. Or feel like you’re in an unsafe neighbourhood. It’s not possible.

I am in complete awe at the bird life here. There are a number of species i’ve never seen before which on its own is fun. It is also baby season so most of the ducks are paired off and escorting 2 to 8 tiny babies. They are utterly adorable as the babies float so well that they can barely swim underwater to eat. I haven’t seen any puffins along the shore line. I’m still hopeful that a random one will accidentally wander across my path.

By the way, exceptional beards really are a thing here. In case you were curious.

The Who: experimental evidence on the effect of respondent selection on collecting individual asset ownership information

  • how do you choose who to interview?
  • “Most knowledgeable person”, random selection, the couple together, each individual adult by themself about themself, by themself about other people
  • research done in Uganda so certainly not generalizable to North America
  • ask about dwelling, land, livestock, banking, bequeathing, selling, renting, collateral, investments
  • used CAPI, interviews matched on gender, average interview was 30 minutes
  • challenges included hard to find couples together as one person might be working in the field, hard to explain what assets were
  • asking the couple together shows differences in ownership incidence but the rest is the same
  • [sorry, couldn’t determine what “significant positive results” actually meant. would like to know. 😦 ]

Portuguese national health examination survey: questionnaire development

  • study includes physical measurements and a survey of health status, health behaviours, medication, income, expenses
  • pre-tested the survey for comprehension and complexity
  • found they were asking for things from decades ago and people couldn’t remember (eg when did you last smoke)
  • some mutually exclusive questions actually were not
  • you can’t just ask about ‘activity’ you have to ask about ‘physical activity that makes you sweat’
  • response cards helped so that people didn’t have to say an embarrassing word
  • had to add instructions that “some questions may not apply to you but answer anyways” because people felt that if you saw them walking you shouldn’t ask whether they can walk
  • gave examples of what sitting on the job, or light activity on the job meant so that desk sitters don’t include walking to the bathroom as activity
  • pretest revealed a number of errors that could be corrected, language and recall problems can be overcome with better questions

an integrated household survey for Wales

  • “no change” is not a realistic option [i wish more people felt that way]
  • duplication among the various surveys, inefficient, survey costs are high
  • opportunity to build more flexibility into a new survey
  • annual sample size of 12000, randomly selected 16+ adults, 45 minutes
  • want to examine effects of offering incentives
  • survey is still in field
  • 40% lower cost compared to previous, significant gains in flexibility

undesired responses to surveys: wrong answers or poorly worded questions? how respondents insist on reporting their situation despite unclear questioning

  • compared census information with family survey information
  • interested in open text answers
  • census has been completed since 1881
  • belle-mère can mean both stepmother and mother-in-law in French
  • can’t tell if grandchildren in the house belong to which adult child in the house
  • ami can mean friend or boyfriend or partner or spouse, some people will also specify childhood friend or unemployed friend or family friend
  • can’t tell if an unknown location of child means they don’t know the address or the child has died
  • do people with an often changing address live in a camper, or travel for work?
  • if you only provide age in years for babies you won’t know if it’s stillborn or actually 1 year old

ask a positive question and get a positive answer: evidence on acquiescence bias from health care centers in Nigeria

  • created two pairs of questions where one was positive and one was negative – avoided the word no [but the extremeness of the questions differed, e.g., “Price was reasonable” vs “Price was too expensive” ]
  • some got all positive, all negative, or a random mix
  • pilot test was a disaster, in rural nigeria people weren’t familiar with this type of question
  • instead, started out asking a question about football so people could understand how the question worked. asked agree or disagree, then asked moderately or strongly – two stage likert scale
  • lab fees were reasonable generated very different result than lab fees were unreasonable [so what is reality?]
  • it didn’t matter if negatives were mixed in with positives
  • acquiescence bias affects both positive and negative questions, can’t say if it’s truly satisficing, real answer is probably somewhere in between [makes me wonder, can we develop an equation to tease out truth]
  •  large ceiling effects on default positive framing — clinics are satisfactory despite serious deficiencies
  • can’t increase scores with any intervention but you can easily decrease the scores
  • maybe patient satisfaction is the wrong measure
  • recommend using negative framing to avoid ceiling effects [I wonder if in north america, we’re so good at complaining that this isn’t relevant]

The impact of questionnaire design on measurements in surveys #1 #ESRA15  #MRX  

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I tried to stay up until midnight last night but ended up going to bed around 10:30pm. Naturally, it was still daylight outside. I woke up this morning at 6am in broad daylight again. I’m pretty sure it never gets dark here no matter what they say. I began my morning routine as usual. Banged my head on the slanted ceiling, stared out the window at the amazing church, made myself waffles in the kitchen, and then walked past the pond teeming with baby ducks. Does it get any better? I think no. Except of course knowing I had another day of great content rich sessions ahead of me!


designs and developments of the income measures in the european social surveys

  • tested different income questions. allowed people to use a weekly, monthly, or annual income scale as they wished. there was also no example response, and no example of what constitutes income. Provided about 30 answer options to choose from, shown in three columns. Provided same result as a very specific question in some countries but not others.
  • also tested every country getting the same number breaks, groups weren’t arranged to reflect each country’s distribution. this resulted in some empty breaks [but that’s not necessarily a problem if the other breaks are all well and evenly used]
  • when countries are asked to set up number breaks in well defined deciles, high incomes are chosen more often – affected because people had different ideas of what is and isn’t taxable income
  • [apologies for incomplete notes, i couldn’t quite catch all the details, we did get a “buy the book” comment.]

item non-response and readability of survey questionnaire

  • any non-substantive outcome – missing values, refusals, don’t knows all count
  • non response can lower validity of survey results
  • semantic complexity measured by familiarity of words, length of words, abstract words that can’t be visualized, structural complexity
  • Measured – characters in an item, length of words, percent of abstract words, percent of lesser-known words, percent of long words (12 or more characters) – see the sketch after this list
  • used the european social survey which is a highly standardized international survey, compared english and estonian, it is conducted face to face, 350 questions, 2422 uk respondents
  • less known and abstract words create more non-response
  • long words increase nonresponse in estonian but not in english, perhaps because english words are shorter anyways
  • percent of long words in english created more nonresponse
  • total length of an item didn’t affect nonresponse
  • [they used a list of uncommon words for measurement, such a book/list does exist in english. I used it in school to choose a list of swear words that had the same frequency levels as regular words.]
  • [audience comment – some languages join many words together which means their words are longer but then there are fewer words, makes comparisons more difficult]
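
For anyone who wants to try this on their own items, here is a minimal sketch of the complexity features described – the 12-character cut-off for long words follows the talk, while the word list is only a placeholder and everything else is my own assumption:

```python
# Placeholder word list - the study used an established list of
# lesser-known/abstract words, which I don't have to hand.
LESS_KNOWN_WORDS = {"solidarity", "autonomy", "legitimacy"}

def complexity_features(item_text, long_word_len=12):
    """Item-level complexity measures: length, word length, long words, rare words."""
    words = item_text.split()
    n = len(words)
    return {
        "n_characters":     len(item_text),
        "mean_word_length": sum(len(w) for w in words) / n,
        "pct_long_words":   100 * sum(len(w) >= long_word_len for w in words) / n,
        "pct_rare_words":   100 * sum(w.lower().strip(".,?!") in LESS_KNOWN_WORDS
                                      for w in words) / n,
    }
```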

helping respondents provide good answers in web surveys

  • some tasks are inherently difficult in surveys, often because people have to write in an answer, coding is expensive and error prone
  • this study focused on prescription drugs which are difficult to spell, many variations of the same thing, level of detail is unclear, but we have full lists of all these drugs available to us
  • tested text box, drop box to select from a list, javascript (type-ahead look-up) – a rough sketch of the type-ahead idea follows this list
  • examined breakoff rates, missing data, response times, and codability of responses
  • asked people if they are taking drugs, tell us about three
  • study 1 – breakoffs higher from dropbox and javascript; median response times longer, but codability was better. Lists didn’t work well at all.
  • study 2 – cleaned up the list, made all the capitalization the same. break off rates were now all the same. response times lower but still higher than the textbox version. codability still better for list versions.
  • study 3 – if they couldn’t find a drug in the list, they were allowed to type it out. unlike previous studies which proceeded with the missing data. dropbox had highest missing data. javascript had lowest missing data. median times highest for drop box. trends for more and more drugs as expected, effect is more but not as much more.
  • older browsers had trouble with dropdowns and javascript and had to be routed to the textbox options
  • if goal is to get codable answers, use a text box. if goal is to create skip patterns then javascript is the way to go.
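
A minimal sketch of what the type-ahead condition is doing conceptually – the tiny stand-in drug list and the prefix-then-substring matching rule are my assumptions, not the authors’ implementation:

```python
# Stand-in list; the real study had the full prescription drug list available.
DRUG_LIST = ["Amlodipine", "Amoxicillin", "Atorvastatin", "Metformin"]

def type_ahead(typed, drug_list=DRUG_LIST, limit=10):
    """Suggest coded list entries: prefix matches first, then substring matches."""
    t = typed.strip().lower()
    starts = [d for d in drug_list if d.lower().startswith(t)]
    contains = [d for d in drug_list if t in d.lower() and d not in starts]
    return (starts + contains)[:limit]

print(type_ahead("am"))   # ['Amlodipine', 'Amoxicillin']
```

Because every suggestion comes from the coded list, the answer is codable by construction, which is the advantage the talk highlighted.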

rating scale labelling in web surveys – are numeric labels an advantage

  • you can use all words to label scales or just words on the end with numbers in between
  • research says there is less satisficing with verbal scales, they are more natural than numbers and there is no inherent meaning of numbers
  • means of the scales were different
  • less time to complete for the end-labeled groups
  • people paid more attention to the five point labeled scale, and least to the end point labeled scale
  • mean opinions did differ by scale, more positive on fully labeled scale
  • high cognitive burden to map responses of the numeric scales
  • lower reliability for the numeric labels

Surveying sensitive issues – challenges and solutions #ESRA15 #MRX  

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own. Break time brought some delightful donuts. I personally only ate one however on behalf of my friend, Seda, I ate several more just for her. By the way, since donuts are in each area, you can just breeze from one area to the next grabbing another donut each time. Just saying…

surveying sensitive questions – prevalence estimates of self-reported delinquency using the crosswise model

  • crime rates differ by country but rates of individuals reporting their own criminal behaviour show the opposite pattern. Thus countries with high crime rates have lower rates of self-report. Social desirability seems to be the cause. Is this true?
  • Need to add random noise to the model so the respondent can hide themself. Needs no randomization device.
  • ask a non-sensitive question and a sensitive question together. The respondent only indicates whether the answers to both are the same or different. You only need to know the probability distribution of the first question (e.g., is your mom’s birthday in January? well, 1/12 are in January). A worked sketch of the estimator follows this list.
  • crosswise model generates vastly higher self-reported crime rates in the countries where you’d expect them
  • also asked people in the survey whether they answered carefully – 15% admitted they did not
  • crosswise results in much higher prevalence rates, causal models of delinquent behaviour could be very different
  • satisficing respondents gives less bias than expected
  • estimates of the crosswise model are conservative
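
The arithmetic behind the crosswise estimate is worth writing out. A minimal sketch with made-up numbers – the formula is the standard crosswise-model estimator, the 85% figure is purely illustrative:

```python
def crosswise_prevalence(share_same, p_innocuous):
    """Standard crosswise-model estimator.

    share_same  : observed share answering "both answers are the same"
    p_innocuous : known probability of 'yes' on the innocuous question
                  (mother's birthday in January ~ 1/12)
    lambda = p*pi + (1 - p)*(1 - pi)  =>  pi = (lambda + p - 1) / (2*p - 1)
    """
    p = p_innocuous
    return (share_same + p - 1) / (2 * p - 1)

# Purely illustrative numbers: if 85% say "same" and p = 1/12,
# the estimated prevalence of the sensitive behaviour is 8%.
print(round(crosswise_prevalence(0.85, 1 / 12), 3))   # 0.08
```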

pouring water into the wine – the advantages of the crosswise model for asking sensitive questions revisited

  • it’s easier to implement in self-administered surveys, no extra randomization device necessary, cognitive burden is lower, no self-protection answering strategies
  • blood donation rates – direct question says 12% but crosswise says 18%
  • crosswise model had a much higher answering time, even after dropping extraordinarily slow people
  • model has some weaknesses, the less the better approach is good to determine if the crosswise model works
  • do people understand the instructions and do they specifically follow those instructions

effects of survey sponsorship and mode of administration on respondents answers about their racial attitudes

  • used a number of prejudice scales both blatant and subtle
  • no difference in racial measures on condition of interviewer administration
  • blatant prejudice scale showed a significant interaction for type of sponsor
  • matters more when there is an interviewer and therefore insufficient privacy
  • sponsor effect is likely the result of social desirability
  • response bias is in opposite direction for academic and market research groups
  • does it depend which department does the study – law department, sociology department

impact of survey mode (mail vs telephone) and asking about future intentions 

  • evidence suggests that asking about intent to get screened before asking about screening may minimize over reporting of cancer screening. removes the social pressure to over report.
  • people report behaviors more truthfully in self-administered forms than interviews
  • purchased real estate on an omnibus survey
  • no main effect for mode
  • in mail mode, asking about intent first was more reflective of reality of screening rates
  • 30% false positive said they had a test but it wasn’t in their medical record
  • little evidence that the intention item affected screening accuracy
  • mailed surveys may have positively affected accuracy – but the mail survey was one topic whereas the telephone survey was an omnibus

effect of socio-demographic (mis)match between interviewers and respondents on the data quality of answers to sensitive questions

  • theory of liking, some say matching improves chances of participation, may also improve disclosure and reporting, especially gender matching
  • the current study matched within about five years of age as opposed to arbitrary cut-off points
  • also matched on education
  • male interviewer to female interviewee had lowest response rate
  • older interviewer had lower response rate
  • no effects for education
  • income had the most missing data, parent’s education was next highest missing data likely because education from 50 years ago was different and you’d have to translate, political party had high missing rate
  • if female subject refuses a male interviewer, send a female to try to convince them
  • it’s easier to refuse a person who is the same age as you [maybe it’s a feeling of superiority/inferiority – you’re no better than me, i don’t have to answer to you]
  • men together generate the least item non-response
  • women together might get too comfortable together, too chatty, more non-response, role-boundary issue
  • age matching is less item non-response
  • same education is less item non-response, why do interviewers allow more item non-response when their respondent has a lower education


Direction of response scales #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes in the notes are my own.

I discovered that all the buildings are linked indoors. Let it rain, let it rain, i don’t care how much it rains….  [Feel free to sing that as loud as you can.] Lunch was Skyr, oat cookies and some weird beet drink. Yup. I packed it myself. I always try to like yogurt and never really do. Skyr works for me. So far, coconut is my favourite. I’ve forgotten to take pictures of speakers today so let’s see if I can keep the trend going! Lots of folks in this session so @MelCourtright and I are not the only scale geeks out there . 🙂

Response scales: Effects of scale length and direction on reported political attitudes

  • instruments are not neutral, they are a form of communication
  • cross national projects use different scales for the same question so how do you compare the results
  • trust in parliament is a fairly standard question for researchers and so makes a good example
  • 4 point scale is most popular but it is used up to 11 points, traditional format is very positive to very negative
  • included a don’t know in the answer options
  • transformed all scales onto a 0 to 1 scale with scores evenly distributed in between (see the sketch after this list)
  • means highest with 7 point scale traditional direction and lowest with 4 point and 11 point traditional direction
  • reverse direction had much fewer mean differences, essentially all the same
  • four point scales show differences in direction, 7 and 11 point show fewer differences in direction
  • [regression results shown on the screen – no one fainted or died, the speaker did not apologize or say she didn’t understand them. interesting difference compared to MRX events.]
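
For anyone curious how scales of different lengths end up on a common footing, a minimal sketch of the 0-to-1 transformation as I understood it – the example values are made up:

```python
def rescale_0_1(response, n_points):
    """Map a 1..n_points answer onto [0, 1], spreading the points evenly."""
    return (response - 1) / (n_points - 1)

# Made-up example: the midpoints of a 7-point and an 11-point scale both
# land on 0.5, so means from differently sized scales can be compared.
print(rescale_0_1(4, 7), rescale_0_1(6, 11))   # 0.5 0.5
```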

Does satisficing drive scale direction effects

  • research shows answers shift towards the start of the scale but this is not consistent
  • anchoring and adjustment effects whereby people use the first answer option as the anchor; interpretative heuristics suggest people choose an early response to express their agreement with the question; primacy effects due to satisficing decrease cognitive load
  • scores were more positive when the scale started positive, differences were huge across all the brands
  • the pattern is the same but the differences are noticeable
  • speeding measured as 300 milliseconds per word (see the sketch after this list)
  • speeders more likely to choose early answer option
  • answers are pushed to the start of the scale, limited evidence that it is caused by satisficing
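
A minimal sketch of how a 300-milliseconds-per-word speeder flag could be implemented – the function name and the example question length are my own assumptions:

```python
def is_speeder(response_ms, question_text, ms_per_word=300):
    """Flag answers given faster than ~300 ms per word of question text."""
    return response_ms < len(question_text.split()) * ms_per_word

# Made-up example: a 20-word question answered in 4 seconds gets flagged
# (threshold would be 20 * 300 = 6000 ms).
print(is_speeder(4000, " ".join(["word"] * 20)))   # True
```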

Ordering your attention: response order effects in web-based surveys

  • primacy happens more often visually and recency more often orally
  • scales have an inherent order. if you know the first answer option, you know the remainder of the options
  • sample size over 100 000, randomly assigned to scale order, also tested labeling, orientation, and number of response categories from 2 to 11
  • the order effect was always a primacy effect, differences were significant though small; significance more due to sample size [then why mention the results if you know they aren’t important?]
  • order effects occurred more with fully labeled scales, end labeled scales did not see response order effects
  • second study also supported the primacy effect with half of questions showing the effect
  • much stronger order effect seen with unipolar scales
  • vertical scales show a much stronger effect as well
  • largest effect seen for horizontal unipolar scale
  • need to run the same tests with grids, don’t know which response is more valid, need to know what they will be and when

Impact of response scale direction on survey responses in web and mobile web surveys

  • why does this effect happen?
  • tested agreement scales and frequency scales
  • shorter scale decreases primacy effect
  • scale length has a significant moderating effect – stronger effect for 7 point scales compared to 5 point scales
  • labeling has significant moderating effects – stronger effect for fully labeled
  • question location matters – stronger effect on earlier questions
  • labeled behavioural scale shows the largest impact, end labeled attitudinal scale has the smallest effect
  • scale direction affects responses – more endorsement at start of scale
  • 7 point fully labeled frequency scale is most affected
  • we must use shorter scales and end labeling to reduce scale direction effects in web surveys

Importance of scale direction between different modes

  • term used is forward/reverse scale [as opposed to ascending/descending or positive/negative keyed]
  • in the forward version of the scale, the web creates more agreement; but face to face it’s very weak. face to face shows recency effect
  • effect is the same for general scales (all scales are agreement) and item specific scales (each scale reflects the specific question), more cognitive effort in the item specific scale so maybe less effort is invested in the response
  • item specific scale affected more by the web
  • randomizing scale matters more in online surveys



Assessing the quality of survey data (Good session!) #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes in the notes are my own. As you can see, I managed to find the next building from the six buildings the conference is using. From here on, it’s smooth sailing! Except for the drizzle. Which makes wandering between buildings from session to session a little less fun and a little more like going to a pool. Without the nakedness. 

Session #1 – Data quality in repeated surveys: evidence from a quasi-experimental design by multiple professors from university of Rome

  • respondents can refuse to participate in the study resulting in series of missing data but their study had very little missing data, only about 5% this time [that’s what student respondents do for you, would like to see a study with much larger missing rates]
  • questions had an i do not know option, and there was only one correct answer
  • 19% of gender/birthday/socioeconomic status changed from survey to survey [but we now understand that gender can change, researchers need to be open to this. And of course, economic status can change in a second]

Session #2 – me! Lots of great questions, thank you everyone!

Session #3 – Processing errors in the cross national surveys

  • we don’t consider process errors very often as part of total survey error
  • found 154 processing errors in the series of studies – illegitimate variable values such as education that makes little sense or age over 100, misleading variable values, contradictory values, value discrepancies, lack of value labels, maybe you’re expecting a range but you get a specific value, what if 2 is coded as yes in the software but no in the survey
  • age and education were most problematic, followed by schooling
  • lack of labels was the worst problem, followed by illegitimate values, and misleading values
  • is 22% discrepancies out of all variables checked good or bad?

Session #4 – how does household composition derived from census data describe or misrepresent different family types

  • strength of census data is their exhaustivity, how does census data differ from a smaller survey
  • census counts household members, family survey describes families and explores people outside the household such as living apart, they describe different universes. a boarder may not be measured in the family survey but is mentioned in the census survey
  • in 10% of cases, more people are counted in the census, 87% have the same number of people on both surveys
  • census is an accounting tool, not a tool for understanding social life, people do not organize their lives to be measured and captured at one point and one place in time
  • census only has a family with at least one adult and at least one child
  • isolated adult in a household with other people is 5% of adults in the census, not classified the same in both surveys
  • there is a problem attributing children to the right people – problem with single parent families; single adults are often ‘assigned’ a child from the household
  • a household can include one or two families at the most – complicated when adult children are married and maybe have a kid. A child may be assigned to a grandparent, which is an error.
  • isolated adults may live with a partner in the dwelling, some live with their parents, some live with a child (but children move from one household to another), 44% of ‘isolated’ adults live with family members, they aren’t isolated at all
  • previously couples had to be heterosexual, so even though they reported as a union the rules split them into isolated adults [that’s depressing. thank you for changing this rule.]
  • census is more imperfect than the survey, it doesn’t catch subtle transformations in societal life. calls into question definitions of marginal groups
  • also a problem for young adults who leave home but still have strong ties to the parents home – they may claim their own home and their parents may also still claim them as living together
  • [very interesting talk. never really thought about it]

Session #5 – Unexpectedly high number of duplicates in survey data

  • simulated duplicates created greater bias of the regression coefficient when up to 50% of cases were duplicated 2 to 5 times
  • birthday paradox – how many persons are needed in order to find two having an identical birthday – 23. A single duplicate in a dataset is likely.
  • New method – the Hamming diagram – diversity of data for a survey – it looks like a normal curve with some outliers, so I’m thinking Hamming is simply a score like Mahalanobis is for outliers (a rough sketch follows this list)
  • found duplicates in 10% of surveys, 14 surveys comprised 80% of total duplicates with one survey at 33%
  • which case do you delete? which one is right if indeed one is right. always screen your data before starting a substantial analysis.
  • [i’m thinking that ESRA and AAPOR are great places to do your first conference presentation. there are LOTS of newcomers and presentation skills aren’t fabulous. so you won’t feel the same pressure as at other conferences. Of course, you must have really great content because here, content truly is king]
  • [for my first ESRA conference, i’m quite happy with the quality of the content. now let’s hope for a little sun over the lunch hour while I enjoy Skyr, my new favourite food!]
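
A rough sketch of both ideas – the birthday-paradox arithmetic and a brute-force duplicate screen based on Hamming distance. The exact “Hamming diagram” method wasn’t spelled out, so treat the screening function as my own guess at the underlying idea:

```python
from itertools import combinations
from math import prod

def p_shared_birthday(n):
    """Birthday paradox: chance that at least two of n people share a birthday."""
    return 1 - prod((365 - i) / 365 for i in range(n))

def hamming(row_a, row_b):
    """Number of variables on which two survey records disagree."""
    return sum(a != b for a, b in zip(row_a, row_b))

def suspicious_pairs(records, max_distance=0):
    """Candidate duplicates: record pairs that agree on (almost) every variable.
    max_distance=0 finds exact copies; a small positive value finds near-copies."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(records), 2)
            if hamming(a, b) <= max_distance]

print(round(p_shared_birthday(23), 3))   # ~0.507 - why a single duplicate is "likely"
```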
