Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.
I tried to stay up until midnight last night but ended up going to bed around 10:30pm. Naturally, it was still daylight outside. I woke up this morning at 6am in broad daylight again. I’m pretty sure it never gets dark here no matter what they say. I began my morning routine as usual: banged my head on the slanted ceiling, stared out the window at the amazing church, made myself waffles in the kitchen, and then walked past the pond teeming with baby ducks. Does it get any better? I think not. Except of course knowing I had another day of great content-rich sessions ahead of me!
- tested different income questions. Allowed people to use a weekly, monthly, or annual income scale as they wished. There was also no example response, and no example of what constitutes income. Provided about 30 answer options to choose from, shown in three columns. Provided the same result as a very specific question in some countries but not others.
- also tested every country getting the same number breaks; groups weren’t arranged to reflect each country’s distribution. This resulted in some empty breaks [but that’s not necessarily a problem if the other breaks are all well and evenly used]
- when countries are asked to set up number breaks in well-defined deciles, high incomes are chosen more often – affected because people had different ideas of what is and isn’t taxable income
- [apologies for incomplete notes, I couldn’t quite catch all the details, we did get a “buy the book” comment.]
item non-response and readability of survey questionnaire
- any non-substantive outcome – missing values, refusals, don’t knows all count
- non response can lower validity of survey results
- semantic complexity measured by familiarity of words, length of words, abstract words that can’t be visualized, structural complexity
- Measured – characters in an item, length of words, percent of abstract words, percent of lesser known words, percent of long words (12 or more characters)
- used the European Social Survey, which is a highly standardized international survey; compared English and Estonian; it is conducted face to face, 350 questions, 2422 UK respondents
- less known and abstract words create more non-response
- long words increase nonresponse in Estonian but not in English, perhaps because English words are shorter anyway
- percent of long words in English created more nonresponse
- total length of an item didn’t affect nonresponse
- [they used a list of uncommon words for measurement; such a book/list does exist in English. I used it in school to choose a list of swear words that had the same frequency levels as regular words.]
- [audience comment – some languages join many words together which means their words are longer but then there are fewer words, makes comparisons more difficult]
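The item-level readability features described above (item length in characters, word length, and percent of long words of 12+ characters) are straightforward to compute. Here is a rough sketch in Python; the uncommon-word set is a hypothetical stand-in, since the study used a published word-frequency list I don’t have here.

```python
# Sketch of the item-readability features from the talk: characters per
# item, mean word length, percent of long words (12+ chars), and percent
# of uncommon words. UNCOMMON is a hypothetical stand-in for the
# published frequency list the presenters used.
import re

UNCOMMON = {"adjudicate", "auxiliary", "fabrication"}  # stand-in list

def readability_features(item_text: str) -> dict:
    words = re.findall(r"[a-zA-Z']+", item_text.lower())
    n = len(words) or 1  # avoid division by zero on empty items
    return {
        "chars": len(item_text),
        "mean_word_len": sum(len(w) for w in words) / n,
        "pct_long": 100 * sum(len(w) >= 12 for w in words) / n,
        "pct_uncommon": 100 * sum(w in UNCOMMON for w in words) / n,
    }

feats = readability_features("How often do you adjudicate disagreements at work?")
print(feats)
```

A pipeline like this, run over every translated questionnaire, would let you compare the same item’s complexity across languages before fielding.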
helping respondents provide good answers in web surveys
- some tasks are inherently difficult in surveys, often because people have to write in an answer, coding is expensive and error prone
- this study focused on prescription drugs which are difficult to spell, many variations of the same thing, level of detail is unclear, but we have full lists of all these drugs available to us
- examined breakoff rates, missing data, response times, and codability of responses
- asked people if they are taking drugs, and to tell us about three of them
- study 2 – cleaned up the list, made all the capitalization the same. Breakoff rates were now all the same. Response times were lower but still higher than the textbox version. Codability was still better for the list versions.
rating scale labelling in web surveys – are numeric labels an advantage?
- you can use all words to label scales or just words on the end with numbers in between
- research says there is less satisficing with verbal scales, they are more natural than numbers and there is no inherent meaning of numbers
- means of the scales were different
- the end-labeled groups took less time to complete
- people paid the most attention to the fully labeled five point scale, and the least to the end point labeled scale
- mean opinions did differ by scale, more positive on fully labeled scale
- high cognitive burden to map responses of the numeric scales
- lower reliability for the numeric labels
Live blogged from the #ESRA15 conference in Reykjavik. Any errors or bad jokes in these notes are my own. Thank you ESRA for the free wifi that made this possible. Thank you Reykjavik for coconut Skyr and miles upon miles of beautiful lupins.
- Data quality starts with high response rates and also needs trust from the public in order to provide honest answers
- Iceland is the grandmother of democracy, a nation of social trust; Icelanders still give high response rates
- Intro: Guni Johannesson
- Settlement until 1262, decline until the 19th century, rise again in the 20th century – traditional story
- founded the first democratic parliament, vikings were traders but also murderous terrorists, there was no decline, struggle for independence – this is the revised story; 2008 economic collapse due to many factors including misuse of history – people thought of themselves as vikings and took irresponsible risks
- Lars Lyberg
- Wish we had more interesting presentations like the previous one
- 3M – multi-national, regional, cultural surveys, reveal differences between countries and cultures
- [I always wonder, is cross-cultural comparison TRULY possible?]
- Some global surveys of happiness included the presence of lakes or strong unions which automatically excludes a number of countries
- problems with 3m studies normally emphasize minimum response rates, specifications are not always adhered to, sometimes fabricated data, translations not done well, lack of total survey error awareness, countries are very different
- special features of these studies – concepts must have a uniform meaning across countries, risk management differs, financial resources differ, national interests are in conflict, scientific challenges, administrative challenges, national pride is at stake especially when the media gets a hold of results
- basic design issues – conditions cannot vary from definitions to methods to data collection, sampling can and should vary, weighting and stats are grey zones, quality assurance is necessary
- Must QC early interviews of each interviewer; specs are sometimes not understood, sometimes challenged, not affordable, not in line with best practice, overwhelming
- Common challenges – hard to reach respondents, differences in literacy levels, considerable non response
- interviewers should be alone with respondent for privacy reasons but it is common to not be alone – india, iraq, brazil there are often extra people around which affects the results, this is particularly important re mental health
- a fixed response rate goal can be almost impossible to achieve, 70% is just unreasonable in many places. spending so much money to achieve that one goal is in conflict with TSE and all the other errors that could be attended to instead. in this example, only a few of the countries achieved it and only barely [and I wonder to what unethical means they went to achieve those]
- strategies – share national experiences, training, site visits, revised contact forms, explore auxiliary data, monitor fieldwork, assess non response bias
- data fabrication [still can’t believe professionals do this 😦 ] 10 of 70 countries in a recent study have questionable data; in 3 cases they clearly showed some data was fabricated (PISA 2009); they often copy-paste data [sigh, what a dumb method of cheating, just asking to be caught. So I’m glad they were dumb]
- [WHY do people fabricate? didn’t get the desired response rate? embarrassed about results? too lazy to collect data?]
- Translation issues – translation used to be close translation with back translation, focus on replication; “are you feeling blue” doesn’t have the same meaning in another language, this still happens
- Team Translation Model – TRAPD (translation, review, adjudication, pretesting, documentation) – draft translations, review and refine, adjudicate for pretest
- Social desirability differs in conformist and individual societies, relative status between interviewer and respondents, response process is different, perceptual variation is magnified even within a country, questionnaires must be different across countries
- workloads differ – countries use different validation methods, countries don’t know how to calculate weights, interviewer workloads differed
- specifications are often dubious, all kinds of variations are permitted, proxy responses can range from 0% to 50% which is really bad for embarrassing questions where people don’t want others to know (e.g., a spouse could say the other spouse is happy)
- Quality management approach – decrease distance between user and producer, find root causes of problems, allocate resources based on risk assessment, coordinate team responsibilities, strive for real time interventions, build capacity
- Roger Jowell – 10 golden rules for cross national studies [find and read this, it’s really good]
- don’t confuse respect for cultural variations with tolerance of methodological anarchy, don’t aim for as many countries as possible, never do a survey in a country you know little about, pay as much attention to aggregate level background information as the individual level variables, assume any new variation you discover is an artifact, resist the temptation to crosstab everything [smart dude, I like these!]
Here’s another questionable practice in the survey research industry. It’s very common and the brand people love it. But, survey takers hate it! What am I talking about? Surveys where the brand is talked about as if it was a person. Check out this example for a non-alcoholic beverage:
I know some arrogant people and I know some dominant people. I can’t say I’ve ever met an arrogant or dominant beverage. Now, that’s not to say I don’t understand why the survey is set up like that. It is extremely useful to understand how people perceive your brand and this is one way of doing it. It works fabulously in focus groups when you lead the folks into a new way of thinking about a product.
Unfortunately, a lot of survey takers just don’t have the time, incentive, or inclination to get as passionate about a product as the brand manager does. The brand manager lives, eats, and breathes that brand. They come to work, spend 8 to 14 hours reading about it, researching it, and hypothesizing about it. At the end of the day, that brand really is a person to them.
To me, it’s just a can of pop. And I’m thirsty now.
This is easy to do as most market research surveys are already designed to accomplish it. If you’ve taken a survey recently, you’ve probably seen it. Here’s a great example:
Let me guess, you said ‘Extremely Important’ to every single item. And, you probably answered completely honestly. Know what? You just straightlined. Straightlining is a very bad thing in the world of survey design. Why? Because it’s hard to tell whether someone truly answered the questions carefully or whether they were simply clicking as fast as they possibly could without reading anything. Gimme that incentive baby!
But how do you get around this? The most important thing about designing high quality survey questions is ensuring the use of both positively and negatively keyed items. In other words, half of your questions need to be phrased in a way that makes the product sound good and the other half in a way that makes it sound bad. Here’s how it works…
Instead of saying ‘Is all natural,’ it could have said ‘Has artificial ingredients.’
Or, instead of ‘Is a good source of nutrition,’ it could have said ‘Is a poor source of nutrition.’
If it’s this easy to do, why don’t we do it? Why do I constantly get pre-written surveys that are chock full of questions that encourage straightlining? Maybe they’ve done it like this for a long time and they don’t want their norms tampered with. Maybe they feel like they’re encouraging people to think negatively about the product. Or, by offering negative options, maybe they’re encouraging people to rate their products negatively even if they wouldn’t have otherwise. But in the end, don’t you want QUALITY data? Trustworthy data? Actionable data?
So follow the rules. If there is no option but to write a grid question, at least write a good grid question.
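At analysis time, mixing positively and negatively keyed items also lets you catch straightliners, because a careful respondent who agrees with the positive items should disagree with the reversed ones. A minimal sketch, with hypothetical item names and a 5-point scale:

```python
# Sketch: reverse-score the negatively keyed grid items before averaging,
# and flag straightliners (identical raw answers across the whole grid).
# Item names and the 1..5 scale are made-up examples, not from any
# particular survey.
NEG_KEYED = {"has_artificial_ingredients", "poor_nutrition"}
SCALE_MAX = 5  # a 1..5 agreement/importance scale

def score_grid(responses: dict) -> dict:
    # Reverse negatively keyed items so a high score always means "good"
    scored = {
        item: (SCALE_MAX + 1 - v) if item in NEG_KEYED else v
        for item, v in responses.items()
    }
    raw = list(responses.values())
    return {
        "mean": sum(scored.values()) / len(scored),
        "straightlined": len(set(raw)) == 1,  # zero variation in raw answers
    }

r = score_grid({"all_natural": 5, "has_artificial_ingredients": 1,
                "good_value": 4, "poor_nutrition": 2})
print(r)
```

Note that a respondent who clicks 5 on every row gets flagged, and their reversed items drag the scale mean toward the midpoint instead of inflating it.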
The Unintelligencer
So how was yore dai today? Done yall grab an coffee, sit at yor desk, en than nots move until lunch? ore, Do joo spend 20 minuteses duin quick check o’ your email, followed by an 10 minute discussion of new product, followed by an 2 are meetin durrin which tiem ewe checked your email & twittered? Did yoo wach TV last night? Did ytou patiently sit 4 da entire 60 minuteses watchin aw de commercials or did yuo git up at erry commercial for a drink, a snack, a peck onna cheek, an email, or a…. um…. pee break? My guess iz you chose the second opsion ins boff caseses.
So Y do wii ekspect survey responders too b able 2 sit thru a 45 minute survey? Why do wee ekspect thems tew do it 1ce a week, eveyr week? wut could possibly cause them too be interested n a survey for that lawng when you don’t even do it in your regular life?
Awl we’re reely doin wif theeses lawng surveys is givin our precious survey participants reason to reconsider sharin their opinions, reason to let their attension wander, reason to move oan to somethin else. Sure, long survey gievs you lots of detailed data, in depff informasion, and plenty of opportunity to run fancy schmancy multivariate statistics. Butt, wiff response rateses followin such a scary declyn, it is hi time the survey research industry reconsiders what a long survey is. On that note, tell mee what YOU think is too long for a survey.
So how was your day today? Did you grab a coffee, sit at your desk, and then not move until lunch? Or, did you spend 20 minutes doing a quick check of your email, followed by a 10 minute discussion of a new product, followed by a 2 hour meeting during which time you checked your email and twittered? Did you watch TV last night? Did you patiently sit for the entire 60 minutes watching all the commercials or did you get up at every commercial for a drink, a snack, a peck on the cheek, an email, or a…. um…. pee break? My guess is you chose the second option in both cases.
So why do we expect survey responders to be able to sit through a 45 minute survey? Why do we expect them to do it once a week, every week? What could possibly cause them to be interested in a survey for that long when you don’t even do it in your regular life?
All we’re really doing with these long surveys is giving our precious survey participants reason to reconsider sharing their opinions, reason to let their attention wander, reason to move on to something else. Sure, long surveys give you lots of detailed data, in depth information, and plenty of opportunity to run fancy schmancy multivariate statistics. But, with response rates following such a scary decline, it is high time the survey research industry reconsiders what a long survey is. On that note, tell me what YOU think is too long for a survey.
You live, breathe, and eat your brand. You love nothing more than to discuss the most intricate and minute details of how amazing your brand is, how revolutionary it is, how much better it is than any other competitive brand out there. And, you can talk about it for hours on end and always have something else to talk about. I don’t blame you. I’ve tried your brand. It is totally amazing.
But, do you SERIOUSLY think that survey participants care even 1% as much as you do? They’re just grabbing the pads of paper on the top shelf at Office Depot because they ran out of paper. Do you SERIOUSLY think they care so much about paper that they can remain engaged in a 30 minute survey, discussing how your brand of paper meets all of their paper and paper accessory needs? How it fills their desire to know at the depths of their heart that this paper is statistically significantly better than some other paper?
Perhaps I’m stretching things a bit, but I suspect you see my point. My favourite real example is a survey about gum. Grid after grid after grid. Does it “meet your daily chewing needs?” Does it “satisfy your nutritional requirements?” For all I care, they might as well ask me if it “meets my needs for world peace.” All I want is something that tastes good to chew on while I wait for the bus.
Think about that next time you’re designing the most grammatically correct, professionally phrased, and comprehensively detailed (long) survey. Why aren’t you seeing good data quality with all those efforts?