Questioning the questionnaire – using games to reveal self-report biases by Amber Brown and Joe Marks #CASRO #MRX
Live blogging from Nashville. Any errors or bad jokes are my own.
– surveys that aren’t well designed have social desirability bias, aspirational biases, demand characteristics, satisficing
– games can help with some of these if they are properly designed
– purchase/visit intent can have problems as people want to please you, are aspirational in their answers with little follow through, similar to charitable giving and exercise
– the study asked about both prior and future behaviour
– people were offered either cash or theme park tickets and then asked whether they planned to visit the park – would they take the cash (they probably won’t go) or would they take the tickets (they probably will go) (Cash is always less)
– for a charity company – will you donate your incentive to a charity or take the cash (cash is always less)
– for an exercise company – will you take a sports authority gift card or a cash incentive (cash is always less)
– for readership – will you take a book store gift card or cash
– the incentive choice was a good predictor of the intent question
– games engage instinctual thinking. you’re just trying to win. people play games every day. it’s faster and gives less time for biases to creep in
– the test is actual choice behaviour, which is similar to the marketplace
– would you be willing to donate to Wikipedia? real case study – do you want $10 in cash or donate $50 to Wikipedia. 14% chose the $10 donation but 2% chose the $30 donation
– the game comes much closer to real behaviour
– can help to counter biases that poorly designed surveys may have
[i want to read the paper on this one. very cool!]
Reaching for the Holy Grail: Validation Techniques for Online B2B Panels by Edward Paul Johnson #CASRO #MRX
When completing surveys online respondents have the ability to claim false credentials in order to qualify for higher paid surveys. This research seeks to apply to the online panel space the same methods used to keep telephone B2B sample clean. The presentation seeks to provide conference attendees with a better understanding of the importance of recruitment source to quality; which validation techniques are effective; and the legal and privacy pitfalls to watch out for when validating business sample.
Edward Paul Johnson, Director of Analytics, SSI
- Our clients want business professionals, qualified decision makers, informed on the topic, engaged and interested
- We know client lists are skewed
- We want it all online because it’s faster and convenient
- Looking for B2B people, it’s hard to first find them. Then you have to reach the boss. Then you have to weed out the fraud to get rid of people who just want the incentives. Then you need to create a lasting relationship because they are valuable for future research as well.
- Many times, you don’t even know the market size of the sample of people you are looking for – computer hardware purchasers?
- Fraud is normally very small, normally only 1 to 2% in the overall panel. But fraud has an advantage because they qualify for more than the honest people do. There might be 20% fraud among business people just because more fraudsters qualify.
- What are our weapons? Existing relationships (hotels, airlines, credit cards), data mining the profile to find contradictions, social media linking, phone validation
- Every weapon is a two-edged sword – a bigger panel means lower quality and a smaller panel means higher quality, but where bigger panels let in more fraudsters, smaller panels eliminate honest people.
- Better to have multiple tests, one with high specificity and one with high sensitivity
- Data mining the profile removed about 15% of panelists – number of reports and company size were important variables. An unusual ratio of company size to number of computers was helpful. Being over 45 years old with less than 1 year in the industry helped somewhat.
- LinkedIn validation – 600 people volunteered to connect to LinkedIn but profiles were often incomplete and email addresses were different. The number of connections and skills was helpful but individual skills were too varied to be helpful. Fraudsters likely don’t volunteer to connect their accounts. Wasn’t a good method.
- Phone validation – Good confirmation test but it excluded good panelists. Some gave bad phone numbers or it was disconnected or they no longer work at that number. Good confirmation test but not a good entry test.
- Tips for phone validation – let them know you will call them at work. Call very close to when they joined, within 2 days. Keep the validation short, to 2 minutes: name, company, title. Use trained interviewers who know how to bypass gate keepers. The gate keeper might be able to validate this for you.
- Validation DOES improve data quality. Existing relationships aren’t enough. Be careful of excluding good people; you can do just as much damage with false positives as with false negatives.
- It will never be perfect. There is no holy grail but you can improve it all the time.
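The fraud arithmetic from the talk (1 to 2% of the overall panel, but maybe 20% among business people) is a base-rate effect, and it can be sketched in a few lines of Python. The qualification rates below are hypothetical illustration values, not numbers from the presentation: it is assumed that fraudsters always claim to qualify and that 8% of honest panelists genuinely do.

```python
def fraud_share_among_qualifiers(fraud_rate, fraud_qualify_rate, honest_qualify_rate):
    """Share of qualifying respondents who are fraudulent."""
    fraud_qualifiers = fraud_rate * fraud_qualify_rate
    honest_qualifiers = (1 - fraud_rate) * honest_qualify_rate
    return fraud_qualifiers / (fraud_qualifiers + honest_qualifiers)


# Hypothetical inputs: 2% fraud in the panel, every fraudster claims to
# qualify, and only 8% of honest panelists are real B2B decision makers.
share = fraud_share_among_qualifiers(
    fraud_rate=0.02,
    fraud_qualify_rate=1.0,
    honest_qualify_rate=0.08,
)
print(f"Fraud among qualifiers: {share:.1%}")  # prints "Fraud among qualifiers: 20.3%"
```

The rarer the honest target group, the bigger the fraud share gets, even though the number of fraudsters in the panel hasn't changed.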
After two days at CASRO, I learned the following:
- When you use a 5 point or 7 point scale, you will get different answers
- When you label or don’t label scales, you will get different answers
- When you use a web survey vs a mobile survey, you will get different answers
- When you gamify a survey, you will get different answers
- (And from the good ol’ days) when you run the same survey on two different panels, you will get different answers
What are we to gain from all of this? Well, no matter what you do or how you do it, you will get different results on surveys every time. There’s just no way around it. What we HOPE is that the results won’t be contrary, but rather simply different in magnitude. That rank orders will remain generally similar, that hates will remain hates, and loves will remain loves. Indeed, if we are lucky enough to run a single study across a number of different methods or styles and get similar rank orders every time, it’s a good indication that the conclusions we’ve drawn are both reliable and valid. Heaven.
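One way to check "different in magnitude but similar in rank order" is a Spearman rank correlation between the results from two methods. The sketch below is a minimal pure-Python version that assumes no tied scores; the brand means are hypothetical numbers made up for illustration.

```python
def ranks(values):
    """Rank 1 = highest score. Assumes no tied scores, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman's rho using the no-ties shortcut: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rank_a, rank_b = ranks(a), ranks(b)
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical mean ratings for five brands from two survey modes.
web_scores = [4.1, 3.2, 2.5, 3.8, 1.9]
mobile_scores = [3.6, 3.0, 2.1, 3.1, 1.5]  # lower across the board, same order

rho = spearman(web_scores, mobile_scores)
print(rho)  # prints 1.0
```

Every mobile score is lower than its web counterpart, yet the rank order is identical, so rho comes out at 1.0. A value near zero or below would be the "contrary" result we hope not to see.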
What this problem also suggests is that there is and can be no right answer. The only right answer is the one in the responder’s head and given that people can’t even adequately describe what is going on in their head, it seems that we will never know the right answer. What we can do is develop clear and specific research hypotheses, and match them up with clear and specific research designs. That is the best way to create reliable and valid answers.
We may not know the exact right answer, but we can know a good answer.
Sentiment analysis is a very controversial subject with many people highly doubtful of the validity of the results. With that in mind, I have developed a set of rules that will allow you to ensure your data is scored with validity levels greater than 90%.
- Choose messages that are short. The shorter the better. Tweets are a perfect example as people generally make only one concise point that can’t be misconstrued. Longer messages simply introduce extraneous information that isn’t essential to the main message.
- Don’t collect data from blogs and forums where people may express their points in long, drawn out, overly verbose ways. These types of messages may include well described pros and cons, positives and negatives, and this only confuses things.
- Remove from your dataset any messages that incorporate unclear opinions or contradictory opinions. Obviously, the speaker isn’t sure of what they are saying and so their opinion won’t be helpful.
- Remove from your dataset any messages that you aren’t sure how to score. Perhaps they contain emoticons you aren’t familiar with, slang that doesn’t make any sense, or grammatical errors that render the message not understandable.
Rather than worrying about ignoring important subsamples of people who have complicated opinions, people who associate themselves with subcultures, or the obvious skewing and biasing of results, simply focus on the parts of the data that you know will be correct. And there you go. Your sentiment analysis is 95% accurate. It doesn’t generalize to any population, but boy is it accurate!
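The trick behind these rules is easy to put in numbers. The sketch below uses entirely hypothetical figures (600 short, clear messages scored correctly 95% of the time and 400 long or mixed messages scored correctly 60% of the time) to show accuracy and coverage side by side.

```python
def accuracy_and_coverage(segments, keep):
    """segments maps a name to (message_count, share_scored_correctly).

    Returns (accuracy on the kept messages, share of all messages kept).
    """
    total = sum(count for count, _ in segments.values())
    kept = [seg for name, seg in segments.items() if name in keep]
    kept_count = sum(count for count, _ in kept)
    correct = sum(count * accuracy for count, accuracy in kept)
    return correct / kept_count, kept_count / total


# Hypothetical dataset, invented for illustration only.
segments = {
    "short_and_clear": (600, 0.95),
    "long_or_mixed": (400, 0.60),
}

everything = accuracy_and_coverage(segments, keep={"short_and_clear", "long_or_mixed"})
cherry_picked = accuracy_and_coverage(segments, keep={"short_and_clear"})

print(f"All messages:  accuracy {everything[0]:.0%}, coverage {everything[1]:.0%}")
print(f"Easy ones only: accuracy {cherry_picked[0]:.0%}, coverage {cherry_picked[1]:.0%}")
```

Under these assumptions the accuracy figure climbs only because 40% of the opinions were thrown away, which is exactly why the result no longer generalizes.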
– If you’re interested in a less sarcastic view of how accurate sentiment analysis is, Seth Grimes is the expert in this field. Read here as he explains why validity scores can’t really be any better than 83%.
Wait, was that a typo? Quantity over quality? Well, I meant what I said.
Question #1: What was the sample size of your last tracker? 30 per time frame? 50 per time frame? What about your last custom study? 300? 500?
Question #2: How many pages of questions and demos and cross-tabs did you flip through searching for any chi-square or t-test that was statistically significant? 100? 200?
Here’s the problem. We run ridiculously long surveys with far too few participants per test cell and we are ok with searching through far too many Type 2 errors.
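Question #2 has a false-positive side too, sitting alongside the low power of tiny cells. As a rough sketch, assuming every test is independent, run at the usual 5% level, and that there is no real difference anywhere, here is how many "significant" results show up by chance alone:

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Average number of chance 'hits' when no real effects exist."""
    return num_tests * alpha


def chance_of_at_least_one(num_tests, alpha=0.05):
    """Probability of one or more chance 'hits' across independent tests."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (100, 200):
    print(
        f"{num_tests} tests: about {expected_false_positives(num_tests):.0f} "
        f"chance hits, P(at least one) = {chance_of_at_least_one(num_tests):.3f}"
    )
```

Real cross-tabs are not independent, so treat these as ballpark figures. The point stands: flip through enough pages and something will always look significant.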
Here’s the solution. Put your money into large sample sizes and not into question topic after question topic. Focus on sample sizes within demographic groups rather than questions with 4 or 8 people per cell. Trade variety of questions for reliability of results. Trade overly long surveys for properly sampled cells. Trade breadth of topics for validity of individual questions. Take money away from more and more questions and put it directly into more and more validity and reliability. Radical.
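To see what those cell sizes buy you, here is a minimal sketch of the 95% margin of error for a proportion at the sample sizes mentioned above. It assumes simple random sampling and the worst case of a 50/50 split, and the normal approximation it relies on is itself shaky for cells as small as 4 or 8.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (4, 8, 30, 50, 300, 500):
    print(f"n = {n:>3}: plus or minus {margin_of_error(n):.1%}")
```

A cell of 4 people leaves you with a margin of nearly 50 points either way, while 500 people brings it under 5 points.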
Please comment below. What was the sample size of your last study and what was the sample size within many of the cells?
1) You can tell how valid a sentiment scoring system is by evaluating as few as 20 records
2) You can accurately judge validity by examining the originally assigned score and deciding if you agree with it
3) If data for one brand is valid, data for all the brands are probably valid as well
4) You can judge validity by checking twitter data as it is the lowest common denominator
5) If the system is based on natural language processing, you know the sentiment is valid
6) If the sentiment scoring is manual, you know it’s perfectly valid.
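Claim 1 is easy to stress-test. Suppose you check 20 records and agree with 18 of the scores (a hypothetical result, chosen for illustration). A Wilson score interval, one common way to put a confidence interval around a proportion, shows how little that pins down:

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p_hat = correct / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(correct=18, n=20)
print(f"Observed 90% agreement, plausible range {low:.0%} to {high:.0%}")
```

With only 20 records, an observed 90% is consistent with anything from roughly 70% to 97%.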
I love shopping for toilet paper. It is one of the most exhilarating parts of my life. It’s extremely rewarding to choose between the 12 pack and the 24 pack knowing that my family won’t have to worry about running out mid-wipe for at least a week or two. I love taking the extra time to choose between extra soft and supreme soft knowing that it brings with it the responsibility of selecting the most appropriate texture for my loved ones’ bottoms. And making the important decision between 2 ply and 3 ply means that I have the joy of taking charge of cleaning up those nasty messes that no one wants to talk about.
I truly enjoy and look forward to each of these decisions because it is fun to consider the myriad options and know that I have succeeded in bringing joy to my family.
Unfortunately, even though the process of shopping for toilet paper is extremely fun, the process of answering surveys about toilet paper shopping isn’t. I just wish there was some way to make answering toilet paper surveys more fun, like a game, like it is in real life.
Because I know if the survey taking experience emulated my real life experience, my survey answers would be more valid. Don’t you agree?
Here’s your task. Read the following list of tasks and identify which ones are useless to brands and clients:
– Watching how people interact with and actually use a product
– Listening to how people talk about products with their peers
– Learning which features people use to convince other consumers
– Learning how consumers convince others to use a product
– Observing facial expressions of disgust and shame and love and peace
– Watching for passion and complacency
Your second task: Make a list of all of the research methods that are error-free, risk-free and always give valid and reliable results.
There may be no perfect research method but there’s definitely a place for focus groups.