Tag Archives: sampling

Voxpopme 7: How will automation impact the industry, and you personally, over the next twelve months?

Along with a group of market researchers from around the world, I was asked to participate in Voxpopme Perspectives – an initiative wherein insights industry experts share ideas about a variety of topics via video. You can read more about it here or watch the videos here. Viewers can then reach out over Twitter or upload their own video response. I'm more of a writer so you'll catch me blogging rather than vlogging. 🙂

Episode 7: How will automation impact the industry, and you personally, over the next twelve months?

I’m not concerned with the next 12 months whatsoever. If we aren’t planning for the next five and ten years, we’re going to be in a lot of trouble. With that in mind, I’d like to consider how automation and artificial intelligence will impact me over that time frame.

The reality is that my job will change a lot. No longer will I receive a dataset, clean out poor quality data, run statistics, write a report, and prepare a presentation. Every aspect of that will be handled automatically and with artificial intelligence. I will receive a report at my desk that is perfectly written, with the perfect charts, and perfectly aligned to my clients’ needs.

So why will I still be there? I’ll be the person who points out the illogical outcomes of the data. How errors enter during the data collection process via human cognitive biases. I’ll be the person who interprets the data in an odd way that wasn’t predicted by the data but is still a plausible outcome. I’ll help clients read between the lines and use the results wisely rather than by the book – or rather, by the AI.

So how will automation and artificial intelligence impact our industry? If your business sells repetitive tasks, from survey programming to data cleaning to statistics to chart preparation and report writing, you’d better have a long term plan. Figure out your unique method of selling WISE applications. Not just data, but wiser data and wiser charts and wiser reports. There are already hundreds of companies innovating in these areas right now and they are waiting to find their customers. I expect you don’t want to hand over your customers to them.

How the best research panel in the world accurately predicts every election result #polling #MRX

Forget for a moment the debate about whether the MBTI is a valid and reliable personality measurement tool. (I did my Bachelor's thesis on it, and I studied psychometric theory as part of my PhD in experimental psychology so I can debate forever too.) Let's focus instead on the MBTI because tests similar to it can be answered online and you can find out your result in a few minutes. It kind of makes sense and people understand the idea of using it to understand themselves and their reactions to our world. If you're not so familiar with it, the MBTI divides people into groups based on four continuous personality characteristics: introversion/extroversion, sensing/intuition, thinking/feeling, judging/perceiving. (I'm an ISTJ for what it's worth.)

Now, in the market and social research world, we also like to divide people into groups. We focus mainly on objective and easy to measure demographic characteristics like gender, age, and region though sometimes we also include household size, age of children, education, income, religion, and language. We do our best to collect samples of people who look like a census based on these demographic targets and oftentimes, our measurements are quite good. Sometimes, we try to improve our measurements by incorporating a different set of variables like political affiliation, type of home, pets, charitable behaviours, and so forth.

All of these variables get us closer to building samples that look like census but they never get us all the way there. We get so close and yet we are always missing the one thing that properly describes each human being. That, of course, is personality. And if you think about it, in many cases, we’re only using demographic characteristics because we don’t have personality data. Personality is really hard to measure and target. We use age and gender and religion and the rest to help inform about personality characteristics. Hence why I bring up the MBTI. The perfect set of research sample targets. 

The MBTI may not be the right test, but there are many thoroughly tested and normed personality measurement scales that are easily available to registered, certified psychologists. They include tests like the 16PF, the Big 5, or the NEO, all of which measure constructs such as social desirability, authoritarianism, extraversion, reasoning, stability, dominance, or perfectionism. These tests take decades to create and are held in veritable locked boxes so as to maintain their integrity. They can take an hour or more for someone to complete and they cost a bundle to use. (Make it YOUR entire life’s work to build one test and see if you give it away for free.) Which means these tests will not and can not ever be used for the purpose I describe here. 

However, it is absolutely possible for a Psychologist or psychological researcher to build a new, proprietary personality scale which mirrors standardized tests albeit in a shorter format, and performs the same function. The process is simple. Every person who joins a panel answers ten or twenty personality questions. When they answer a client questionnaire, they get ten more personality questions, and so on, and so on, until every person on a panel has taken the entire test and been assigned to a personality group. We all know how profiling and reprofiling works and this is no different. And now we know which people are more or less susceptible to social desirability. And which people like authoritarianism. And which people are rule bound. Sound interesting given the US federal election? I thought so. 
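Here's a minimal sketch of how that chunked profiling could be scheduled, assuming a hypothetical 40-item scale answered in 10-item batches (the item names and counts are illustrative, not any panel's actual system):

```python
# Sketch: spread a long personality scale across survey waves, a few
# items at a time, until each panelist has answered the full battery.
# Hypothetical 40-item scale and batch size; not any vendor's system.

SCALE_ITEMS = [f"item_{i:02d}" for i in range(1, 41)]
BATCH_SIZE = 10

def next_batch(answered: set) -> list:
    """Return the next unanswered items to append to a questionnaire."""
    remaining = [q for q in SCALE_ITEMS if q not in answered]
    return remaining[:BATCH_SIZE]

# A panelist who completed the first batch gets items 11-20 next time
answered = set(SCALE_ITEMS[:10])
print(next_batch(answered))
```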

So, which company does this? Which company targets people based on personality characteristics? Which company fills quotas based on personality? Actually, I don't know. I've never heard of one that does. But the first panel company to successfully implement this method will be vastly ahead of every other sample provider. I'd love to help you do it. It would be really fun. 🙂

New Math For Nonprobability Samples #AAPOR

Moderator: Hanyu Sun, Westat

Next Steps Towards a New Math for Nonprobability Sample Surveys; Mansour Fahimi, Frances M. Barlas, Randall K. Thomas, and Nicole R. Buttermore, GfK Custom Research

  • Neyman paradigm requires complete sampling frames and complete response rates
  • Non-prob is important because those assumptions are not met, sampling frames are incomplete, response rates are low, budget and time crunches
  • We could ignore that we are dealing with nonprobability samples, find new math to handle this, try more weighting methods [speaker said commercial research ignores the issue – that is absolutely not true. We are VERY aware of it and work within appropriate guidelines]
  • In practice, sampling frames are incomplete so samples aren't random, respondents choose not to respond, weighting has to be more creative, and uncertainty in inferences is increasing
  • There is fuzz all over, relationship is nonlinear and complicated 
  • Geodemographic weighting is inadequate; weighted estimates to benchmarks show huge significant differences [this assumes the benchmarks were actually valid truth but we know there is error around those numbers too]
  • Calibration 1.0 – correct for higher agreement propensity with early adopters – try new products first, like variety of new brands, shop for new, first among my friends, tell others about new brands; this is in addition to geography
  • But this is only a univariate adjustment, one theme, sometimes it's insufficient
  • Sought a Multivariate adjustment
  • Calibration 2.0 – social engagement, self importance, shopping habits, happiness, security, politics, community, altruism, survey participation, Internet and social media
  • But these dozens of questions would burden respondents, and weighting becomes an issue
  • What is the right subset of questions for the biggest effect
  • Number of surveys per month, hours on Internet for personal use, trying new products before others, time spent watching TV, using coupons, number of relocations in past 5 years
  • Tested against external benchmarks, election, BRFSS questions, NSDUH, CPS/ACS questions
  • Nonprobability samples based on geodemographics are the worst of the set, adding calibration improves them, nonprobability plus calibration is even better, probability panel was the best [pseudo probability]
  • Calibration 3.0 is hours on Internet, time watching TV, trying new products, frequency expressing opinions online
  • Remember Total Research Error, there is more error than just sampling error
  • Combining nonprobability and probability samples, use stratification methods so you have resemblance of target population, gives you better sample size for weighting adjustments
  • Because there are so many errors everywhere, even nonprobability samples can be improved
  • Evading calibration is wishful thinking and misleading [a toy raking sketch follows these notes]
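For anyone who hasn't seen calibration weighting mechanically, here is a toy raking (iterative proportional fitting) sketch in Python. The variables and benchmark targets are invented; real calibration would use variables like the ones listed above (early adoption, social engagement, hours online):

```python
import numpy as np

# Toy raking (iterative proportional fitting): adjust unit weights until
# the weighted margins hit the benchmark for each calibration variable.
# Categories and target shares below are invented for illustration.

rng = np.random.default_rng(0)
n = 1000
age_45plus = rng.integers(0, 2, n)     # 1 = aged 45+
early_adopter = rng.integers(0, 2, n)  # 1 = tries new products first
w = np.ones(n)

targets = [(age_45plus, 0.45), (early_adopter, 0.20)]  # assumed benchmarks

for _ in range(20):  # cycle through the margins until they settle
    for x, share in targets:
        for k, t in ((1, share), (0, 1 - share)):
            current = w[x == k].sum() / w.sum()
            w[x == k] *= t / current

print(round(w[age_45plus == 1].sum() / w.sum(), 3))    # ~0.45
print(round(w[early_adopter == 1].sum() / w.sum(), 3)) # ~0.20
```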

Quota Controls in Survey Research: A Test of Accuracy and Inter-source Reliability in Online Samples; Steven H. Gittelman, MKTG, INC.; Randall K. Thomas, GfK Custom Research; Paul J. Lavrakas, Independent Consultant; Victor Lange, Consultant

  • A moment of silence for a probabilistic frame 🙂
  • FoQ 2 – do quota controls help with effectiveness of sample selections, what about propensity weight, matching models
  • 17 panels gave 3000 interviews via three sampling methods each; panels remain anonymous, 2012-2013; plus telephone sample including cell phone; English only; telephone was 23 minutes 
  • A – nested region, sex, age (a toy illustration of nested vs non-nested quotas follows these notes)
  • B – added non-nested ethnicity quotas
  • C – added non-nested education quotas
  • D – company's proprietary method
  • 27 benchmark variables across six government and academic studies; 3 questions were deleted because of social desirability bias
  • Doing more than A did not result in reduction of bias, nested age and sex within region was sufficient; race had no effect and neither did C and those made the method more difficult; but this is overall only and not looking at subsamples
  • None of the proprietary methods provided any improvement to accuracy, on average they weren’t powerful and they were a ton of work with tons of sample
  • ABC were essentially identical; one proprietary method did worse; phone was not all that much better
  • Even phone – 33% of differences were statistically significant [makes me think that benchmarks aren’t really gold standard but simply another sample with its own error bars]
  • The proprietary methods weren’t necessarily better than phone
  • [shout out to Reg Baker 🙂 ]
  • Some benchmarks performed better than others, some questions were more of a problem than others. If you’re studying Top 16 you’re in trouble
  • Demo only was better than the advanced models, advanced models were much worse or no better than demo only models
  • An advanced model could be better or worse on any benchmark but you can’t predict which benchmark
  • Advanced models show promise but we don’t know which is best for which topic
  • Need to be careful to not create circular predictions, covariates overly correlated, if you balance a study on bananas you’re going to get bananas
  • Icarus syndrome – covariates too highly correlated
  • It's okay to test privately but clients need to know what the modeling questions are, you don't want to end up with weighting models using the study variables
  • [why do we think that gold standard benchmarks have zero errors?]
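To make the A/B/C designs concrete, here is a toy contrast of nested versus non-nested quotas. The targets and the equal per-cell allocation are invented simplifications; real quota cells would be sized proportional to census:

```python
from itertools import product

# Nested quotas: cells are the full cross of region x sex x age,
# so every combination gets its own target.
regions = ["NE", "MW", "S", "W"]
sexes = ["M", "F"]
ages = ["18-34", "35-54", "55+"]

n_target = 1200
nested_cells = list(product(regions, sexes, ages))  # 4 x 2 x 3 = 24 cells
per_cell = n_target // len(nested_cells)            # 50 completes per cell
print(len(nested_cells), per_cell)

# Non-nested quotas: a variable added only as a margin, with no
# crossing against the other variables (invented shares).
ethnicity_margin = {"white": 0.62, "black": 0.12, "hispanic": 0.17, "other": 0.09}
print({k: round(v * n_target) for k, v in ethnicity_margin.items()})
```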

Capitalizing on Passive Data in Online Surveys; Tobias B. Konitzer, Stanford University; David Rothschild, Microsoft Research

  • Most of our data is nonprobability to some extent
  • Can use any variable for modeling, demos, survey frequency, time to complete surveys
  • Define target population from these variables, marginal percent is insufficient, this constrains variables to only those where you know that information 
  • Pollfish is embedded in phones, mobile based, has extra data beyond online samples, maybe it's a different mode, it's cheaper and faster than face to face and telephone, more flexible than face to face though perhaps less so than online, efficient incentives
  • 14 questions, education, race, age, location, news consumption, news knowledge, income, party ID, also passive data for research purposes – geolocation, apps, device info
  • Geo is more specific than IP address, frequency at that location, can get FIPS information from it which leads to race data, with lat and long can reduce the number of questions on the survey
  • Need to assign demographics based on FIPS data in an appropriate way, modal response wouldn't work, need to use probabilities, e.g. if 60% of a FIPS area is white, then give the person a 60% chance of being white (see the sketch after these notes)
  • Use app data to improve group assignments
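A minimal sketch of that probabilistic assignment, with an invented FIPS code and invented race shares:

```python
import random

# Instead of giving everyone in a FIPS area the modal category,
# draw each respondent's category from the area's distribution.
# The FIPS code and shares below are invented for illustration.

fips_race_shares = {
    "06037": {"white": 0.60, "black": 0.15, "asian": 0.15, "other": 0.10},
}

def assign_race(fips: str, rng: random.Random) -> str:
    shares = fips_race_shares[fips]
    return rng.choices(list(shares), weights=list(shares.values()), k=1)[0]

rng = random.Random(42)
print(assign_race("06037", rng))  # "white" with probability 0.60
```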

Improvements to survey modes #PAPOR #MRX

What Are They Thinking? How IVR Captures Public Opinion For a Democracy, Mary McDougall, Survox

  • many choices, online is cheapest followed by IVR followed by phone interview
  • many still do not have internet – seniors, non-white, low income, no high school degree
  • phone can help you reach those people, can still do specific targeting
  • good idea to include multiple modes to test for any mode effects
  • technology is no longer a barrier for choosing a data collection strategy
  • ignoring cell phones is poor sampling
  • use labor strategically to allow IVR
  • tested IVR on political polling, 300 completes in 2.5 hours, met the quotas, once a survey was started it was generally completed

The Promising Role of Fax in Surveys of Clinical Establishments: Observations from a Multi-mode Survey of Ambulatory Surgery Centers, Natalie Teixeira, Anne Herleth, and Vasudha Narayanan, Westat; Kelsey O'Yong, Los Angeles Department of Public Health

  • we often want responses from an organization, not an individual
  • 500 medical facilities, 60 questions about staffing and infection control practices
  • used multimode – telephone, postal, web, and fax
  • many people requested the survey by fax and many people did convert modes
  • because fax was so successful, reminder calls were combined with fax automatically and saw successful conversions to this method
  • this does not follow the current trend
  • fax is immediate and keeps gatekeepers engaged, maybe it was seen as a novelty
  • ["innovative fax methodology" so funny to hear that phrase. I have never ever ever considered fax as a methodology. And yet, it CAN be effective. 🙂 ]
  • options to use “mass” faxing exist

The Pros and Cons of Persistence During Telephone Recruitment for an Establishment Survey, Paul Weinfurter and Vasudha Narayanan, Westat

  • half of restaurant issues are employees coming to work ill, new law was coming into effect regarding sick pay
  • recruit 300 restaurants; from each, 1 manager, 1 owner, and a couple of food preparers
  • telephone recruitment and in person interviews, English, Spanish, Mandarin, 15 minutes, $20 gift card
  • most of the time they couldn’t get a manager on the phone and they received double the original sample of restaurants to contact
  • it was assumed that restaurants would participate because the sponsor was the health inspectors, but participation was not mandatory and recruiters couldn't say it was; there were many scams related to this so people just declined, and not all of the health inspectors were even aware of the study
  • 73% were unreachable after 3 calls, hard to get a person of authority during open hours
  • increased call attempts to five times, but continued on when they thought recruitment was likely
  • recruited 77 more from people who were called more than 5 times
  • as a result, data were not limited to a quicker to reach sample
  • people called up to ten times remained noncommittal and never were interviewed
  • there wasn’t an ideal number of calls to get maximum recruits and minimum costs
  • but the method wasn’t really objective, the focus was on restaurants that seemed like they might be reachable
  • possibly more representation than if they had stopped all their recruitment at five calls
  • [would love to see results crossed by number of attempts]

Analysis, design, and sampling methods #PAPOR #MRX

Live blogged at #PAPOR in San Francisco. Any errors or bad jokes are my own.

Enhancing the use of Qualitative Research to Understand Public Opinion, Paul J. Lavrakas, Independent Consultant; Margaret R. Roller, Roller Research

  • thinks research has become too quantitative because qual is typically not as rigorous but this should and can change
  • public opinion is not a number generated from polls, polls are imperfect and limited
  • aapor has lost sight of this [you’re a brave person to say this! very glad to hear it at a conference]
  • we need more balance, we aren't a survey research organization, we are a public opinion organization, our conference programs are extremely biased toward quantitative
  • there should be criteria to judge the trustworthiness of research – was it fit for purpose
  • credibility, transferability, dependability, confirmability
  • all qual research should be credible, analyzable, transparent, useful
  • credible – sample representation and data collection
  • do qual researchers seriously consider non-response bias?
  • credibility – scope deals with coverage design and nonresponse, data gathering – information obtained, researcher effects, participant effects
  • analyzability – intercoder reliability, transcription quality
  • transparency – thick descriptions of details in final documents

Comparisons of Fully Balanced, Minimally Balanced, and Unbalanced Rating Scales, Mingnan Liu, Sarah Cho, and Noble Kuriakose, SurveyMonkey

  • there are many ways to ask the same question
  • is it a good time or a bad time? – fully balanced
  • is it a good time or not? – minimally balanced
  • do you or do you not think it is getting better?
  • are things headed in the right direction?
  • [my preference – avoid introducing any balancing in the question, only put it in the answer. For instance: What do you think about buying a house? Good time, Bad time]
  • results – effect sizes are very small, no differences between the groups
  • in many different questions tested, there was no difference in the formats

Conflicting Thoughts: The Effect of Information on Support for an Increase in the Federal Minimum Wage Level, Joshua Cooper & Alejandra Gimenez, Brigham Young University, First Place Student Paper Competition Winner

  • Used paper surveys for the experiment, 13000 respondents, 25 forms
  • Would you favor or oppose raising the minimum wage?
  • Some were told how many people would increase their income, some were told how many jobs would be lost, some were told both
  • People given negative info opposed a wage increase, people given positive info favored it, and people who were told both opposed a wage increase
  • independents were more likely to say don’t know
  • negativity strongly outweighs the positive across all types of respondents regardless of gender, income, religion, party ID
  • jobs matter, more than anything

Sample composition in online studies #ESRA15 #MRX

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I've been pulling out every ounce of bravery I have here in Iceland and I went to the pool again last night (see previous posts on public nakedness!). I could have also broken my rule about not traveling after dark in strange cities but since it never gets dark here, I didn't have to worry about that! The pool was much busier this time. I guess kiddies are more likely to be out and about after dinner on a weekday rather than Sunday morning at 9am. All it meant is that I had a lot more people watching to do. All in all good fun to see little babies and toddlers enjoying a good splash and float!

This morning, the sun was very much up and the clouds very much gone. I'll be dreaming of breaktime all morning! Until then, however, I've got five sessions on sample composition in online surveys, and representativeness of online studies, to pay attention to. It's going to be tough but a morning chock full of learning will get me a reward of more pool time!

what is the gain in a probability based online panel of providing internet access to sampling units that did not have access before

  • Germany has GIP, France has ELIPSS, the Netherlands has LISS as probability panels
  • weighting might not be enough to account for bias of people who do not have internet access
  • but representativeness is still a problem because people may not want to participate even if they are given access, recruitment rates are much lower among non-internet households
  • probability panels still have problems: you won't answer every survey you are sent, and there's attrition
  • do we lose much without a representative panel? is it worth the extra cost
  • in the ELIPSS panel, everyone is provided a tablet, not just people without access. the 3G tablet is the incentive you get to keep as long as you are on the panel. so everyone uses the same device to participate in the research
  • what does it mean to not have Internet access – used to be computer + modem. Now there are internet cafes, free wifi is everywhere. hard to define someone as no internet access now. We mean access to complete a survey so tiny smartphones don’t count.
  • 14.5% of adults in France were classified as not having internet. turned out to be 76 people in the end which is a bit small for analytics purposes. But 31 of them still connected every day.
  • non-internet access people always participated less than people who did have internet.
  • people without internet always differ on demographics [proof is chi-square, can’t see data]
  • populations are closer on nationality, being in a relationship, and education – including non-internet helps with these variables, improves representativity
  • access does not equal usage does not equal using it to answer surveys
  • maybe consider a probability based panel without providing access to people who don’t have computer/tablet/home access

parallel phone and web-based interviews: comparability and validity

  • phones are relied on for research and assumed to be good enough for representativeness, however most people don't answer phone calls when they don't recognize the number, can't use an autodialler in the USA for research
  • online surveys can generate better quality due to programming validation and ability to only be able to choose allowable answers
  • phone and online have differences in presentation mode, presence of human interviewer, can read and reread responses if you wish, social desirability and self-presentation issues – why should online and offline be the same
  • caution about combining data from different modes should be exercised [actually, i would want to combine everything i possibly can. more people contributing in more modes seems to be more representative than excluding people because they aren’t identical]
  • how different is online nonprobability from telephone probability [and for me, a true probability panel cannot technically exist. it's theoretically possible but practically impossible]
  • harris did many years of these studies side by side using very specific methodologies
  • measured variety of topics – opinions of nurses, big business trust, happiness with health, ratings of president
  • across all questions, average correlation between methods was .92 for unweighted means and .893 for weighted means – more bias with weighted version
  • is it better for scales with many response categories – correlations go up to .95
  • online means of attitudinal items were on average 0.05 lower on scale from 0 to 1. online was systematically biased lower
  • correlations in many areas were consistently extremely high, means were consistently very slightly lower for online data; also nearly identical rank order of items (a toy version of this comparison appears after these notes)
  • for political polling, the two methods were again massively similar, highly comparable results; mean values were generally very slightly lower – thought to be ability to see the scale online as well as social desirability in telephone method, positivity bias especially for items that are good/bad as opposed to importance 
  • [wow, given this is a study over ten years of results, it really calls into question whether probability samples are worth the time and effort]
  • [audience member said most differences were due to the presence of the interviewer and nothing to do with the mode, the online version was found to be truer]
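A toy version of the comparison described in these notes, with invented item means; the mechanics are just correlating per-item means across modes and checking the average gap:

```python
import numpy as np

# Per-item means from two modes, scaled 0 to 1. Values are invented;
# the reported study found correlations near .9 and web means about
# 0.05 lower than phone.
phone_means = np.array([0.72, 0.55, 0.81, 0.40, 0.63])
web_means = np.array([0.68, 0.50, 0.77, 0.33, 0.58])

r = np.corrcoef(phone_means, web_means)[0, 1]
gap = (phone_means - web_means).mean()
print(f"correlation across items: {r:.3f}")
print(f"mean phone-minus-web gap: {gap:.3f}")
```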

representative web survey

  • only a sample without bias can generalize, the correct answer should be just as often a little bit higher or a little bit lower than reality
  • in their sample, they underrepresented 18-34s, elementary school education, and the lowest and highest income people
  • [yes, there are demographic differences in panels compared to census and that is dependent completely on your recruitment method. the issue is how you deal with those differences]
  • online panel showed a socially positive picture of population
  • can you correct bias through targeted sampling and weighting, ethnicity and employment are still biased but income is better [that’s why invites based on returns not outgo are better]
  • need to select on more than gender, age, and region
  • [i love how some speakers still have non-english sections in their presentation – parts they forgot to translate or that weren’t translatable. now THIS is learning from peers around the world!]

https://twitter.com/gerrynicolaas/status/621985749022449664

measuring subjective wellbeing: does the use of websurveys bias the results? evidence from the 2013 GEM data from Luxembourg

  • almost everyone is completely reachable by internet
  • web surveys are cool – convenient for respondents, less social desirability bias, can use multimedia, less expensive, fewer coding errors; but there are sampling issues and bias from the mode
  • measures of subjective well being – i am satisfied with my life, i have obtained all the important things i want in my life, the conditions of my life are excellent, my life is close to my ideal [all positively keyed]
  • online survey gave very slightly lower satisfaction
  • the result is robust to three econometric techniques
  • results from happiness equations using differing modes are compatible
  • web surveys are reliable for collecting information on wellbeing

Training for survey research: who, where, how #AAPOR #MRX

moderated by Frauke Kreuter

Prezzie #1: training needs in survey methods

  • started a program in the 1970s with 4 courses, 2 in statistics and 2 in sampling, that was pretty good at the time, it covered the basics well
  • in 1993: 3 courses in data collection, 3 in sampling, 2 practicums, 4 in statistics, 3 design classes, 1 on the federal statistical system
  • many journals have started since then – Survey Methodology, Journal of Official Statistics, POQ, and the AAPOR publication Journal of Survey Statistics and Methodology – plus international conferences, and now an entire conference on total survey error
  • statisticians need to know basic theory, sampling theory of complex designs and weighting and imputation, small area estimation, disclosure control, record linkage, paradata, responsive design, panel survey methods, survey management, ethics; it’s impossible to know about and training for everything now
  • in early days, treating humans nicely was never mentioned, it wasn't important; now we realize it's important [yet we still don't treat people as nicely as we ought to. isn't a long, poorly designed, poorly written survey disrespectful?]
  • a masters degree can no longer cover everything we need to know as survey researchers, can run summer programs for training, can do online training, can do advanced certificate programs
  • the world is changing so fast so how can training keep up with everything, punch cards are history and whatever we're doing now will be history soon enough
  • we need to train more people but undergrads don’t know about our field

Prezzie #2: training for modern survey statisticians

  • survey practice journal special issue – February 2015
  • might be 147 federal positions per year advertising for statisticians, we are training only about a quarter of what's needed
  • we need core statistical skills but also communication and presentation skills
  • training gap right now is most grad courses only have one course in sampling
  • most courses use R (55%)
  • only 40% of courses are taught by faculty who work specifically in statistics
  • weighting is a major gap, don’t talk about non-response adjustments
  • big training gap in design trade offs – discrete parameters, continuous parameters, split sample randomization
  • most training happens on the job
  • [this session is so popular I can’t put my feet up on the chair in front of me! The room is full!]

Prezzie #3:  assessing the present

  • our science draws on many other disciplines
  • trained in the methods and how to evaluate those methods, trained in qual and quant like ethnography and psychometric analysis
  • there are five university based programs, mostly at the graduate level, plus professional conferences, short courses and seminars
  • current programs do the core well, increasing focus on hybrid training, trainers are also practitioners which is invaluable
  • training gap on full survey life cycle experience, not enough practical experience in the training, not enough multi-cultural training and the population has a large and growing non-native English speaking base
  • quant dominates most survey programs [well of course, a survey program is surveys, why not develop a RESEARCH program]
  • you can have a productive career with little statistical knowledge, you can be a qual researcher [well that's just offensive! why shouldn't qual researchers also know statistics?]
  • ideal program still needs the core classes but it also needs more qual and user experience, more specialized courses, more practicums, more internships, more major projects like a thesis

Prezzie #4:  on the job training

  • she did interviews with people for her talk – she's qualitative 🙂
  • the workplace is interdisciplinary with many skill sets and various roles
  • know your role – are you a jack of all trades or filling a niche
  • in private business, everyone knows a bit about everything
  • at the census bureau, it’s massively specialized – she works on pre-testing of non-english surveys
  • you need to create opportunities for yourself – request stretch tasks, seek mentors, volunteer to help with new projects, shadow experienced people – screen sharing is a wonderful thing
  • take short courses, pursue graduate degrees, read and present – you are responsible for your future growth
  • as management you can – promote learning by doing, share the big picture, encourage networking, establish a culture of ongoing learning
  • you can learn on the job without money

Prezzie #5: future of training

  • we are in the midst of a paradigmatic shift in our industry
  • survey informatics – computer science, math and stats, and cognitive and social psychology – this is the new reality
  • resistance to online surveys is the same as the emergence of the telephone survey – skeptical, resistant
  • the speaker was a heretic when he first started talking about online surveys
  • we need technology and computers for higher quality data, increase quantitative in data collection
  • we now have paradata and metadata and auxiliary data – page level data, question level data, personal level data, day level data
  • data is no longer just the answers to the questions, methodologists need to understand all these types of data
  • concerned we're not keeping up and preparing the next generation
  • [discussion of how panels can be good, like people have never heard of panels, sadly some people do need to hear this]
  • computer science must be our new partner [go home and learn R and Python right now]
  • we won't have to ask are you watching TV, the TV will know who's in the room, who's looking at the TV, who left to get a snack
  • the junior professors who know the new tech have the least power to do anything about it and no funding

Evaluating polling accuracy #AAPOR #MRX

moderated by Mary McDougall, CfMC Survox Solutions

prezzie 1: midterm election polling in Georgia

  • Georgia has generally been getting more media attention because it is bluer than expected, change may be fast enough to outpace polls, population has changed a lot particularly around Atlanta, Georgia has become less white
  • telephone survey using voter registration info, tested three weights – voter data, party targets, education weights
  • registered voting weight was much better, education weighting was worse
  • voter weight improved estimates in Georgia but you need voter information
  • [why do presenters keep talking about needing more research for reliability purposes, isn't that the default?]

prezzie #2: error in the 2014 preelection polls

  • house effects – difference between one poll and every other poll, difference from industry average
  • they aren't what they used to be; it used to be interview method and weighting practices
  • regression model is better than difference of means tests
  • could it be whether the pollster is extremely active or if they only do it once in a while
  • results show the more you poll the more accurate you are, if you poll more in risky areas you are less accurate – but overall these results were kind of null
  • second model using just many pollsters was much better – Arkansas had a lot more error, it had the most pollsters
  • in the end, can't really explain it (a toy house-effects regression is sketched below)
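A toy house-effects regression, with invented polls: regressing each poll's error on pollster indicator variables estimates every house effect in one model rather than via many pairwise mean comparisons:

```python
import numpy as np

# error = poll estimate minus election result, in points (invented data)
pollsters = np.array(["A", "A", "B", "B", "B", "C", "C"])
error = np.array([1.2, 0.8, -2.1, -1.5, -1.9, 0.3, 0.1])

# design matrix of pollster indicators (one column per pollster)
names = sorted(set(pollsters))
X = np.column_stack([(pollsters == p).astype(float) for p in names])

# with no intercept, each coefficient is that pollster's house effect
beta, *_ = np.linalg.lstsq(X, error, rcond=None)
print(dict(zip(names, beta.round(2))))  # {'A': 1.0, 'B': -1.83, 'C': 0.2}
```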

prezzie #3: north carolina senate elections

  • to use RDD or registration based sampling; will turnout be high or low; a small university has limited resources with highly talented competition
  • chose RBS and did three polls, worked Saturday to Thursday, used live interviewers, screened for certain or probably will vote
  • RBS worked well here, there were demographic gaps, big race gap, big party gaps

prezzie #4: opinion polls in referendums

  • [seriously presenters, what’s with these slides that are paragraphs of text?]
  • most polls are private and not often released, questions are all different, there is no incumbent being measured
  • data here are 15 tobacco control elections and 126 questions in total, courts forced the polls to be public, find them on the Legacy library website
  • five types of questions – uninformed heads up questions where you’re asked whether you agree or strongly agree [i.e., leading, biased, unethical questions. annie not happy!]
  • predictions are better closer to the election, spending is a good predictor, city size is a good predictor
  • using the word ‘strongly’ in the question doesn’t improve accuracy
  • asking the question exactly as the ballot doesn’t improve the accuracy
  • asking more questions from your side of the opinion doesn’t improve the accuracy 
  • polls often overestimate the winner’s percentage
  • [these polls are great examples of abusing survey best practices research]
  • post election surveys are accurate and useful for other purposes
  • [big slam against AAPOR for not promoting revealing of survey sponsors]

prezzie #5: comparing measures of accuracy

  • big issue is opt-in surveys versus random sample [assuming random sampling of humans is possible!]
  • accuracy affected by probability sampling, days to election, sample sizes, number of fielding days
  • used elections in Sweden, which has eight parties in parliament; many traditional methods are inappropriate with multi-candidate elections (a simple alternative is sketched after these notes)
  • sample size was a good predictor, fielding days was not predictive, opt-in sample was worse but overall r square was very small
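One accuracy measure that does extend to multi-party races is the mean absolute error across all party shares; a toy example with invented numbers:

```python
# Poll shares versus election result for a hypothetical four-party race
poll = {"A": 0.31, "B": 0.24, "C": 0.18, "D": 0.27}
result = {"A": 0.28, "B": 0.26, "C": 0.20, "D": 0.26}

mae = sum(abs(poll[p] - result[p]) for p in poll) / len(poll)
print(f"mean absolute error: {mae:.3f}")  # 0.020 here
```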

prezzie #6: polling third party candidates

  • why do we care about these? don’t want to waste space on candidates who only get 1% of the votes
  • 1500 data points, 121 organizations, 94 third party candidates – thank you to HuffPollster and DailyKos
  • aggregate accuracy was good, most were overstatements, but there was systematic bias
  • using the candidates names makes a difference, but if you name one candidate, you should name them all – i know i’m not voting for the top two candidates so i’m probably voting for this third party person you listed
  • accuracy gets better closer to the date, sometimes you don’t know who the third party candidate is till close to the date
  • live phone and IVR underestimate, internet overestimated
  • there were important house effects – CBS/YouGov underestimates; PPP overestimates; on average FOX news is fairly accurate with third party candidates

Comparing probability and nonprobability samples #AAPOR #MRX

prezzie #1: how different are probability and nonprobability designs

  • nonprobability samples often get the correct results and probability samples are sometimes wrong. maybe they are more similar than we realize
  • nonprobability sampling may have a sample frame but it’s not the same as a census population
  • how do you choose, which factors are important
  • what method does the job that you require, that fits your purpose
  • is the design relevant, does it meet the goals with the resources, does the method give you results in the time you need, accessibility, can you find the people you need, interpretability and reliability, accuracy of estimates with acceptable mean square error, coherence in terms of results matching up with other data points from third parties [of course, who's to say what the right answer is, everyone could be wrong as we've seen in recent elections]
  • nonprobability can be much faster, probability can be more relevant
  • nonprobability can get you right to the people you want to listen to
  • both methods suffer from various types of error, some more than others, must consider total survey error [i certainly hope you’ve been considering TSE since day 1]
  • driver will decide the type of study you end up doing
  • how can nonprob methods help prob methods, because they do offer much good stuff
  • [interesting talk, nice differentiation between prob and nonprob even though I did cringe at a few definitions, e.g. I don't see that quality is the differentiator between prob and nonprob]

prezzie #2: comparison of surveys based on prob and nonprob

  • limbo – how low can you go with a nonprob sample
  • bandwagon – well everyone else is doing nonprob sample [feelings getting hurt here]
  • statistical adjustment of nonprob samples helps but it is only a partial solution
  • nonprob panel may have an undefined response rate
  • need to look at point estimates and associations in both the samples, does sampling only matter when you need population point estimates
  • psychology research is often done all with college students [been there, done that!]
  • be sure to weight and stratify the data
  • education had a large effect between prob and nonprob sample [as it usually does along with income]
  • point estimates were quite different in cases, but the associations were much closer so if you don’t need a precise point estimate a nonprob sample could do the trick

prezzie #4: sample frame and mode effects

  • used very similar omnibus surveys, included questions where they expected to find differences
  • compared point estimates of the methods as well as to benchmarks of larger census surveys
  • for health estimates, yes, there were differences but where the benchmark was high so were the point estimates, similarly for low or moderate point estimates, total raw differences maxed out around ten points
  • there was no clear winner for any of the question types though all highs were highs and lows were lows
  • no one design is consistently superior

Combining a probability based telephone sample with an opt-in web panel by Randal ZuWallack and James Dayton #CASRO #MRX

Live blogging from Nashville. Any errors or bad jokes are my own.

- National Alcohol Survey in the US, for ages 18 plus [because children don't drink alcohol]
– even people who do not drink end up taking a 34 minute survey compared to 48 minutes for someone who does drink. this is far too long
– only at 18 minutes are people determined to be drinkers or abstainers. [wow, worst screen-out position EVER]
– why data fusion? not everyone is online [please, not everyone is on a panel either. and what about refusals? this fascination with probability panels is often silly]
– RDD measures population percents
– web measures depth of information conditional on who is who
– they matched an online and RDD sample using overlapping variables
- problem is matching can create strange 'people' that don't describe real people. however, in aggregate, the distributions work out. we shouldn't think about it being right on an individual level (a matching sketch follows these notes)
- "The awesome thing about having a 45 minute survey"…is the statistical analyses you can do with it [made me laugh. there IS an awesome thing? 🙂 ]
- [SAS user 🙂 Have I told you lately….. that I love SAS]
– There were small differences in frequencies between the RDD and web surveys for both wine and beer. averages are very close but significantly different [enter conversation – when does significantly different mean meaningfully different]
– heavy drinking is much much greater on web surveys
- is there social desirability, recall bias 🙂
– not everything lines up perfectly RDD vs web, general trends are the same but point estimates are different
– so how do you know which set of data is true or better?
– regardless, web does not reproduce RDD estimates
– problem now is which data is correct, need multiple samples from the same panel to test
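A minimal sketch of the fusion step described above: match each RDD respondent to the nearest web respondent on the overlapping variables and borrow the web-only depth measures. All data below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
rdd = rng.normal(size=(200, 3))      # overlapping variables, standardized
web = rng.normal(size=(500, 3))
web_depth = rng.integers(0, 5, 500)  # a web-only variable to donate

# nearest neighbor by Euclidean distance on the overlapping variables
dists = ((rdd[:, None, :] - web[None, :, :]) ** 2).sum(axis=2)
donor = dists.argmin(axis=1)         # index of each RDD case's web donor
rdd_depth = web_depth[donor]         # fused variable for the RDD cases

print(rdd_depth[:10])
```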