Tag Archives: representative

Representativeness of surveys using internet-based data collection #ESRA15 #MRX 

Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

Yup, it’s sunny outside. And now I’m back inside for the next session. Fortunately, or unfortunately, this session is once again in a below ground room with no windows so I will not be basking in sunlight nor gazing longingly out the window. I guess I’ll be paying full attention to another really great topic.


conditional vs unconditional incentives: comparing the effect on sample composition in the recruitment of the German Internet Panel study GIP

  • unconditional incentives tend to perform better than promised incentives
  • include $5 with advance letter compared to promised $10 with thank you letter; assuming 50% response rate, cost of both groups is the same
  • consider nonresponse bias, consider sample demo distribution
  • unconditional incentive had 51% response rate, conditional incentive had 42% response rate
  • didn’t see a nonresponse bias [by demographics I assume, so many speakers are talking about important effects but not specifically saying what those effects are]
  • as a trend, the two sets of data provide very similar research results, yes differences in means but always fairly close together, confidence intervals always overlap
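The claim that both incentive designs cost the same is expected-value arithmetic, and it is easy to check. Here is a minimal sketch; the sample size of 1,000 invitations is a made-up number, and I am assuming the promised $10 is paid only to people who actually respond.

```python
def unconditional_cost(n_invited, amount_per_letter):
    """Everyone invited gets the incentive up front, responder or not."""
    return n_invited * amount_per_letter


def conditional_cost(n_invited, amount_per_response, response_rate):
    """Only responders are paid the promised incentive."""
    return n_invited * response_rate * amount_per_response


n = 1000  # hypothetical number of advance letters

# Planning assumption from the talk: 50% response rate.
planned_uncond = unconditional_cost(n, 5)
planned_cond = conditional_cost(n, 10, 0.50)

# Response rates actually reported: 51% unconditional, 42% conditional.
observed_cond = conditional_cost(n, 10, 0.42)
cost_per_complete_uncond = planned_uncond / (n * 0.51)
cost_per_complete_cond = observed_cond / (n * 0.42)

print(planned_uncond, planned_cond)
print(round(cost_per_complete_uncond, 2), round(cost_per_complete_cond, 2))
```

Under these assumptions the two designs are budgeted identically, and with the reported response rates the unconditional $5 works out to roughly $9.80 per completed interview against $10 for the promised incentive.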


evolution of representativeness in an online probability panel

  • LISS panel – probability panel, includes households without internet access, 30 minutes per month, paid for every completed questionnaire
  • is there systematic attrition, are core questionnaires affected by attrition
  • normally sociodemographics only which is restrictive
  • missing data imputed using MICE
  • strongest loss in panel of sociodemographic properties
  • there are seasonal drops in attrition, for instance in June which is lots of holidays
  • has more effects for survey attitudes and health traits, less so for political and personality traits which are quite stable even with attrition
  • try to decrease attrition through refreshment based on targets


moderators of survey representativeness – a meta analysis

  • measured single mode vs multimode surveys
  • R-indicators – single measure from 0 to 1 for sample representativeness, based on logistic regression models for response propensity
  • hypothesize mixed mode surveys are more representative than single mode surveys
  • hypothesize cross-sectional surveys are more representative than longitudinal surveys
  • heterogeneity not really explained by moderators
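For anyone who hasn't met R-indicators before, the usual definition is R = 1 − 2·S(ρ), where S(ρ) is the standard deviation of the estimated response propensities. A rough sketch follows; the propensities are made-up numbers standing in for the output of a logistic regression, and I am ignoring design weights, which a real analysis would include.

```python
import statistics


def r_indicator(propensities):
    """R = 1 - 2 * S(rho): 1 means everyone is equally likely to respond."""
    return 1 - 2 * statistics.stdev(propensities)


# Hypothetical estimated response propensities for six sample members.
equal_propensities = [0.5] * 6
uneven_propensities = [0.2, 0.3, 0.5, 0.5, 0.7, 0.8]

r_equal = r_indicator(equal_propensities)
r_uneven = r_indicator(uneven_propensities)

print(r_equal, round(r_uneven, 3))
```

The more the propensities vary across people, the lower the indicator, which is why it can serve as a single summary number in a meta analysis like this one.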

setting up a probability based web panel. lessons learned from the ELIPSS pilot study

  • online panel in France, 1000 people, monthly questionnaires, internet access given to each member [we often wonder about the effect of people being on panels since they get used to and learn how to answer surveys, have we forgotten this happens in probability panels too? especially when they are often very small panels]
  • used different contact modes including letters, phone, face to face
  • underrepresented on youngest, elderly, less educated, offline people
  • reason for participating in order – trust in ELIPSS 46%, originality of project 37%, interested in research 32%, free internet access 13%
  • 16% attrition after 30 months (that’s amazing, really low and really good!), response rate generally above 80%
  • automated process – invites on Thursday, systematic reminders, by text message, app message and email
  • individual followups by phone calls and letters [wow. well that’s how they get a high response rate]
  • individual followups are highly effective [i’d call them stalking and invasive but that’s just me. i guess when you accept free 4g internet and a tablet, you are asking for that invasiveness]
  • age becomes less representative over time, employment status changes a lot, education changes the most but of course young people gain more education over time
  • need to give feedback to panel members as they keep asking for it
  • want to broaden use of panel to scientific community by expanding panel to 3500 people



the pretest of wave 2 of the German Health Interview and Examination Survey for Children and Adolescents as a mixed mode survey, composition of participant groups

  • mixed mode helps to maintain high response, web is preferred by younger people, representativeness could be increased by using multiple modes
  • compared sequential and simultaneous surveys
  • single mode has highest response rate, mixed mode simultaneous was extremely close behind, mixed mode multi-step had the lowest rate
  • paper always gave back the highest proportion of data even when people had the choice of both, 11% to 43% chose the paper among 3 groups
  • sample composition was the same among all four groups, all confidence intervals overlap – age, gender, nationality, immigration, education
  • metaanalysis – overall trend is the same
  • 4% lower response rate in mixed mode – additional mode creates cognitive burden, creates a break in response process, higher breakoffs
  • mixed mode doesn’t improve sample composition nor increase response rates [that is, giving people multiple options as opposed to just one option, as opposed to multiple groups whereby each group only knows about one mode of participation.]
  • current study is now a single mode study



Sample composition in online studies #ESRA15 #MRX 

Live blogged at #ESRA15 in Reykjavik. Any errors or bad jokes are my own.

I’ve been pulling out every ounce of bravery I have here in Iceland and I went to the pool again last night (see previous posts on public nakedness!). I could have also broken my rule about not traveling after dark in strange cities but since it never gets dark here, I didn’t have to worry about that! The pool was much busier this time. I guess kiddies are more likely to be out and about after dinner on a weekday rather than Sunday morning at 9am. All it meant is that I had a lot more people watching to do. All in all good fun to see little babies and toddlers enjoying a good splash and float!

This morning, the sun was very much up and the clouds very much gone. I’ll be dreaming of breaktime all morning! Until then however, I’ve got five sessions on sample composition in online surveys, and representativeness of online studies to pay attention to. It’s going to be tough but a morning chock full of learning will get me a reward of more pool time!

what is the gain in a probability based online panel to provide internet access to sampling units that did not have access before

  • Germany has GIP, France has ELIPSS, Netherlands has LISS as probability panels
  • weighting might not be enough to account for bias of people who do not have internet access
  • but representativeness is still a problem because people may not want to participate even if they are given access, recruitment rates are much lower among non-internet households
  • probability panels still have problems, you won’t answer every survey you are sent, attrition
  • do we lose much without a representative panel? is it worth the extra cost
  • in ELIPSS panel, everyone is provided a tablet, not just people without access. the 3G tablet is the incentive you get to keep as long as you are on the panel. so everyone uses the same device to participate in the research
  • what does it mean to not have Internet access – used to be computer + modem. Now there are internet cafes, free wifi is everywhere. hard to define someone as no internet access now. We mean access to complete a survey so tiny smartphones don’t count.
  • 14.5% of adults in France were classified as not having internet. turned out to be 76 people in the end which is a bit small for analytics purposes. But 31 of them still connected every day.
  • non-internet access people always participated less than people who did have internet.
  • people without internet always differ on demographics [proof is chi-square, can’t see data]
  • populations are closer on nationality, being in a relationship, and education – including non-internet helps with these variables, improves representativity
  • access does not equal usage does not equal using it to answer surveys
  • maybe consider a probability based panel without providing access to people who don’t have computer/tablet/home access

parallel phone and web-based interviews: comparability and validity

  • phones are relied on for research and assumed to be good enough for representativeness, however most people don’t answer phone calls when they don’t recognize the number, can’t use autodialler in the USA for research
  • online surveys can generate better quality due to programming validation and ability to only be able to choose allowable answers
  • phone and online have differences in presentation mode, presence of human interviewer, can read and reread responses if you wish, social desirability and self-presentation issues – why should online and offline be the same
  • caution about combining data from different modes should be exercised [actually, i would want to combine everything i possibly can. more people contributing in more modes seems to be more representative than excluding people because they aren’t identical]
  • how different is online nonprobability from telephone probability [and for me, a true probability panel cannot technically exist. it’s theoretically possible but practically impossible]
  • harris did many years of these studies side by side using very specific methodologies
  • measured variety of topics – opinions of nurses, big business trust, happiness with health, ratings of president
  • across all questions, average correlation between methods was .92 for unweighted means and .893 for weighted means – more bias with weighted version
  • is it better for scales with many response categories – correlations go up to .95
  • online means of attitudinal items were on average 0.05 lower on scale from 0 to 1. online was systematically biased lower
  • correlations in many areas were consistently extremely high, means were consistently very slightly lower for online data; also nearly identical rank order of items
  • for political polling, the two methods were again massively similar, highly comparable results; mean values were generally very slightly lower – thought to be ability to see the scale online as well as social desirability in telephone method, positivity bias especially for items that are good/bad as opposed to importance 
  • [wow, given this is a study over ten years of results, it really calls into question whether probability samples are worth the time and effort]
  • [audience member said most differences were due to the presence of the interviewer and nothing to do with the mode, the online version was found to be truer]
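The comparison the speaker described boils down to two numbers per set of items: the correlation between the phone and online item means, and the average gap between them. Here is a small sketch of that calculation. The item means are invented for illustration (they are not the Harris data), rescaled to 0 to 1 as in the talk.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_order(values):
    """Item indices sorted from highest mean to lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical item means for five attitude questions, one value per item.
phone_means = [0.80, 0.62, 0.45, 0.71, 0.30]
online_means = [0.74, 0.58, 0.41, 0.65, 0.26]

r = pearson(phone_means, online_means)
mean_gap = sum(o - p for p, o in zip(phone_means, online_means)) / len(phone_means)
same_ranking = rank_order(phone_means) == rank_order(online_means)

print(round(r, 3), round(mean_gap, 3), same_ranking)
```

A pattern like this one, with a very high correlation, an identical rank order, and a small constant gap, is what "systematically slightly lower online" looks like: the two modes tell the same story even though the raw means never quite match.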

representative web survey

  • only a sample without bias can generalize, the correct answer should be just as often a little bit higher or a little bit lower than reality
  • in their sample, they underrepresented 18-34, elementary school education, lowest and highest income people
  • [yes, there are demographic differences in panels compared to census and that is dependent completely on your recruitment method. the issue is how you deal with those differences]
  • online panel showed a socially positive picture of population
  • can you correct bias through targeted sampling and weighting, ethnicity and employment are still biased but income is better [that’s why invites based on returns not outgo are better]
  • need to select on more than gender, age, and region
  • [i love how some speakers still have non-english sections in their presentation – parts they forgot to translate or that weren’t translatable. now THIS is learning from peers around the world!]
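The weighting the speaker mentioned can be sketched with simple post-stratification: each group's weight is its population share divided by its sample share. The age groups and all the numbers below are invented for illustration, with the youngest group underrepresented as in the talk.

```python
def poststrat_weights(sample_counts, population_share):
    """Weight per group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / n)
        for group, count in sample_counts.items()
    }


# Hypothetical census shares and panel counts by age group.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_counts = {"18-34": 150, "35-54": 400, "55+": 450}

weights = poststrat_weights(sample_counts, population_share)

# After weighting, each group's share of the sample matches the population.
n_total = sum(sample_counts.values())
weighted_share = {
    group: sample_counts[group] * weights[group] / n_total
    for group in sample_counts
}

print(weights)
print(weighted_share)
```

Weighting on age fixes age and nothing else, which is consistent with the speaker's point that ethnicity and employment were still biased and that you need to select on more than gender, age, and region.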

measuring subjective wellbeing: does the use of websurveys bias the results? evidence from the 2013 GEM data from Luxembourg

  • almost everyone is completely reachable by internet
  • web surveys are cool – convenient for respondents, less social desirability bias, can use multimedia, less expensive, less coding errors; but there are sampling issues and bias from the mode
  • measures of subjective well being – i am satisfied with my life, i have obtained all the important things i want in my life, the conditions of my life are excellent, my life is close to my ideal [all positive keyed]
  • online survey gave very slightly lower satisfaction
  • the result is robust to three econometric techniques
  • results from happiness equations using differing modes are compatible
  • web surveys are reliable for collecting information on wellbeing

America: You aren’t representative of the world #MRX #WAPOR

At one of yesterday’s talks, someone mentioned they sometimes felt that #AAPOR didn’t realize the rest of the world isn’t American.

A second presentation this morning was a brutal reminder. Here are a few interesting tidbits I heard from Gina Cheung at the University of Michigan.

– they sometimes needed to send two interviewers to every house because men could only talk to men and women could only talk to women
– they didn’t understand why there were such long pauses in the interview times until they realized people had to stop to pray
– though many people had good bandwidth, they still needed to find internet cafés to upload data at reasonable speeds.

Researchers really need to remember that the world does not live in neighbourhoods like theirs, have jobs like theirs, eat like they do, or have access to services like they do. Even in our own country. Think outside your country. Think drastically less privileged.

Food for thought.

I love social media research because: You listen to people who haven’t participated in research #6 #MRX

I recently wrote a blog post citing ten of the things I love about social media research. Today I address love #6.

Social media research lets you listen to people who may have never participated in traditional forms of research.

Look to your left and look to your right. Neither of those people are on survey panels. Neither of those people are answering questions about brands, products, or services. Neither of those people are purposefully helping brands improve the products that they use every day.

Now look to your left and look to your right. Both of those people use Facebook. One of them probably uses Twitter. One of them probably leaves silly comments on YouTube or Flickr. One of them might even have their own blog where they write volumes of opinions about all sorts of things.

The best way to get a well-rounded opinion about a brand is not to conduct a survey nor to conduct a focus group. The best way is to gather opinions from all kinds of people, those who know what a survey panel is and like to answer surveys, those who have enough guts and spare time to take 3 or 4 hours out of their day to participate in a focus group, AND those who like to chat in social media.

Those three groups of people aren’t 100% overlapping so when you take advantage of all three groups, you’ll get a clearer picture of reality. And that’s all researchers want.

I hate social media research because: It’s not a rep sample #2 #MRX


I recently wrote a blog post citing ten of the biggest complaints about social media research. Today I address complaint #2.

It’s not a representative sample.

Part 1. Are we really going to go there? I guess we ought to. In 99.9% of market research, we aren’t using a representative sample in the strict sense of the word. Survey panels aren’t probability samples. Focus groups aren’t probability samples. Market research generally uses convenience samples and social media research is no different.

But here is the difference. We’ve all heard the statistic that a tiny percentage of people answer the majority of all market research surveys. In other words, most people aren’t participating in the survey experience and we never hear their opinion. Similarly, when we conduct social media research, we only listen to people who wish to share their opinions on Facebook, Twitter, YouTube, or any of the other millions of websites where they can write out their opinions. No matter what research method you choose, you only hear the people who wish to contribute their opinion in that mode.

Part 2. Who is talking about the brand anyways? Alright, so we know SMR doesn’t use rep samples. Big deal. One of the reasons we use rep samples in traditional research is to ensure we are talking to the right people. We do a rep sample because a product is used by a rep sample. We do a male only sample because a product is used by males only. In both cases, we choose a particular sample because it is most likely to reflect product triers and users. Guess what. The only people talking about your brand in social media are the people who care about your brand. Whether they hate your brand or love your brand, you have instantly reached the people who are relevant to your brand. They have raised their hand to tell you, “Listen to me. I have an opinion about your brand.”

If you require a rep sample, you ought to use a survey because that is the closest approximation. Always use the right method for the job.

True or False: True, but does it matter?

I’m sorry but representative samples are 100% unattainable

Statistics are just numbers. 1 + 2 is always 3 even if the 2 was written in a disgusting colour. People, on the other hand, have crappy days all the time. It could be because a lunch was packed without cookies or because horrible tragedy has struck.

So why does it matter? Because crappy days mean someone:

  • doesn’t answer a phone survey
  • lies on their taxes
  • makes a mistake on the census survey
  • accidentally skips page 2 on a paper survey
  • drips sarcasm all over their facebook page

You recognize these. We call them data quality issues.

Statistics lull us into a false sense of accuracy. Statistics are based on premises which do not hold true for beings with independent thought. Statistics lead us to believe that representative samples are possible when theory dictates it is impossible. Though a million times better than the humanities can ever dream of achieving, even “real” science can’t achieve representative samples. The universe is just far too big to allow that.

In other words, even when you’ve done everything statistically possible to ensure a rep sample, humans and their independent thought have had a crappy day somewhere in your research design.

There is no such thing as a rep sample. There are only good approximations of what we think a rep sample would look like.

And because I AM CANADIAN, I apologize if I have crushed any notions.
