WAPOR opened with a bang as David Fan described the statistical techniques he used to organize the accepted papers into related groups. The key terms were cluster analysis and the traveling salesman approach: a number of presenters were asked to determine which of the other accepted papers were most similar to theirs. One methodological wrinkle was that some presenters backed out at the last minute, so the carefully designed grouping didn’t end up being perfect. Alas, as with every research project, errors creep in.
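To make the traveling-salesman idea concrete, here’s a toy sketch of ordering papers so similar ones sit side by side: a greedy nearest-neighbor pass over a similarity matrix. This is only an illustration of the idea with made-up numbers, not David Fan’s actual method.

```python
# Toy sketch: order conference papers so that similar papers end up
# adjacent, using a greedy traveling-salesman-style pass.

def order_by_similarity(sim, start=0):
    """From the current paper, always jump to the most similar
    paper that hasn't been scheduled yet."""
    n = len(sim)
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        current = order[-1]
        nxt = max(remaining, key=lambda j: sim[current][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical similarities among four papers (1.0 = same topic).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(order_by_similarity(sim))  # papers 0 and 1 end up adjacent, as do 2 and 3
```

A real session-grouping exercise would use many papers and a proper similarity measure (say, keyword overlap), but the scheduling logic is the same shape.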
And in case you’re curious, no, there was no parade of WAPOR figureheads each welcoming us with a short prepared talk. There were no dance routines, fun videos, or Nice tourism representatives. Yes, a room full of data geeks got a truly geeky talk from the head geek. I’m still chuckling about it. 🙂
Rather than summarize the talks I went to, I’ll mention a few interesting tidbits and a few thoughts that came to mind for me.
- Do you ever consider respondent needs, not your own needs? When you’re designing surveys, do you ever really think about what the respondent needs as part of the research process? I know you want quality data and you want to design surveys that generate quality data, but do you really think about the fact that respondents may want to answer a survey on a phone because they can take it to a private room or a quiet room? Similarly, do you realize that people may not want to answer a phone survey because there are other people in the room or it’s too noisy for them? Stop fussing over whether you do or don’t want people to take a survey on their phone. Give them the tools to give you the best data they can – from a quiet room, a private room, or anywhere.
- People don’t fan pages they don’t like. One of the speakers mentioned that people don’t fan brand pages if they aren’t truly fans of the brand. Well, that’s not completely true. Many people ‘fan’ or ‘like’ a page so that they can leave a complaint or criticism on it. Or, they want to monitor what the brand is doing to see how it compares to their loyal brands. Or, they like the page to learn about discounts and coupons that they can redeem with their own brand. Whether Facebook or Twitter, it doesn’t matter what the social network names the buttons – people will click on the button that suits their purpose.
- Social media data has yet to be validated. Someone also mentioned that social media data is taking a while to become widely used because the data itself hasn’t been validated yet. For instance, if someone tweets that they went to McDonald’s, did they really go to McDonald’s? I found this comment kind of funny coming from someone in the survey world. Hm… if someone says on a survey that they went to McDonald’s, did they really go to McDonald’s? Something to ponder!
- Why are Google, Facebook, and Microsoft so far ahead in research? This comment came up as a tangent and was never answered by the speaker, but I’ll take it on here. Why? Because they aren’t research companies. They don’t have to fuss and fret and worry that their norms and standards will be royally screwed up. They aren’t worried about fitting 412 questions into 5 minutes of survey time. They aren’t trying to figure out how to make their product ‘fun.’ We DO have to worry about these things. Actually, I disagree that we have to worry. If we keep worrying as we have been, then Google and Facebook and Microsoft will wipe our faces with their research. If we don’t get with the times and become our own thought leaders, that’s what’s going to happen. Be aware of your norms and be cautious as you change them. Make the research experience enjoyable as it should be. It’s your business at stake. Stop talking. Start doing. (me included!)
- Are AAPOR guidelines too American? You know, I never really thought of that before. There are a number of organizations in the research world that want to be global. Given that WAPOR is the world version of AAPOR, I must conclude that AAPOR does want to be global. Yes, as was mentioned during today’s talk, most of the AAPOR guidelines are drawn with first-world, English-speaking countries in mind – everyone has a phone, everyone has a smartphone, everyone has a physical legal home. Do the AAPOR guidelines make it easy or even possible for people in other countries to conduct ‘good’ research? It’s worth a ponder.
- Let’s stop the probability/non-probability debate. Hear hear! I don’t believe there is such a thing as a probability sample in the human world (generally speaking). Yet, AAPOR continues to promote the idea. You see, even if you COULD know an entire population and select a random sample, people will still decline to participate, quit participating, answer questions incorrectly, misread questions, lie on questions, etc. The assumption is that probability samples create perfect data, and this is just never the case. I would love it if we could just drop the whole probability superiority complex and get on with our work.
- Candy is a legitimate snack. Breaktime featured a fine selection of…. candy? yes, candy. For the second time today, I was happily shocked. Someone later mentioned that fruit was also available but I don’t know what that is and I didn’t see it. So they lied.
And that, my friends, is the Day 1 wrap!
GUN control, he said GUN control! 🙂
The Web Within Us: When Minds and Machines Become One
Ray Kurzweil, Author, Inventor, Futurist, Director of Engineering, Google
- data is amazingly predictable
- brains are designed to make linear predictions, originally for personal safety and survival
- step 30 of a linear progression is 30 and easy to predict; step 30 of an exponential progression is a billion, far beyond any intuitive prediction
- we can change any outdated software in our bodies – genes – insulin receptor genes need to change because we know the next hunting season at the supermarket will be good; we can now fix damaged hearts, we’ve modified stem cells
- Moore’s Law – over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years, progression is very predictable
- will i double my consumption? yes, we more than double it each year. increases 18% per year in constant currency, reason is innovation and invention
- Law of Accelerating Returns – An analysis of the history of technology shows that technological change is exponential, contrary to the common-sense intuitive linear view
- 3D printing is in the hype phase right now, won’t make it big until 2020 because resolution isn’t good enough yet, will be able to print out clothing by 2020, pennies per pound, there goes the fashion industry… no?
- we used to have to send books, movies, and music by fedex but now we can do it in an email; much of it is free but people still spend money on these things; revenues in these industries are going up, fueled by ease of transportation
- body doesn’t yet recognize cancer but maybe we’ll be able to download an app that can do that for us
- neocortex used to be the size of a postage stamp but it was capable of a new kind of thinking, it allowed invention and innovation – invent a new path of escape, inventions within a person’s lifetime not within generations of lifetimes
- neocortex is still a flat structure but now it’s the size of a table napkin; the curves and ridges allow it to expand its surface area and now it’s 80% of the brain
- amygdala no longer decides what to be afraid of, the neocortex does
- by adulthood, many of the connections that were there but never used have died out
- when a person is blind, does the visual cortex die off? no, it moves on to help out with language
- Watson was a test of human intelligence on Jeopardy; those queries involve humor and jokes, which we think only humans can do; got its knowledge from reading wikipedia and many other websites, not from the engineers; it makes up for weak understanding of what it’s read with the volume of pages read
- eventually we will communicate directly with others using nanobots that communicate with neurons
- we will become a hybrid of biological and nonbiological thinking
- Have you backed up your laptop lately? What about your ‘mindfile’?
- Enjoy this video of income and life expectancy over time
- [I’ve done a poor job of transcribing his thoughts. Buy his book. 🙂 ]
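Kurzweil’s linear-versus-exponential point from the notes above can be checked in a couple of lines: 30 linear steps reach 30, while 30 doublings reach about a billion.

```python
# Linear vs exponential growth after 30 steps.
linear_step_30 = 30            # 1, 2, 3, ... -> 30
exponential_step_30 = 2 ** 30  # 1, 2, 4, ... -> 1,073,741,824
print(linear_step_30, exponential_step_30)
```

That factor of roughly 35 million between the two is why, as he argues, intuition built for linear prediction badly underestimates exponential technologies.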
Probability and Non-Probability Samples in Internet Surveys
Moderator: Brad Larson
Understanding Bias in Probability and Non-Probability Samples of a Rare Population John Boyle, ICF International
- If everything was equal, we would choose a probability sample. But everything is not always equal. Cost and speed are completely different. This can be critical to the objective of the survey.
- Did an influenza vaccination study with pregnant women. Would require 1,200 women if you wanted to look at minority samples. Not happening. Influenza data isn’t available at a moment’s notice and women aren’t pregnant at your convenience. A non-probability sample is pretty much the only alternative.
- Most telephone surveys are landline only for cost reasons. RDD has coverage issues. It’s a probability sample but it still has issues.
- Unweighted survey looked quite similar to census data. Looked good when crossed by age as well. Landline respondents are more likely to be older and cell-phone-only respondents are more likely to be younger. Landline respondents more likely to be married, own a home, be employed, have higher income, have insurance from an employer.
- Landline vs cell only – no difference on tetanus shot, having a fever. Big differences by flu vaccination though.
- There are no gold standards for this measure, and there are mode effects.
- Want probability samples but can’t always achieve them
A Comparison of Results from Dual Frame RDD Telephone Surveys and Google Consumer Surveys
- PEW and Google partnered on this study; 2 question survey
- Consider fit for purpose – can you use it for trends over time, quick reactions, pretesting questions, open-end testing, question format tests
- Not always interested in point estimates but better understanding
- RDD vs Google surveys – average difference of 6.5 percentage points; the distribution of differences clustered near zero but there were a number that were quite different
- Demographics were quite similar, google samples were a bit more male, google had fewer younger people, google was much better educated
- Correlations of age and “i always vote” was very high, good correlation of age and “prefer smaller government”
- Political partisanship was very similar, similar for a number of generic opinions – earth is warming, same sex marriage, always vote, school teaching subjects
- Difficult to predict when point estimates will line up to telephone surveys
A Comparison of a Mailed-in Probability Sample Survey and a Non-Probability Internet Panel Survey for Assessing Self-Reported Influenza Vaccination Levels Among Pregnant Women
- Panel survey via email invite, weighted data by census, region, age groups
- Mail survey used a sampling frame of birth certificates, weighted on nonresponse and non-coverage
- Tested demographics and flu behaviours of the two methods
- age distributions were similar [they don’t present margin of error on panel data]
- panel survey had more older people, more education
- Estimates differed on flu vaccine rates, some very small, some larger
- Two methods are generally comparable, no stat testing due to non-prob sample
- Trends of the two methods were similar
- Panel survey is good for timely results
Probability vs. Non-Probability Samples: A Comparison of Five Surveys
- [what is a probability panel? i have a really hard time believing this]
- Novus and TNS Sifo considered probability
- YouGov and Cint considered non-probability
- Response rates range from 24% to 59%
- SOM institute (mail), Detector (phone), LORe (web) – random population sample, rates from 8% to 53%
- Data from Sweden
- On average, the three methods differ from census results by 4% to 7%; web was worst. Demos similar except education, where the higher educated were over-represented; driving licence holders also over-represented
- Non-prob samples were more accurate on demographics compared to prob samples; when they are weighted they are all the same on demographics, but education is still a problem
- The five data sources were very similar on a number of different measures, whether prob or non-prob
- demographic accuracy of non-prob panels was better, and they were also closer on political attitudes. No evidence that self-recruited panels are worse.
- Need to test more indicators, retest
Modeling a Probability Sample? An Evaluation of Sample Matching for an Internet Measurement Panel
- “construct” a panel that best matches the characteristics of a probability sample
- Select – Match – Measure
- Matched on age, gender, education, race, time online, also looked at income, employment, ethnicity
- Got good correlations and estimates from prob and non-prob.
- Sample matching works quite well [BOX PLOTS!!! i love box plots, so good in so many ways!]
- Non-prob panel has more heavy internet users
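The select-match-measure idea from this talk can be sketched in a few lines: for each person in a probability reference sample, pick the closest not-yet-used panelist on the matching variables. The data and the crude mismatch-count distance below are invented for illustration; real implementations match on many more variables with richer distance functions.

```python
# Minimal sketch of sample matching: build a panel sample that mirrors
# a probability reference sample on demographics.

def distance(a, b):
    """Crude mismatch count across the matching variables."""
    return sum(a[k] != b[k] for k in ("age_group", "gender", "education"))

def match_sample(reference, panel):
    """For each reference person, take the closest unused panelist."""
    used, matched = set(), []
    for target in reference:
        best = min(
            (i for i in range(len(panel)) if i not in used),
            key=lambda i: distance(target, panel[i]),
        )
        used.add(best)
        matched.append(panel[best])
    return matched

# Hypothetical one-person reference sample and two-person panel.
reference = [{"age_group": "18-34", "gender": "F", "education": "BA"}]
panel = [
    {"age_group": "55+", "gender": "M", "education": "HS"},
    {"age_group": "18-34", "gender": "F", "education": "BA"},
]
print(match_sample(reference, panel))  # picks the second panelist
```

The measured sample then inherits the reference sample’s demographic profile by construction, which is why the matched estimates in the talk tracked the probability sample so well.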
… Live blogging from downtown Toronto…
Surveys in a Snap!
Paul McDonald, Product Manager, Google Consumer Surveys
Practical applications of Big Data – How Google uses data to make better products
- They work with anonymized data, don’t want to learn everything about individual users
- Passively collect data to create predictive models with probabilistic outcomes
- 30 trillion web pages in their index, 1.2 trillion search queries per year, 400 million transactions per year through google wallet
- information becomes insight when it’s placed in context
- Start with a business purpose
- Bayesian statistics work with initial probabilities and then adjust the probabilities based on new data
- They use Bayesian big data statistics to predict demographics on Google surveys
- Centers for Disease Control flu data matches Google searches very closely – can predict how intense the flu season is; indeed the CDC uses this data to predict flu trends
- 15% of search queries every day have never been seen before – they use distance between words, query history, frequency of words and phrases
- Use bayesian stats to automatically code open ends in survey forms, e.g., finish off words or phrases
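The Bayesian updating described in these notes comes down to one formula: start with a prior probability and revise it when new evidence arrives. The numbers below (inferring a visitor’s gender from the sites they visit, as in Google’s demographic prediction) are invented purely for illustration.

```python
# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E).

def bayes_update(prior, likelihood, evidence_rate):
    return likelihood * prior / evidence_rate

# Prior: 50% chance a visitor is male. Evidence: they visited a site
# that 70% of men visit but only 30% of women visit (made-up rates).
prior_male = 0.5
p_visit_given_male = 0.7
p_visit_given_female = 0.3
p_visit = prior_male * p_visit_given_male + (1 - prior_male) * p_visit_given_female

posterior_male = bayes_update(prior_male, p_visit_given_male, p_visit)
print(round(posterior_male, 2))  # the evidence shifts 50% up to 70%
```

Each new piece of passively collected evidence repeats this update, with the old posterior becoming the new prior, which is the sense in which the demographics on Google surveys are probabilistic predictions rather than stated facts.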
- Fear of Google surveys. Google surveys are simply a tool, and a tool is not research. For results from a Google survey to make sense, they need to be accompanied by a qualified researcher who understands what the appropriate sampling frame is, and how to best interpret the results so as to not exceed the level of validity offered by the tool. A tool without a qualified researcher is a prescription for failure.
- Fear of DIY surveys. Similarly, researchers have nothing to fear in other DIY tools, regardless of how much more flexibility they offer beyond the simple Google tool. A DIY study can have no more validity than the person who designs and administers the research. Poor sampling, poor design, poor analysis, and poor interpretation are all that will result from a DIY study that does not include a competent researcher. CEOs, brand managers, and marketing managers need validity and reliability, not random chunks of data.
- Fear of new research methodologies. As social media research becomes a generally recognized methodology, and gamification starts to become more recognizable, some researchers are hunkering down into their faithful and familiar methodologies. New is unknown. New is risky. New must be feared. Well, new must be feared if you are prepared to watch your business slowly whittle down as other research companies step in to offer those new options. Don’t be fearful. Get in on the action. Learn the new and how it can make your existing offering even better. There’s much good to be found in the new.
- Fear of losing norms. By trying a new methodology, any study on a tracking or templated design is bound to lose all normative data. How terrible. How terrible that you’ve decided to maintain an old, less valid, and less useful methodology rather than create new norms. Be prepared to fear the day when your results cease to make sense because they have lost all validity in the new world.
- Fear of saying no to a client. 60-minute surveys, 30-item grids, 10-point scales, and more. We consistently hate on these things and yet those surveys get programmed, their response rates drop, and we complain about their data quality. Don’t be scared of your clients. Demand quality on their behalf. Create a reputation of quality, not complacency.
- Fear of statistics. I’m really tired of researchers, whether qualitative or quantitative, joking about being scared of numbers and statistics. There’s nothing to be proud of there. Actually, there’s a whole lot to be ashamed of. Researchers are supposed to know a lot about statistics so that we can be smart about how we use them, when we use them, how to interpret them, and when to abandon them. Stop being fearful and start being qualified researchers.
When Google announced their survey capabilities, the market research space was abuzz with anticipation. Oh, the possibilities! Clients, of course, were eager to learn about a new option that might be better and cheaper than what market research organizations have to offer. On the other hand, market researchers wondered if they ought to be fearful of the competition. Whichever side of the fence you’re on, it was clear that when Google spoke at MRMW, the room would be full.
Paul McDonald, the Google rep, shared lots of great information about the tool and the audience was genuinely impressed. How could you not be impressed with the smooth and clean design and the quick responses?
But we’re market researchers. We know (or we should know) about statistics and probability sampling and what makes good quality data. So it puzzled me when I saw margin of error reported on their survey results. Margin of error shouldn’t be reported on non-probability samples.
During the break, I asked the person manning the Google demo table about the reason for reporting margin of error. But alas, no answer for me.
However, Google was clearly monitoring the MRMW tweets, for they provided this answer to me.
Unfortunately, “stratified sampling according to census rep” has nothing to do with probability sampling. Margin of error can only be reported on probability samples, where every member of the population has a known, non-zero chance of being selected for inclusion. So, if Google wants to report margin of error, then they must insist that their research results only be generalized to people who use Google, people who use the websites on which Google displays the surveys, and people who don’t use ad-block (I’m guessing). There are probably some other conditions in there but I’m obviously not familiar with the technicalities of how Google does their research. Regardless, as soon as you stray from those basic conditions, you have fallen into convenience sampling territory and margin of error is no longer appropriate to display.
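For the statistically curious, the margin of error in question is the standard one for a proportion from a simple random sample: z * sqrt(p*(1-p)/n) at 95% confidence. The derivation assumes every member of the population had a known chance of selection, which is exactly what a convenience sample can’t claim.

```python
# Margin of error for a proportion, assuming a simple random
# (probability) sample. Not valid for convenience samples.
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for an observed proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for n = 1,000 respondents: about +/- 3.1 points.
print(round(100 * margin_of_error(0.5, 1000), 1))
```

The arithmetic works for any set of numbers you feed it, of course; the sampling assumptions behind it are what the whole argument here is about.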
Google has kindly prepared a white paper (Comparing Google Consumer Surveys to Existing Probability and Non-Probability Based Internet Surveys) for those of us interested in the details of their product. I enjoyed reading all the criteria that explained why Google surveys don’t use probability sampling. Do read the white paper as you’ll probably be impressed with the results regardless. And keep in mind that survey panels can’t provide probability samples. Even though someone claimed that’s what they gave Google.
But really, who CARES if it’s a probability sample? 99.9%(a) of all market research does not use probability samples and we get along pretty well. Market researchers understand the issues of not using probability sampling, they understand how to interpret and analyze non-probability results, they know how to create clear and unbiased market research, etc. It’s not that we want probability samples. It’s that we want the smarts to tell us when our non-probability samples aren’t good enough.
I’ll let you know if Google follows up…
Postscript: A google rep and I are in the midst of emails about what type of data warrants use of the margin of error. I’ve been sent this link. If you’re statistically inclined, do have a read. ftp://ftp.eia.doe.gov/electricity/mbsii.pdf
(a) I totally made up that number. I have no clue what percentage of market research uses probability sampling. But since most of us use survey panels, focus groups, mall intercepts, mobile surveys etc you get my point.
Welcome to this series of live blogs from the Market Research in the Mobile World Conference in Cincinnati. With so many sessions, I’m only blogging about a few sessions each day. All posts appear within minutes after the speaker has finished. Any errors, omissions, or silly side comments are my own. I’ll also be providing end of day summary blog posts for Esomar so keep your eyes peeled for those as well.
Thaddeus Fulford-Jones, Locately; Understanding the TRUE shopper journey
- Mobile data is real behaviour not claimed behaviour
- Retailer had a problem doing in store shopper intercepts
- GPS helped them discover how shopping choices worked
- Watch on a google map as a shopper drives from walmart to target. See the roads they choose, the turns they make.
- Why walk out empty-handed? Browsing in store to buy online? Simply browsing? Out of stock?
Zoe Dowling, Aliza Pollack, Added Value: How we got over the incredibly inspiring awesomeness of mobile qual and learned to develop some killer new insights approaches
- [great title :)]
- We can’t be with consumers all the time. Asking people about an experience changes the experience.
- What about text notes, voice, video, gps, push notifications, barcode scans, checkins, photos
- Is mobile the holy grail? No 😦 It’s just a tool.
- Consumer may show a photo of his cologne or his ties, but a video captures the photo of Tupac in the background
- Do pre, post, and during work [and if you forgot to do pre, then you’ve got social media research for that]
- it’s a tool to capture the right data
- “This isn’t cheap” [why is anyone interested in cheap research? how do you use crap data?]
Paul McDonald Google Surveys: Y U No Like Me?
- [Paul’s been given permission to demo his product. I like being told that ahead of time.]
- Memes all have a little nugget of truth in them
- response rate with 1 question 35%
- 2 answer options 20% response rate, 3 answer options 30% response rate, not linear at all
- Google survey puts a survey right in the middle of an article, even two questions together, and the question just disappears when you’re done
- Nice demographic sliders on the questions, just move the slider closer to male and you get more weighting on males
- Release to production 8 times per day, i.e., new code is uploaded regularly throughout the day. Often it is client bugs or trying something new. Less than a day from hearing about it to implementing it.
- More smartphones than PCs shipped in 2011
- in 2012, 400 million new android phones activated – 1 million per day
- Why answer a survey on a phone? To get an app for free. [i’m the outlier here. that’s no incentive for me]
- Mobile phones let you know location, apps they use, transactions, music they own, books they read, websites they visit, date, time [doesn’t this scare you? scares me]
- Voice recognition and speech-to-text might make open ends a lot richer
- Market research was not the original goal. This was a way to get people to access content. Small and medium businesses are the main users. “The power of sample”
- People who didn’t have access to research before now have some access
Chances are I didn’t give you permission to read this. You probably found this blog because of a google search or a twitter search or some other online search service. You found it through google because I checked a box on my wordpress profile page that said yes, google has permission to index this. Or, you saw a link that I personally posted on twitter and you clicked through it. Either way, you only got here because I let you, I gave you permission.
I fully expect that someone at some point will read this. Maybe even a few people. That’s what blogs are for. But, the average person doesn’t maintain a blog and they probably don’t tweet all day long. The average person probably makes the occasional update to their facebook or myspace page, or the occasional comment on youtube, or writes some scathing remark on an opinion site about some crappy product they just bought.
The more enlightened folks might know that companies actually go online and search for comments naming their products to see what people think of them. I do think, however, that most people probably don’t know that companies use automated systems to seek, collect, and evaluate those comments.
As researchers, it might seem obvious that companies would gather data online. Given the current state of the monitoring industry, if I tweet “Skor Bars suck monkeys” or “I abhor Great Canadian Superstores,” someone from those companies might try to get ahold of me to defuse a potentially negative situation.
But wait… When I tweeted those opinions, did I actually want someone from the chocolate factory to seek me out and try to convince me that Skor Bars are totally rad? Definitely not. If I really wanted a reply from the company, I would have gone to their website and made my request there. I’m pretty sure I can find their website and I’m pretty sure I know how to fill in a form. For me personally, to have a company contact me because of a tweet I wrote would be an invasion of my personal space. Unless, of course, my tweet was @ them specifically.
Everything on the internet is there because each person agreed to put it there. If you didn’t want someone to read it, you wouldn’t have typed it out on youtube or blogger or flickr. You would also password protect your blog and turn off the search engine and RSS features. I think reading the internet for brand information is fair. Companies should read and learn and make solid business decisions based on data that is readily available. Companies should also know when to draw the line between someone who is actually seeking answers and someone who is just yapping off at the mouth.
I think there are still missing pieces though. People need to be educated that whatever they put online will actually be collected. If it isn’t password protected, it will be collected. If they don’t tell google to not index them, it will be collected. I think a lot of people will be peeved that their data is being collected without payment and then analyzed and sold for profit. It will require a lot of education, and it will be a bumpy ride as people struggle to protect their privacy rights. I think this is our responsibility as researchers and we need to spend the time to do it right. But, our industry will be better for it.
Aren’t you excited? If you do a quick search on google or twitter for “paid surveys,” you will find a huge listing of sites that claim you can make a living or make lots of money taking surveys. Google found me 1,670,000 opportunities. WordPress found 845 blog entries hoping to entice me. Please, please, please, accept my public service announcement. You CANNOT make a living taking surveys. Most of these claims are exaggerations, some are outright fabrications. If it sounds too good to be true, it is.
That’s not to say there are no benefits to answering surveys. You might earn $5 or $10 now and then. You might even be so lucky as to win a TV or a car. Of course, though your chances of winning these prizes are far better than your chances of winning the lottery (trust me, I know), your chances of making it big are pretty much nil.
So why bother answering surveys if you CAN’T rake in the bucks? Well, they are interesting, you learn about new products on the market, you might get to try new products, and you are helping manufacturers improve products so that they better meet your needs. Look at the money as a bonus. It may not be billions and trillions of dollars, but I’ll take 5 bucks now and then!