Do Google Surveys use Probability Sampling? #MRX #MRMW

When Google announced their survey capabilities, the market research space was abuzz with anticipation. Oh, the possibilities! Clients, of course, were eager to learn about a new option that might be better and cheaper than what market research organizations have to offer. On the other hand, market researchers wondered if they ought to be fearful of the competition. Whichever side of the fence you’re on, it was clear that when Google spoke at MRMW (Market Research in the Mobile World), the room would be full.

Paul McDonald, the Google rep, shared lots of great information about the tool, and the audience was genuinely impressed. How could you not be impressed with the smooth, clean design and the quick responses?

But we’re market researchers. We know (or we should know) about statistics and probability sampling and what makes good quality data. So it puzzled me when I saw margin of error reported on their survey results. Margin of error shouldn’t be reported on non-probability samples.

During the break, I asked the person manning the Google demo table about the reason for reporting margin of error. But alas, no answer for me.

However, Google was clearly monitoring the MRMW tweets, because they replied to me with this answer.

Unfortunately, “stratified sampling according to census rep” does not make a sample a probability sample. Stratifying only matches the sample’s demographic mix to the census; what matters is how respondents are selected within each stratum. Margin of error can only be reported on probability samples, whereby every person in the population has a known, non-zero chance of being selected for inclusion (in the simplest case, an equal and independent one). So, if Google wants to report margin of error, then they must insist that their research results only be generalized to people who use Google, people who use the websites on which Google displays the surveys, and people who don’t use ad-block (I’m guessing). There are probably other conditions too, but I’m obviously not familiar with the technicalities of how Google does their research. Regardless, as soon as you stray from those very basic conditions, you have fallen into convenience sampling territory, and margin of error is no longer appropriate to display.
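To make the point concrete, here is a minimal sketch of the textbook margin-of-error formula for a single proportion at 95% confidence, assuming a simple random sample (every person has an equal, independent chance of selection). The names `margin_of_error` and `Z_95` are hypothetical, chosen for illustration; this is not how Google computes its figures, and the finite population correction is ignored.

```python
import math

# Hypothetical constant: z-score for a two-sided 95% confidence level.
Z_95 = 1.96


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a sample proportion p from n respondents.

    Assumes a simple random sample: each respondent is an independent,
    equal-probability draw from the population. Nothing in this arithmetic
    checks how the respondents were actually recruited.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    # Standard error of a proportion under simple random sampling.
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return z * standard_error


if __name__ == "__main__":
    # Worst case (p = 0.5) for 1,000 completes.
    moe = margin_of_error(0.5, 1000)
    print(f"+/- {moe * 100:.1f} percentage points")  # prints +/- 3.1 percentage points
```

Note that the function will happily return about ±3.1 points for 1,000 completes from a convenience sample too. The formula only means something when its sampling assumption actually holds, which is exactly why displaying it on non-probability results can mislead.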

Google has kindly prepared a white paper (Comparing Google Consumer Surveys to Existing Probability and Non-Probability Based Internet Surveys) for those of us interested in the details of their product. I enjoyed reading all the criteria that explained why Google surveys don’t use probability sampling. Do read the white paper, as you’ll probably be impressed with the results regardless. And keep in mind that opt-in survey panels can’t provide probability samples, even though someone claimed that’s what they gave Google.

But really, who CARES if it’s a probability sample? 99.9%(a) of all market research does not use probability samples and we get along pretty well. Market researchers understand the issues of not using probability sampling, they understand how to interpret and analyze non-probability results, they know how to create clear and unbiased market research, etc. It’s not that we want probability samples. It’s that we want the smarts to tell us when our non-probability samples aren’t good enough.

I’ll let you know if Google follows up…

Postscript: A Google rep and I are in the midst of an email exchange about what type of data warrants use of the margin of error. I’ve been sent this link. If you’re statistically inclined, do have a read.


(a) I totally made up that number. I have no clue what percentage of market research uses probability sampling. But since most of us use survey panels, focus groups, mall intercepts, mobile surveys, etc., you get my point.

9 responses

  1. […] Market Research in the Mobile World Event Interview with Paul McDonald, Google Consumer Surveys Do Google Surveys use Probability Sampling? Guest Blog: Sean Conry – Report from Market Research in the Mobile World 8 Things, Revisited […]

  2. Michael J Deis

    It is interesting that the engineers at Google, Jana, etc., who are probably more conversant with math than we opinion researchers, don’t seem to be too concerned about probability sampling. It seems to be an issue that AAPOR, CASRO, WAPOR, and ESOMAR ought to be addressing. Then again, the marketing research community doesn’t really seem to worry too much about the issue either.

    As a high-level TNS executive once told me (and I paraphrase): my clients aren’t worried about probability sampling, they just want consistency in the sampling we provide. It’s hard to criticize Google or Jana if we don’t have our act together as an industry on this issue.

    1. I think your first comment says it all: “more conversant with math than we opinion researchers.” How many real statisticians in your company do more than simply plug data into models?

    2. Michael J Deis

      It is great that you are writing on this and I take your point.

      To clarify further, I would like to make a broader statement regarding industry standards. We as an industry no longer support probability sampling and the capacity to project to populations; instead, we sell self-selected samples as though they were “probability” samples, telling ourselves that they are “good enough”.

      Those of us conducting face-to-face probability samples with Kish selection procedures are dying out. Some of us still want to make population projections, but the industry considers us dinosaurs. Instead, the mantra of the past 10 to 15 years or so has been that it does not matter, as our samples are “good enough”. This trend was exacerbated by the advent of the internet and web-based surveys. (The exception to this was KnowledgeNetworks, which made a broad and concerted intellectual effort to try and produce probability samples in the web space.)

      If we tell ourselves that our self-selected internet panels are good enough, why should an engineer at Google be held to a different standard? Instead, we ought to be looking at our own house to see if it is in order. An individual researcher at any company can only purchase the sampling that is available and industry standard, unless he has very deep pockets and can make the investment in something better. It would be nice if Google or Jana offered us something more, but I don’t see it as their raison d’être to reinvent an industry that only pays lip service to sampling quality. Rather, the folks at Google are just offering another “good enough” sample, which is more or less in line with what the industry tends to offer for commercial marketing research.

      As an aside, I note that they are very careful to refer to their product as “Consumer Surveys” not population surveys, so someone is at least being careful not to over promise in regard to what is possible in terms of projections.

  3. I talked to Google representatives further about this. On Google+ of course. 🙂

  4. I’m no statistician, so I’m not going to wade into THAT part of the debate. But I understand the implications of what’s being discussed here. I can’t help but wonder, however, if it’s simply a matter of end-user expectations; when corporate clients receive research reports, there’s always mention of (or a question about) “what’s the margin of error?” It seems like Google is saying that they feel their approach is strong and representative enough to be seen as a probability sample.

    1. That could very well be a legitimate claim. It’s just that there are so many misconceptions around what margin of error means and when it can be used that I hate to muddy the waters even further. I’m sure the real statisticians out there will tell us what’s really going on.

  5. From their white paper: “Since this inferred demographic and location information can be determined in real time, allocation of a respondent to a survey is also done in real time, enabling a more optimal allocation of respondents across survey questions.” So, if I understand this correctly, Google routes respondents to particular surveys based on demographic needs (Google weights by age, gender and location), which means it is no longer a truly random sample.

  6. Annie, thanks so much for tackling this painfully common misunderstanding! And for keeping us (and Google) honest. I earnestly believe we have to be precise about how we use terms like “margin of error” and “confidence”, and not use them as tortured synonyms for “kinda reliable”.
