Live note taking at #MRIA16 in Montreal. Any errors or bad jokes are my own.
Panel with Sean Copeland, Evan Lyons, Anil Saral, Ariel Charnin, Melanie Kaplan
- Timelines are very compressed now; instead of two or three months, people want answers within hours
- It’s no longer 20 minute questions but quick questions
- Market research is often separate from data science and analytics but this team has put them together
- They don’t have to answer questions with surveys because they have the raw data and they know the surveys probably won’t be able to answer them accurately; they know when to use market research so that it is most effective
- When is MR the right solution and when do they partner with data scientists
- There is a divide between MR and data science which is strange because our goal of understanding consumers is the same
- We can see all the transactional data but without MR you miss the why, the motivator; one method doesn’t answer the entire question
- We need to train and mentor younger researchers [please join http://researchspeakersclub.com ]
- Some mistrust of quantitative data: are panels representative, why do the numbers change month to month; re-exploring qual to understand the needs and wants; clients remember specific comments from specific focus groups, which helps the team see the issues
- A doctor is still a doctor even when they use a robot, the same is true for consumer insights with surveys and data science
- Don’t be protective of your little world, if a project comes to you and is better answered by another method then you are wise to pass it to those people
- You need to appreciate what MR offers and what analytics offers, both have strengths and weaknesses you need to understand
- A new language may be morphing out of the combination of MR and data science
- Everyone believes they are providing insight, of course both sides can do this whether it’s projects and models and understanding the why, insights need to be both of these
- Still need to be an advocate for MR, can’t just go to data science every time even if it’s the new great toy
- Live Flow Data – is this a reality, it will happen, can already see 5 day forecast of weather and know about upcoming conferences and how many tickets were sold for a week from now; monthly assumptions from data could happen
- They can see the effects of ads immediately in live data
- They don’t want to hear what happened yesterday, need to know what’s happening now
- Future of our business is understanding people and solving problems, you always need more information to do this; if you learn new things, you can do more things and solve more problems
- Need more skills in strategy and merging with insights, don’t just hand off reports, help clients take insights and turn them into the next initiative
- Is it one story or multiple stories after you’ve got all the data put together
- Don’t just deliver a product and then leave it, our results are only as accurate as the people who interpret them; research can say a hamburger should look exactly like this but when the end product designers change all the tiny little things to be more convenient then you wind up with a completely wrong hamburger in the end
Enjoy my live note taking at AAPOR in Austin, Texas. Any bad jokes or errors are my own. Good jokes are especially mine.
Moderator: Masahiko Aida, Civis Analytics
Employing Machine Learning Approaches in Social Scientific Analyses; Arne Bethmann, Institute for Employment Research (IAB) Jonas F. Beste, Institute for Employment Research (IAB)
- [Good job on starting without a computer being ready. Because who needs computers for a talk about data science which uses computers:) ]
- Demonstration of chart of wages by age and gender which is far from linear, regression tree is fairly complex
- Why use machine learning? Models are flexible, automatic selection of features and interactions, large toolbox of modeling strategies; but risk is overfitting, not easily interpretable, etc
- Interesting that you can kind of see the model in the regression tree alone
- Start by setting every case in a sample to 0, e.g., male and female are both 0; then predict responses for every person; calculate AME/APE as mean difference between predictions for all cases
- Regression tree and linear model end up with very different results
- R package for average marginal effects (AME) – MLAME on GitHub
- MLR package as well [please ask author for links to these packages]
- Want to add more functions to these – conditional AME, SE estimation, MLR wrapper
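The average marginal effect procedure from this talk can be sketched in a few lines. This is a rough Python illustration, not the authors' R package: the `toy_wage_model` function, the sample data and the 0/1 coding of `female` are all invented for the example, and the second step (predicting again with everyone set to 1) is my reading of the notes rather than something stated on the slides.

```python
from statistics import mean

def average_marginal_effect(predict, rows, feature, low=0, high=1):
    """Mean difference in predictions when `feature` is set to `high`
    versus `low` for every case, keeping all other values as observed."""
    diffs = []
    for row in rows:
        at_low = dict(row, **{feature: low})
        at_high = dict(row, **{feature: high})
        diffs.append(predict(at_high) - predict(at_low))
    return mean(diffs)

# Hypothetical stand-in for a fitted model such as a regression tree.
def toy_wage_model(row):
    base = 15.0 if row["age"] < 30 else 22.0
    penalty = 1.0 if row["age"] < 30 else 4.0
    return base - penalty * row["female"]

# Made-up cases; only the structure matters here.
sample = [
    {"age": 25, "female": 0},
    {"age": 28, "female": 1},
    {"age": 45, "female": 1},
    {"age": 52, "female": 0},
]

ame_female = average_marginal_effect(toy_wage_model, sample, "female")
print(ame_female)
```

Because the function only needs something that returns a prediction, the same loop would work for a linear model or a tree, which is presumably what makes the comparison between the two possible.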
Using Big Census Data to Better Understand a Large Community Well-being Study: More than Geography Divides Us; Donald P. Levy, Siena College Research Institute Meghann Crawford, Siena College Research Institute
- Interviewed 16000 people by phone, RDD
- Survey of quality of community, health, safety, financial security, civic engagement, personal well being
- Used factor analysis to group and test multiple indicators into factors, did the items really rest within each factor [I love factor analysis. It helps you see groupings that are invisible to the naked eye. ]
- Mapped out cities and boroughs, some changed over time
- Rural versus urban have more in common than neighbouring areas [is this not obvious?]
- 5 connections – wealthy, suburban, rural, urban periphery, urban core
- Can set goals for your city based on these scores
- Simple scoring method based on 111 indicators to help with planning and awareness campaigns, make the numbers public and they are shared in reports and on public transportation so the public knows what they are, helps to identify obstacles, help to enhance quality of life
Using Machine Learning to Infer Demographics for Respondents; Noble Kuriakose, SurveyMonkey; Tommy Nguyen, SurveyMonkey
- Best accuracy for gender inferring is 80%, Google has seen this
- Use mobile survey, but not everyone fills out the entire demographic survey
- Works to find twins, people you look like based on app usage
- Support vector machines try to split a scatter plot where male and female are as far apart as possible
- Give a lot of power to the edges to split the data
- Usually the data overlaps a ton, you don’t see men on the left and women on the right
- “Did this person use this app?” Split people based on gender, Pinterest is often the first node because it is the best differentiator right now, Grindr and emoticon use follow through to define the genders well, stop when a node is all one specific gender
- Men do use Pinterest though, ESPN is also a good indicator but it’s not perfect either, HotOrNot is more male
- Use time spent per app, app used, number of apps installed, websites visited, etc
- Random forest works the best
- Feature selection really matters, use a selected list not a random list
- Really big differences with tree depth
- Can’t apply the Apple model to the Android model, the apps are different, the use of apps is different
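The "did this person use this app?" splitting idea can be illustrated with a tiny sketch of how a tree picks its first node. Everything here is an assumption for illustration: the respondents are made up, Gini impurity is one common splitting criterion and not necessarily the one the speakers used, and a real random forest would build many such trees on far more features.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(rows, labels, app):
    """Impurity reduction from asking 'did this person use this app?'."""
    yes = [lab for row, lab in zip(rows, labels) if app in row]
    no = [lab for row, lab in zip(rows, labels) if app not in row]
    n = len(labels)
    weighted = len(yes) / n * gini(yes) + len(no) / n * gini(no)
    return gini(labels) - weighted

# Made-up respondents: the set of apps each one used, plus a known gender.
users = [
    {"pinterest", "weather"},
    {"pinterest", "espn"},
    {"pinterest"},
    {"espn", "weather"},
    {"espn"},
    {"weather"},
]
genders = ["F", "F", "F", "M", "M", "M"]

apps = sorted(set().union(*users))
gains = {app: split_gain(users, genders, app) for app in apps}
first_node = max(gains, key=gains.get)  # the app that separates genders best
print(first_node, gains)
```

In this toy data the best first question leaves each branch with a single gender, which is the stopping rule described above; real app data overlaps far more, so the tree keeps splitting.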
Dissonance and Harmony: Exploring How Data Science Helped Solve a Complex Social Science Problem; Michael L. Jugovich, NORC at the University of Chicago; Emily White, NORC at the University of Chicago
- [another speaker who marched on when the computer screens decided they didn’t want to work 🙂 ]
- Recidivism research, going back to prison
- Wanted a national perspective of recidivism
- Offences differ by state, unstructured text forms means a lot of text interpretation, historical data is included which messes up the data if it’s vertical or horizontal in different states
- Have to account for short forms and spelling errors (kinfe)
- Getting the data into a useable format takes the longest time and the most work
- Big data is often blue in pictures with spirals [funny comments 🙂 ]
- Old data is changed and new data is added all the time
- 30 000 regular expressions to identify all the pieces of text
- They seek 100% accuracy rate [well that’s completely impossible]
- Added in supervised learning and used it to help improve the speed and efficiency of the manual review process
- Wanted state specific and global economy models, over 300 models, used brute force model
- Want to improve with neural networks, automate database updates
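To give a flavour of what regular expressions for short forms and spelling errors might look like, here is a small sketch. The three patterns and the sample offence text are hypothetical; they are not taken from the speakers' set of 30 000 expressions.

```python
import re

# Hypothetical patterns; each maps messy variants to one canonical term.
OFFENCE_PATTERNS = [
    (re.compile(r"\bk(?:ni|in)fe\b", re.IGNORECASE), "knife"),
    (re.compile(r"\bposs(?:ession)?\b\.?", re.IGNORECASE), "possession"),
    (re.compile(r"\bw/\s*", re.IGNORECASE), "with "),
]

def normalize_offence(text):
    """Apply each pattern in turn so variants collapse to one spelling."""
    for pattern, replacement in OFFENCE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

raw = "Assault w/ kinfe, poss. of stolen property"
print(normalize_offence(raw))
```

Each new state format or typo means another pattern, which makes it easy to see how the count climbs into the tens of thousands.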
Machine Learning Our Way to Happiness; Pablo Diego Rosell, The Gallup Organization
- Are machine learning models different/better than theory driven models
- Using Gallup daily tracking survey
- Measuring happiness using the ladder scale, best possible life to worst possible life, where do you fall along this continuum; most people sit around 7 or 8
- 500 interviews every day, RDD of landlines and mobile, English and Spanish, weighted to national targets and phone lines
- Most models get an R-squared of .29, probably because they miss interactions we can’t even imagine
- Include variables that may not be justified in a theory driven model, include quadratic terms that you would never think of, expanded variables from 15 to 194
- [I feel like this isn’t necessarily machine learning but just traditional statistics with every available variable crossed with every other variable included in the process]
- For an 80% solution, needed only five variables
- This example didn’t uncover significant unmodeled variables
- [if machine learning is just as fast and just as predictive as a theory driven model, I’d take the theory driven model any day. If you don’t understand WHY a model is what it is, you can’t act on it as precisely.]
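The idea of adding quadratic terms and crossing variables can be sketched as a simple feature expansion. This is my own illustration of the general technique, not Gallup's procedure; the predictor names and values are invented.

```python
from itertools import combinations

def expand_features(row):
    """Return the original values plus squares and pairwise products."""
    expanded = dict(row)
    for name, value in row.items():
        expanded[f"{name}^2"] = value * value
    for (a, x), (b, y) in combinations(row.items(), 2):
        expanded[f"{a}*{b}"] = x * y
    return expanded

respondent = {"age": 40, "income": 3, "health": 4}  # hypothetical predictors
wide = expand_features(respondent)
print(len(wide), wide)
```

With 15 inputs this particular recipe gives 15 originals, 15 squares and 105 pairwise products, or 135 columns, so the 194 mentioned in the talk presumably came from a somewhat different set of terms.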
I recently debated big data with a worthy opponent in Marc Alley at the Corporate Research Conference. He stood firm in his belief that big data is the best type of data whereas I stood firm in my position that traditional research is the only way to go. You can read a summary of the debate written by Jeffrey Henning here.
The interesting thing is that, outside of the debate, Marc and I seemed to agree on most points. Neither of us think that big data is the be all and end all. Neither of us think that market research answers every problem. But both of us were determined to present our side as if it was the only side.
In reality, the best type of data is ALL data. If you can access survey data and big data, you will be better off and have an improved understanding of thoughts, opinions, emotions, attitudes AND validated actions. If you can also access eye tracking data or focus group data or behavioural data, you will be far better off and have data that can speak to reliability or validity. Each data type will present you with a different view and a different perspective on reality. You might even see what looks like completely different results.
Different is not wrong. It’s not misleading. It’s not frustrating. Different results are enlightening, and they are indeed valid. Why do people do different than what they say? Why do people present contradictory data? That’s what is so fascinating about people. There is no one reality. People are complex and have many contradictory motivations. No single dataset can describe the reality of people.
There is no debate about whether big data has anything to offer. Though Marc and I did our best to bring you to our dark side, we must remember that every dataset, regardless of the source, has fascinating insights ready for you to discover. Grab as much data as you can.
By now you’ve heard about the three Vs of big data. Whether your concern is millions of research panel records, billions of transactional records, or trillions of web tracking records, we all have the same problem. The volume of data increases exponentially, the variety of data keeps increasing, and the speed, well, let’s think lightspeed. These issues alone make big data a worthy opponent.
Big data is also rife with missing data. It’s incomplete data, and it’s complicated data. It needs specialized analytical tools and specialized analysts. But those problems are also not the reason we’re failing.
Why are we failing at big data? Well, let’s take a step back to the survey and focus group world that market researchers love so much. When I think back to the last survey I wrote, it too was quite the beast. For just twelve minutes of respondent time, I spent many hours dreaming of, writing, tweaking, rewriting, and retweaking every single question and answer. I pondered every the, or, if, they, you, and why. I argued with myself about the possible ramifications that every single word might have on my results. In every case, I settled on the best solution, not the right solution. In the end, I had a survey that would carefully address every single hypothesis and research objective on my list. This survey was a beauty and the analysis was quick and easy.
Let’s move forward to our big data project. You know, the one where someone dumped a giant SQL database with thousands of variables and billions of records on your plate and said, “Make our program better.” You weren’t really sure what the program was, you didn’t know what was currently good or bad about it, and none of the database variables matched up with any project plans or research objectives. Actually, there were no research objectives. Except for “make better.” I can assure you that is NOT a solid research objective.
Imagine if someone collected together a hundred surveys from a hundred projects and told you to “make better.” I can guarantee you would fail at that survey analysis regardless of how many years of survey analysis you had behind you.
The simple reason we continue to fail at big data is that we fail to create concrete and specific research plans and objectives as we do for every other research project. We know very well that a survey project will fail without carefully operationalized objectives but when we work with big data, we ignore this essential step. We don’t plan ahead with specific variables, we don’t list out potential hypotheses, we don’t have a game plan. “Find something cool” isn’t a game plan. Nor is “how can we improve?” Big data needs big brains to plan and organize and be specific.
Do you want to succeed at big data? Then stop treating it like a magical panacea and do the work. Do the hard work.
Live blogged from #ESOMAR in Dublin, Ireland. Any errors or bad jokes are my own.
- The human mind always makes progress but it is progress in spirals
- does how we pay affect GDP? cash, cheque, credit. this was answered by going to market researchers and it affects capital expenditures, government policy, innovation spending, central bank decisions, taxation externalities, trade balance, security spending, real estate requirements, bookkeeping productivity, marginal propensity to consume
- If you want to know the road ahead, ask those coming back
- what is thought leadership?
- training – what will happen in the future
- consulting –
- bloggers – they have immense power, inch wide but a mile deep, people turn here for news not traditional media [oooo, could that be me?]
- retention – becoming bigger, companies want to hold nice events but they want thought leaders in their industry to be there
- leverage – send people something of value that doesn’t cost a thing, use anniversaries of events as reminders, don’t make it a sales pitch, just make it information [bingo word :)]
- marketing – companies realize they can’t be left behind in the content area, maybe don’t have the resources to do it, partner with outside companies to produce niche content
- ecosystem – people issue more white papers as lead generation [except it’s kind of overdone now, everyone wants your email address before you can read it ]
- business models – businesses that are simply thought leadership
- speaking – people want to hear about what’s five years from now
- People are often scared by big data, you feel like you should use it all but why can’t you just pick up what’s relevant and synthesize that part, we’re on the cusp of big data becoming huge
- companies are turning to market researchers for help with big data [actually, no they aren’t. they’re turning to big data companies because market researchers have their head in the sand]
- all the young people have turned to snapchat and facebook is only used for stalking
- big data turns into new models, new language, business insights, marketable commodity, content management, market valuations, informal mentoring, new language like impressions and click through rates
- why is facebook valued more richly than a company like toyota or disney, we want to know what’s happening RIGHT NOW, we want to get in on the conversation, we are watching the hashtag for esomar
- RonR – return on relationship [i thought that meant research on research, yeah, that’s better]
- 1 – number of impressions
- 2 – click through rate of the company
- 3 – conversion rate – how many email addresses will be left once you click through to the landing page – for 1 email address i need 1000 impressions
- 4 – how much will this cost
- 5 – first time purchasers are more likely to look online for researchers, count them as two
- 6 – for investment services, people turn to internet for reassurance, count them as more than two
- top 3 divided by bottom 3 – created a proprietary indicator, this gives her the x-factor dealing with customers
- work ON your business, not IN your business while at congress
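The funnel arithmetic behind the first three RonR items (impressions, click through rate, conversion rate) can be sketched as below. The rates are assumptions I picked so that the numbers land on the "1 email address per 1000 impressions" mentioned above; the talk did not give them, and the proprietary indicator itself is not described in enough detail to reproduce.

```python
def expected_signups(impressions, click_through_rate, conversion_rate):
    """Impressions -> clicks -> email addresses left on the landing page."""
    clicks = impressions * click_through_rate
    return clicks * conversion_rate

def impressions_needed(signups, click_through_rate, conversion_rate):
    """How many impressions it takes to collect `signups` addresses."""
    return signups / (click_through_rate * conversion_rate)

# Assumed rates, not figures from the talk: 2% click through, 5% convert.
ctr, conversion = 0.02, 0.05
print(expected_signups(1000, ctr, conversion))
print(impressions_needed(1, ctr, conversion))
```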
Scotiabank – Big Data, Small Brains: Making Effective Decisions in a World of Data Overload by Lisa Ritchie #MRIA15 #MRX
Live blogged from the 2015 MRIA National Conference in Toronto. Any errors or bad jokes are my own.
Lisa Ritchie, Scotiabank
- Started department with 2 people many years ago and now there are about 180 people – data warehouse, data support, campaign execution, analytics and insights, brand and communication research
- Major changes in big data – market research was the core, the catalyst. Companies started getting repositories, more information became relevant. It morphed and became bigger. Reliance on information became bigger.
- Quant is rational, concise, part of the why; Qual is exploratory of the why, more emotive
- Big data – is there such a thing as little data, little information? Vendors made this thing sound so much bigger. Technology gives us information at a faster pace. The notion of big data came from technology vendors.
- The information was always there, we’re just getting it faster. It is blinding. The big brain comes into play here.
- Big data is now at your fingertips. You know how many times people have walked into your branch or made a savings deposit or what channel they use to deposit.
- Were able to understand that their bank wasn’t doing well with young customers. Had to figure that out – how to attract young customers to a bank, any bank. This information at their fingertips helped. Needed to see that people open an account because their parents opened it, and then they go off to college. Now, it is the best at attracting young people.
- Canadians are the highest users of loyalty programs.
- Biggest key to success is going back to basics and creating a structure. Data can give you any answer so research needs discipline.
- Can you tell a powerful story with big data? It’s different for structured and unstructured data. Piecemeal doesn’t work. Some analytics people are brilliant at the data but can’t tell a story. And vice versa.
- Secret to integration – need to hypothesize, synthesize, know what you’re looking for. It’s not about looking for one thing
- Most data has 8 to 10 years of history. You can go back.
- Don’t lose sight that a project from ten years ago might reveal new insights.
- The problem is knowing what to do with the big data. Need to learn how to ask questions to use that big data. “Yeah, yeah, yeah, but what I meant was…”
- Researchers need to be integrators, be proactive. Need to be real consultants. This is where we’re losing business – to people who ARE consultants.
- Journey will be fast and furious. The googles of the world are using data, and we need to be intuitive as well.
- She just got an email “Welcome to Metcan” but she’s been a customer for 10 years. They just sent a generic email even though their big data could tell them something very different.
- Learn the skill of synthesizing. Think forward.
- Suppliers need to work with clients in a tighter relationship. Bring learning from elsewhere. This is what consultants do.
- It’s not just presentations. It’s communicating and interpreting and suppliers can help do this.
- She says her success has been luck [You make your own luck!]
- 330 terabytes of data [wish my laptop had that!]
- There is a push for everyone to have access to data. Need to make sure data is anonymous. [READ ONLY! READ ONLY!]
- When everyone has access to data, interpretation becomes really key.
- Don’t archive things so fast – what’s old is new again. Don’t underestimate the usefulness of old research.
- Does technology mean faster and cheaper – no. It’s a myth. Asking questions takes time.
Brand Building in a Digital, Social and Mobile Age Joel Rubinson, Rubinson Partners Inc. #NetGain2015 #MRX
Live blogging from the Net Gain 2015 conference in Toronto, Canada. Any errors or bad jokes are my own.
Joel Rubinson, President and Founder of Rubinson Partners Inc.
- Picture of brand success has to change
- We are no longer in a push world, consumers pull information at their leisure
- We engage in shopping behaviours even when we aren’t really shopping, we are always IN the path to purchase
- Brands must become media
- Starbucks is the best example of a marketer that gets it. 40 million fans on facebook. millions of website visits. millions have downloaded their app. Every interaction generates data they can use, can be used for personalization, to amplify brand communications. They no longer have to pay for every message.
- The rise of math experts in advertising – lift from using math to place advertising is a repeatable success
- Programmatic messaging is key. Think about impressions that are served up one user at a time. The marketer’s goal is to serve the most relevant ad at the right price. And this needs to scale.
- Research is missing in action when it comes to math – we lack digital metrics, still rely on survey based tracking, we have a post-mortem mind set, we are failing to change how marketing works
- We must get serious about integrating digital – why isn’t this happening, why are we locked in a survey world
- Our comfort zone is surveys. We know how to construct 20 minute surveys. Our learning zone is the mobile area where we unpack our surveys into smaller pieces.
- The panic zone is digital, we don’t understand it. We must move digital into the comfort zone.
- Let’s start by just looking at the data, look at page views, look at themes in social media, how big is your brand audience, how many likes on facebook, how many twitter followers, how many newsletter signups. These are unambiguous measures. Look at clicking and sharing and conversions.
- Stop treating social media as a hobby, not specialty projects, not ancillary thing to look at. You must find ways to increase positive word of mouth.
- Do we really need feedback from consumers every single day on attributes they never consider? Can’t social media which is much more organic do this?
- Bring in data that you can’t get from a survey that has action value. Some online panel companies already use social login built on OAuth. Append all the Facebook data to your survey and use it for targeting.
- Data aggregators have lots of profiling information for targeting ads throughout the web which means different people get different ads based on cookies from their browser
- You can also link in frequent shopper data to your survey data.
- You don’t have to guess whether an ad is working. You can run an experiment and serve the ad to one group of people and see the change in group behaviour.
- MR needs to know that brand meaning is built completely differently now. People seek out knowledge, digital delivers information in real time. But marketing research hasn’t changed.
- Think digital and do something big. Shift some money into data science or integration. Conduct in-the-moment research with smartphones.
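The point about running an experiment instead of guessing whether an ad works can be sketched as a comparison of two groups. The counts below are invented, and the pooled two-proportion z-test is a standard textbook choice on my part, not something the speaker specified.

```python
from math import erf, sqrt

def ad_lift(exposed_buyers, exposed_n, control_buyers, control_n):
    """Compare purchase rates between people served the ad and a control group."""
    p_exposed = exposed_buyers / exposed_n
    p_control = control_buyers / control_n
    lift = p_exposed - p_control
    # Pooled two-proportion z-test on the difference in rates.
    pooled = (exposed_buyers + control_buyers) / (exposed_n + control_n)
    se = sqrt(pooled * (1 - pooled) * (1 / exposed_n + 1 / control_n))
    z = lift / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return lift, z, p_value

# Made-up experiment: 5000 people saw the ad, 5000 did not.
lift, z, p_value = ad_lift(exposed_buyers=260, exposed_n=5000,
                           control_buyers=200, control_n=5000)
print(lift, z, p_value)
```

The useful part is the design more than the statistics: because the only planned difference between the groups is the ad, a gap in behaviour can be credited to the ad.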
Emerging Technologies – Are They Still Emerging? Lenny Murphy, GreenBook Blog and GRIT Report #NetGain2015 #MRX
Live blogging from the Net Gain 2015 conference in Toronto, Canada. Any errors or bad jokes are my own.
Lenny Murphy, Editor-in-Chief of GreenBook Blog and GRIT Report
- Attitudinal, behavioural, and intrinsic data
- Foundational research is no longer taking months but hours
- Moving from questioning to discussing, from asking to observing, from data to insight, from understanding to predicting, from the big survey to data streams, from rational to behavioural, from quarterly to real time, from siloed to converged
- the traditional survey as the primary driver of information will decline
- Data science is not a hoity toity term for a statistician. It’s information technology and algorithms and languages and hadoop and R. It’s statistics on steroids.
- The future looks very different.
- Over the next five years, we are in the realm of DIY, non-conscious measurement is emerging such as facial scanning and automated emotion measuring, automation and AI in terms of very very smart devices, internet of things where all of your things will collect and share data from your shoes to your car, virtual and augmented reality will change our media habits
- DIY – there are many free DIY tools
- The ‘make it’ revolution – consumers can ‘print’ their own things, print some shoes, do an ideation session using a printer. cost of these devices can be as low as $100.
- Emotional measurement – facial scanning, shopping behaviour videos, eye tracking
- AI – tons of money going here, google has spent millions on quantum computers, these will just be part of everything we do
- Internet of Things – Internet as we know it might disappear. Daily lives are just always all connected. e.g., Microsoft’s HoloLens.
- Do a virtual shopping experience without a computer. But you still feel like you are in the store.
- Imagine a connected fridge [will it shop for me once it notices I’m out of BREAD AND MILK!!]
- Google Glass succeeded in every aspect they hoped. The real product will come out in the next couple of years.
- Gamification has never taken hold but many companies are working in this area. Game to map out neurons.
- Which companies will be our competitors for clients and budget? Google, IBM, Apple, facebook, AOL, Verizon, Comcast, Disney, at&t, GE, groupm, WPP, amazon
I did it. Yes. I broke down and spent my Christmas money. Let’s put aside the fact that I still get Christmas money from the moms and move on to what I spent it on.
In just six to eight weeks, this pretty little plum coloured Fitbit will arrive at my door. (The “make it pink so girls will buy it” marketing scheme works on me but plum is just as good.)
Supposedly, it will monitor my heart rate all the time including when I am awake and asleep. It would have been cool to have it a few weeks ago when my four wisdom teeth were ripped out of my face but I’m sure some other quite unpleasant event will greet me soon enough.
I’m quite looking forward to learning:
– how consistent my sleep is, and how many times I wake up at night
– whether my heart rate speeds up or slows down when I get ready for work or leave work, or when I go to my awesomely fun ukulele class
– how incredibly nuts my heart rate is when I speak at conferences, show up at cocktail hour, plow through a crowded exhibit hall. Though I may seem calm and relaxed, it really takes a ton of mind games to turn quiet me into loud me.
And at the same time, I’ll be wondering… If someone gets their hands on my data, what will they do with it? What products will they develop as they learn about me? What heart rate medications will they need to sell to me? What fitness products will they need to sell to me? Will I need to buy the shirt version to measure electrical outputs? The sock version to measure sweat outputs? The earbud version to measure brainwaves? What will marketers and brand managers learn about me and my lifestyle?
Now that I think about it, this is MY form of gamification. I can’t wait to see charts, watch trends, and compare norms. And now that I’m learning Python and rstats, I would love to get my hands on the dataset of millions of people and millions more records. With permission of course.
Big Data and Privacy: The Legal Landscape Affecting Corporate Research by Shannon Harmon, JHC #CRC2014 #MRX
- our lives are a series of data points
- more opportunity, more vulnerability, and the potential for greater abuse
- smaller entity might purchase data from 3rd party
- who owns the data, who has the right to access the data, what steps are taken to keep it secure
- goal of any regulation is to protect personally identifiable information from breach and misuse
- you can identify people with very little information so keep in mind a lot of information is PII
- Notice and consent: need to provide notice of how the data will be used, and then obtain consent – this is the core of the law related to privacy, you need to make sure the right practices were followed to do this
- Where do we look for oversight? Right now, state attorney general, FTC, FCC, FDA
- Fair information practice principle – only collect what you need to collect and only retain it for as long as is necessary to fulfill the specified purpose
- FIPP – data quality and integrity – organizations should ensure that the PII is accurate, relevant, timely and complete and this is difficult if you’ve purchased the data, supplier should have a structure in place to ensure this
- Consumer privacy protection bill of rights – google search this – things corporations should do to protect privacy, this area will become increasingly more regulated so think ahead
- Fair Credit Reporting Act – example of what big data protection framework should look like, right to review your credit report and make sure it’s accurate and get it fixed if it’s not correct, this is where we’re headed, your digital dossier is being collected and you don’t know how decisions about you are being made, you can’t contest your big data points… right now
- special considerations for health data – apple has stated that any app developers cannot use any of the health data for advertising, or data-mining except to help an individual manage their health or for medical research. but is apple responsible for developer compliance? what if a data broker got the data from someone who wasn’t supposed to have it in the first place?
- considerations for researchers
- where is the data being obtained, what are the sources
- what practices are being used to obtain it and what is your confidence in your aggregator
- how is the data being trained to arrive at conclusions? what algorithms? what human manipulation?
- think about the vendor/subcontractor relationship, is the contractor independent? a substandard contractor impacts you
- we need
- use restrictions – can’t use big data to discriminate on age, race, etc
- oversight – protect against unregulated digital dossiers
- KNOW YOUR INFORMATION SOURCE
- be intimately knowledgeable about your company’s data gathering practices – informed consent, opt-out, internal user access controls
- be ready to evolve as the law is only beginning to be developed in this area