Tag Archives: data science

Fusing Marketing Research and Data Science by John Colias at @DecisionAnalyst

Live note-taking of the November 9, 2016 webinar. Any errors are my own.

  • Survey insights have been overshadowed in recent years, market research is struggling to redefine itself, there is an opportunity to combine big data and surveys
  • Preferences are not always observable in big data, which includes social data and wearable data
  • Surveys can measure attitudes, preferences, and perceptions
  • Problem is organizational – isolation, compartmentalization of market research and big data functions
  • Started with a primary research survey about health and nutrition; one question was how often do you consume organic foods and beverages; also had block-group census data from the American Community Survey five-year summary data with thousands of variables
  • Fused survey data and block group data using location of survey respondent from their address and matched to block group data, brought in geo data for that block group
  • Randomly split the data, built the predictive model on the training data, determined predictive accuracy using the validation data (the hold-out data); 70% of data for model development, 30% for validation – an independent, objective model
  • Created a lift curve, predictive model identified consumers of organic foods more than 2.5 times better than chance
  • When the predictive model’s lift curve bows out from the random model, you have achieved success
  • Which variables were most predictive, not that they’re correlated but they predict behaviour – 26 or older, higher income, higher education, less likely Hispanic; this may be known but now they have a model to predict where these people are
  • Can map against actual geography and plot distances to stores
  • High-tech truck research
  • Used a choice modeling survey, design attributes and models to test, develop customer level preferences for features of the truck
  • Cargo space, hitching method, back up cameras, power outlets, load capacity, price
  • People chose preferred model from two choices, determined which people are price sensitive, or who value carrying capacity, biggest needs were price, capacity, and load
  • How to target to these groups of people
  • Fused in external data like previously, but now predicting based on choice modeling not based on survey attitudes, lift curve was again bowed to left, 1.8 times better than chance – occupation, education, income, and household size were the best predictors
  • [these are generic results – rich people want organic food and trucks, but point taken on the method. If there is a product whose users are not obvious, then this method would be useful]
  • Fusion can use primary and secondary data, also fuses technology like R dashboards and google maps, fuses survey and modeling, fuses consumer insights database marketing and big data analytics
  • Use this to find customers whose preferences are unobserved, improve targeting of advertising and promotions, optimize retail location strategies, predict preferences and perceptions of consumers, collaboration of MR departments with big data groups would benefit both entities
  • In UK and Spain, demographics are more granular; GPS tracking can be used in less developed countries
  • Used R to query public data set, beauty of open-source code and data
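The lift-curve idea in the bullets above can be sketched in a few lines. This is a hedged illustration with invented data, using a stand-in for scores that would really come from a model trained on the 70% split and evaluated on the 30% hold-out:

```python
import random

random.seed(42)

# Invented data: 1 = consumes organic foods. The scores stand in for the
# predicted probabilities a fitted model would assign on the hold-out data.
actuals = [1] * 200 + [0] * 800
scores = [random.uniform(0.4, 1.0) if y == 1 else random.uniform(0.0, 0.6)
          for y in actuals]

def lift_at(actuals, scores, fraction):
    """Lift = hit rate in the top `fraction` of scored cases vs. the base rate."""
    ranked = sorted(zip(scores, actuals), reverse=True)
    k = int(len(ranked) * fraction)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(actuals) / len(actuals)
    return top_rate / base_rate

print(lift_at(actuals, scores, 0.10))  # well above 1.0, i.e. better than chance
```

A lift of 2.5, as in the talk, means the top-scored group contains organic-food consumers at 2.5 times the rate of the population as a whole.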

Panel: People as Proxy #MRIA16 

Live note taking at #MRIA16 in Montreal. Any errors or bad jokes are my own.

Panel with Sean Copeland, Evan Lyons, Anil Saral, Ariel Charnin, Melanie Kaplan

  • Timelines are very compressed now, instead of two or three months people are asking for hours to get answers
  • It’s no longer 20-minute surveys but quick questions
  • Market research is often separate from data science and analytics but this team has put them together
  • They don’t have to answer questions with surveys because they have the raw data and they know the surveys probably won’t be able to answer them accurately; they know when to use market research so that it is most effective
  • When is MR the right solution and when do they partner with data scientists 
  • There is a divide between MR and data science which is strange because our goal of understanding consumers is the same
  • We can see all the transactional data but without MR you miss the why, the motivator; one method doesn’t answer the entire question
  • We need to train and mentor younger researchers [please join http://researchspeakersclub.com ]
  • Some mistrust of quantitative data – are panels representative, why do the numbers change month to month; re-exploring qual to understand the needs and wants; clients remember specific comments from specific focus groups, which helps them see the issues at the time
  • A doctor is still a doctor even when they use a robot, the same is true for consumer insights with surveys and data science
  • Don’t be protective of your little world, if a project comes to you and is better answered by another method then you are wise to pass it to those people
  • You need to appreciate what MR offers and what analytics offers, both have strengths and weaknesses you need to understand
  • A new language may be morphing out of the combination of MR and data science
  • Everyone believes they are providing insight, of course both sides can do this whether it’s projects and models and understanding the why, insights need to be both of these
  • Still need to be an advocate for MR, can’t just go to data science every time even if it’s the new great toy
  • Live Flow Data – is this a reality, it will happen, can already see 5 day forecast of weather and know about upcoming conferences and how many tickets were sold for a week from now; monthly assumptions from data could happen
  • They can see the effects of ads immediately in live data
  • They don’t want to hear what happened yesterday, need to know what’s happening now
  • Future of our business is understanding people and solving problems, you always need more information to do this; if you learn new things, you can do more things and solve more problems
  • Need more skills in strategy and merging with insights, don’t just hand off reports, help clients take insights and turn them into the next initiative 
  • Is it one story or multiple stories after you’ve got all the data put together
  • Don’t just deliver a product and then leave it, our results are only as accurate as the people who interpret them; research can say a hamburger should look exactly like this, but when the end product designers change all the tiny little things to be more convenient, you wind up with a completely wrong hamburger in the end

Rise Of The Machines: DSc Machine Learning In Social Research #AAPOR #MRX #NewMR 

Enjoy my live note taking at AAPOR in Austin, Texas. Any bad jokes or errors are my own. Good jokes are especially mine.  

Moderator: Masahiko Aida, Civis Analytics

Employing Machine Learning Approaches in Social Scientific Analyses; Arne Bethmann, Institute for Employment Research (IAB) Jonas F. Beste, Institute for Employment Research (IAB)

  • [Good job on starting without a computer being ready. Because who needs computers for a talk about data science which uses computers:) ]
  • Demonstration of chart of wages by age and gender which is far from linear, regression tree is fairly complex
  • Why use machine learning? Models are flexible, automatic selection of features and interactions, large toolbox of modeling strategies; but risk is overfitting, not easily interpretable, etc
  • Interesting that you can kind of see the model in the regression tree alone
  • Start by setting every case in a sample to 0, e.g., male and female are both 0; then predict responses for every person; calculate AME/APE as mean difference between predictions for all cases
  • Regression tree and linear model end up with very different results
  • R package for average marginal effects – MLAME on GitHub
  • MLR package as well [please ask author for links to these packages]
  • Want to add more functions to these – conditional AME, SE estimation, MLR wrapper
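The AME recipe in these bullets can be sketched with a toy stand-in for the fitted model. Everything here (the wage function, the cases) is invented for illustration; the real workflow would use the packages mentioned above:

```python
# Stand-in for a fitted model: wage depends on age and a gender dummy.
def predict_wage(case):
    return 20 + 0.5 * case["age"] + 3.0 * case["female"]

cases = [{"age": a, "female": f} for a in (25, 40, 55) for f in (0, 1)]

def average_marginal_effect(model, cases, feature, low=0, high=1):
    """Set `feature` to `low` for every case and predict; set it to `high`
    and predict again; the AME is the mean per-case difference."""
    preds_low = [model({**c, feature: low}) for c in cases]
    preds_high = [model({**c, feature: high}) for c in cases]
    return sum(h - l for h, l in zip(preds_high, preds_low)) / len(cases)

print(average_marginal_effect(predict_wage, cases, "female"))  # 3.0 for this toy model
```

With a regression tree instead of this linear toy, the same recipe works but the effect would no longer be constant across cases, which is exactly why tree and linear results diverge.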

Using Big Census Data to Better Understand a Large Community Well-being Study: More than Geography Divides Us; Donald P. Levy, Siena College Research Institute Meghann Crawford, Siena College Research Institute

  • Interviewed 16,000 people by phone, RDD
  • Survey of quality of community, health, safety, financial security, civic engagement, personal well being
  • Used factor analysis to group and test multiple indicators into factors; did the items really rest within each factor [i love factor analysis. It helps you see groupings that are invisible to the naked eye. ]
  • Mapped out cities and boroughs, some changed over time
  • Rural versus urban have more in common than neighbouring areas [is this not obvious?]
  • 5 connections – wealthy, suburban, rural, urban periphery, urban core
  • Can set goals for your city based on these scores
  • Simple scoring method based on 111 indicators to help with planning and awareness campaigns, make the numbers public and they are shared in reports and on public transportation so the public knows what they are, helps to identify obstacles, help to enhance quality of life
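As a rough illustration of the simple scoring idea (not the Institute’s actual method), indicators already rescaled to 0–100 can be averaged within each factor. Factor names, indicator names, and values below are all invented:

```python
# Hypothetical factor structure: each factor groups a few 0-100 indicators.
FACTORS = {
    "safety": ["crime_rate_inv", "feels_safe_pct"],
    "financial_security": ["median_income_scaled", "employment_pct"],
}

indicators = {
    "crime_rate_inv": 70, "feels_safe_pct": 80,
    "median_income_scaled": 60, "employment_pct": 84,
}

def factor_scores(indicators, factors):
    """Average the (already rescaled) indicators belonging to each factor."""
    return {name: sum(indicators[i] for i in items) / len(items)
            for name, items in factors.items()}

print(factor_scores(indicators, FACTORS))  # safety 75.0, financial_security 72.0
```

Publishing scores like these is what lets a city set goals against them, as the talk describes.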

Using Machine Learning to Infer Demographics for Respondents; Noble Kuriakose, SurveyMonkey; Tommy Nguyen, SurveyMonkey

  • Best accuracy for gender inference is 80%; Google has seen this too
  • Use mobile survey, but not everyone fills out the entire demographic survey
  • Works to find twins, people you look like based on app usage
  • Support vector machines try to split a scatter plot where male and female are as far apart as possible 
  • Give a lot of power to the edges to split the data 
  • Usually the data overlaps a ton, you don’t see men on the left and women on the right
  • “Did this person use this app?” Split people based on gender, Pinterest is often the first node because it is the best differentiator right now, Grindr and emoticon use follow through to define the genders well, stop when a node is all one specific gender
  • Men do use Pinterest though, ESPN is also a good indicator but it’s not perfect either, HotOrNot is more male
  • Use time spent per app, apps used, number of apps installed, websites visited, etc.
  • Random forest works the best
  • Feature selection really matters, use a selected list not a random list
  • Really big differences with tree depth
  • Can’t apply one platform’s app model to the Android model; the apps are different, the use of apps is different
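A toy version of why Pinterest ends up as the first node: among the “did this person use this app?” features, a tree picks the one with the lowest weighted Gini impurity after the split. The usage data below is invented for the example:

```python
# Rows: (uses_pinterest, uses_espn, gender). Invented data where Pinterest
# separates the genders well and ESPN does not.
people = [
    (1, 1, "f"), (1, 0, "f"), (1, 1, "f"), (1, 0, "f"),
    (0, 1, "m"), (0, 1, "m"), (0, 0, "m"), (1, 0, "m"),
]

def gini(labels):
    """Gini impurity of a two-class label list (0 = pure, 0.5 = 50/50)."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == "f") / len(labels)
    return 2 * p * (1 - p)

def split_impurity(rows, feature_idx):
    """Weighted Gini impurity after splitting on a 0/1 feature."""
    left = [r[-1] for r in rows if r[feature_idx] == 0]
    right = [r[-1] for r in rows if r[feature_idx] == 1]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

best = min(range(2), key=lambda i: split_impurity(people, i))
print(["pinterest", "espn"][best])  # pinterest: the better differentiator
```

A random forest, which the speakers found worked best, just repeats this kind of split-picking across many trees built on random subsets of people and features.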

Dissonance and Harmony: Exploring How Data Science Helped Solve a Complex Social Science Problem; Michael L. Jugovich, NORC at the University of Chicago; Emily White, NORC at the University of Chicago

  • [another speaker who marched on when the computer screens decided they didn’t want to work 🙂 ]
  • Recidivism research, going back to prison
  • Wanted a national perspective of recidivism
  • Offences differ by state; unstructured text forms mean a lot of text interpretation; historical data is included, which messes up the data if it’s stored vertically or horizontally in different states
  • Have to account for short forms and spelling errors (kinfe)
  • Getting the data into a useable format takes the longest time and most work
  • Big data is often blue in pictures with spirals [funny comments 🙂 ]
  • Old data is changed and new data is added all the time
  • 30,000 regular expressions to identify all the pieces of text
  • They seek 100% accuracy rate [well that’s completely impossible]
  • Added in supervised learning and used to help improve the speed and efficiency of manual review process
  • Wanted state-specific and global economy models, over 300 models, used a brute-force modeling approach
  • Want to improve with neural networks and automate database updates
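A tiny, hypothetical flavour of that regex matching (the real system uses some 30,000 expressions); note the first pattern tolerates the “kinfe” transposition mentioned above, and the abbreviations are invented examples:

```python
import re

# Each pattern maps messy free-text offense descriptions to a standard label.
OFFENSE_PATTERNS = [
    (re.compile(r"\bk(?:ni|in)fe\b", re.I), "WEAPON: KNIFE"),
    (re.compile(r"\bposs(?:\.|ession)?\s+(?:of\s+)?cntrld?\s+subst\w*", re.I),
     "DRUGS: POSSESSION CONTROLLED SUBSTANCE"),
]

def classify(raw_text):
    """Return the first matching standard label, or UNCLASSIFIED."""
    for pattern, label in OFFENSE_PATTERNS:
        if pattern.search(raw_text):
            return label
    return "UNCLASSIFIED"

print(classify("ASSLT W/ KINFE"))     # WEAPON: KNIFE
print(classify("POSS CNTRLD SUBST"))  # DRUGS: POSSESSION CONTROLLED SUBSTANCE
```

The supervised-learning layer the speakers added would then prioritize the UNCLASSIFIED leftovers for manual review instead of reviewing everything.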

Machine Learning Our Way to Happiness; Pablo Diego Rosell, The Gallup Organization

  • Are machine learning models different/better than theory driven models
  • Using Gallup daily tracking survey
  • Measuring happiness using the ladder scale, best possible life to worst possible life, where do you fall along this continuum, Most people sit around 7 or 8
  • 500 interviews everyday, RDD of landlines and mobile, English and Spanish, weighted to national targets and phone lines
  • Most models get an R-squared of .29. Probably because they miss interactions we can’t even imagine
  • Include variables that may not be justified in a theory driven model, include quadratic terms that you would never think of, expanded variables from 15 to 194
  • [i feel like this isn’t necessarily machine learning but just traditional statistics with every available variable crossed with every other variable included in the process]
  • For an 80% solution, needed only five variables
  • This example didn’t uncover significant unmodeled variables
  • [if machine learning is just as fast and just as predictive as a theory driven model, I’d take the theory driven model any day. If you don’t understand WHY a model is what it is, you can’t act on it as precisely.]
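The variable expansion described (15 base variables growing to 194) can be sketched mechanically: keep the base variables, add their squares, and add every pairwise interaction. Variable names here are invented:

```python
from itertools import combinations

base = ["age", "income", "health", "social_time"]

expanded = list(base)
expanded += [f"{v}^2" for v in base]                        # quadratic terms
expanded += [f"{a}*{b}" for a, b in combinations(base, 2)]  # pairwise interactions

print(len(base), "->", len(expanded))  # 4 -> 14
```

With n base variables this yields n + n + n(n-1)/2 terms, which is how a modest variable list balloons the way the talk describes.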

Brand Building in a Digital, Social and Mobile Age Joel Rubinson, Rubinson Partners Inc. #NetGain2015 #MRX

Live blogging from the Net Gain 2015 conference in Toronto, Canada. Any errors or bad jokes are my own.

Brand Building in a Digital, Social and Mobile Age

Joel Rubinson, President and Founder of Rubinson Partners Inc.

  • Picture of brand success has to change
  • We are no longer in a push world, consumers pull information at their leisure
  • We engage in shopping behaviours even when we aren’t really shopping, we are always IN the path to purchase
  • Brands must become media
  • Starbucks is the best example of a marketer that gets it: 40 million fans on facebook, millions of website visits, millions have downloaded their app. Every interaction generates data they can use for personalization and to amplify brand communications. They no longer have to pay for every message.
  • The rise of math experts in advertising  – lift from using math to place advertising is a repeatable success
  • Programmatic messaging is key. Think about impressions that are served up one user at a time. The marketer’s goal is to serve the most relevant ad at the right price. And this needs to scale.
  • Research is missing in action when it comes to math – we lack digital metrics, still rely on survey based tracking, we have a post-mortem mind set, we are failing to change how marketing works
  • We must get serious about integrating digital – why isn’t this happening, why are we locked in a survey world
  • Our comfort zone is surveys. We know how to construct 20 minute surveys. Our learning zone is the mobile area where we unpack our surveys into smaller pieces.
  • The panic zone is digital, we don’t understand it. We must move digital into the comfort zone.
  • Let’s start by just looking at the data: look at page views, look at themes in social media, how big is your brand audience, how many likes on facebook, how many twitter followers, how many newsletter signups. These are unambiguous measures. Look at clicking and sharing and conversions.
  • Stop treating social media as a hobby, not specialty projects, not ancillary thing to look at. You must find ways to increase positive word of mouth.
  • Do we really need feedback from consumers every single day on attributes they never consider? Can’t social media which is much more organic do this?
  • Bring in data that you can’t get from a survey that has action value. Some online panel companies already use a social login called OAuth.  Append all the Facebook data to your survey and use it for targeting.
  • Data aggregators have lots of profiling information for targeting ads throughout the web which means different people get different ads based on cookies from their browser
  • You can also link in frequent shopper data to your survey data.
  • You don’t have to guess whether an ad is working. You can run an experiment and serve the ad to one group of people and see the change in group behaviour.
  • MR needs to know that brand meaning is done completely differently now. People seek out knowledge, digital delivers information in real time. But marketing research hasn’t changed.
  • Think digital and do something big. Shift some money into data science or integration. Conduct in-the-moment research with smartphones.

The Internet of Annie #MRX #IOT

I did it. Yes. I broke down and spent my Christmas money. Let’s put aside the fact that I still get Christmas money from the moms and move on to what I spent it on.

In just six to eight weeks, this pretty little plum coloured Fitbit will arrive at my door. (The “make it pink so girls will buy it” marketing scheme works on me but plum is just as good.)

Supposedly, it will monitor my heart rate all the time, including when I am awake and asleep. It would have been cool to have it a few weeks ago when my four wisdom teeth were ripped out of my face, but I’m sure some other quite unpleasant event will greet me soon enough.

I’m quite looking forward to learning:
– how consistent my sleep is, and how many times I wake up at night
– whether my heart rate speeds up or slows down when I get ready for work or leave work, or when I go to my awesomely fun ukulele class
– how incredibly nuts my heart rate is when I speak at conferences, show up at cocktail hour, plow through a crowded exhibit hall. Though I may seem calm and relaxed, it really takes a ton of mind games to turn quiet me into loud me.

And at the same time, I’ll be wondering… If someone gets their hands on my data, what will they do with it? What products will they develop as they learn about me? What heart rate medications will they need to sell to me? What fitness products will they need to sell to me? Will I need to buy the shirt version to measure electrical outputs? The sock version to measure sweat outputs? The earbud version to measure brainwaves? What will marketers and brand managers learn about me and my lifestyle?

Now that I think about it, this is MY form of gamification. I can’t wait to see charts, watch trends, and compare norms. And now that I’m learning Python and rstats, I would love to get my hands on the dataset of millions of people and millions more records. With permission of course.

What #MRX software should you learn?

I see the writing on the wall and it says data science. As more and more devices join the internet of things, as more shoes and fridges and chairs and hairbrushes upload data about frequency, duration, latency and more to the interweebs, it becomes more and more clear to me that manipulating ridiculous volumes of data is the future of marketing research. No more will we ask people how often they buy and wear shoes, or which shoes they wear in which weather. We will simply read the writing in the cloud. Marketing researchers cannot rely on their old standbys while everyone else learns the always evolving tools of the research trade.

I see the writing on the wall and it says Python. A few days after internalizing that writing, I made a purchase of two paper and ink products that will never break upon being dropped on concrete. These two things, ancient learning tools called ‘books’ will be my friends for a while this year.

And interestingly, shortly after these ‘books’ came into my possession, I came across this post by Amy. I’m good with SQL and good with Excel but what about the two items in between? Well, R and Python, here we go!

I’ve downloaded and installed the software. And, as all good newbies do, I created my first bit of Python ‘code’. Here goes! 🙂
[screenshot: hello world in Python]
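For anyone who hasn’t seen it, the screenshot presumably shows something like the classic first program:

```python
# The traditional first bit of Python 'code'.
message = "Hello, world!"
print(message)
```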

Data Mining and Predictive Modeling: Andrew Grenville, Kevin Dang #Netgain7 #MRX

Netgain 7 MRIA
… Live blogging from downtown Toronto…

Predictive Modeling

Andrew Grenville, Chief Research Officer and Kevin Dang, Senior Research Manager, Vision Critical

Data Mining and Predictive Modeling to Drive Panel Management Strategies

  • Analytical panel management has key factors
    • data set-up
    • what business challenge are we trying to solve – e.g., valuable panelists, churn
    • methodology and analysis – which model, which variables, how to fine-tune, which statistic works better
    • evaluation and performance – is the model stable, can you move it through time
    • application – if we know who will churn, how can we act in time
  • static reports, drill down, ad hoc reports, forecasting, predictive modeling, optimization – researchers tend to stop before predictive modeling
  • Want fast turnaround time, waiting a couple weeks for analysis isn’t going to work
  • Needs to incorporate longitudinal data, learn from the past
  • Needs to be flexible to incorporate both survey data and panel data
  • Needs to have as little ongoing IT support as possible
  • Identified all the interaction points along the panelist life cycle – disqualifies, over quota, incentives paid, other touch points
  • First cleaned out poor data, e.g., 115 year old people
  • Churn model – urban area or high income were more likely to churn, 18 to 30 more likely to churn, low response rates more likely to churn
  • Conducted exploratory and confirmatory analysis – predictive accuracy was 76% [awesome!]
  • Challenges with model include the quality of the data, lots of time was spent harmonizing the data for modeling
  • Challenge was making model useful and actionable – it had to be as simple as possible so that it was actionable
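As a purely illustrative sketch (weights and thresholds are invented, not Vision Critical’s model), the churn findings above could be turned into the kind of simple, actionable flag the speakers describe:

```python
# Invented weights for the risk factors the churn model surfaced:
# urban, higher income, aged 18-30, and low response rates.
WEIGHTS = {"urban": 1.0, "high_income": 1.0, "age_18_30": 1.5, "low_response": 2.0}

def churn_score(panelist):
    """Sum the weights of the risk factors this panelist has."""
    return sum(w for flag, w in WEIGHTS.items() if panelist.get(flag))

panelists = [
    {"id": 1, "urban": True, "age_18_30": True, "low_response": True},
    {"id": 2, "high_income": True},
]

# Flag the riskiest panelists for a retention touch before they churn.
at_risk = [p["id"] for p in panelists if churn_score(p) >= 3.0]
print(at_risk)  # [1]
```

Keeping the scoring this simple is the point of the last bullet: a model is only actionable if the panel team can act on it in time.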