Changing the game: Sports Tech with the Toronto Argonauts and the Blue Jays, #BigDataTO #BigData #AI
Notes from the #BigDataTO conference in Toronto
Panel: Mark Silver, @silveratola, Stadium Digital; Michael Copeland, @Mike_G_copeland, Toronto Argonauts; Jonathan Carrigan, @J_carrigan, MLSE; Andrew Miller, @BlueJays, Toronto Blue Jays
- There is a diverse fan base across all Toronto teams, and their preferences and values are diverse in terms of who they are and what drives them to watch and attend games. There are many segments of people, not just ‘fans.’
- Fandom takes many shapes and sizes and you always need to grow and rebuild the fan base. You can’t appeal to only avid fans. You must also appeal to casual fans. You need to go beyond the narrow focus of superfans.
- The strategy of loyalty programs is that they are an engagement tool to gather data for mining, generate in-game activation, let people win prizes by participating, help partners better understand the fans, and this creates wins across the board – for the team, the partners, and the fans.
- The teams want to learn what people are doing during the game as opposed to guessing. Which benefits do they use their points for, what do they choose at the concession stand, are they watching road games. And this is not just for season ticket holders but people across North America watching games. We need to use the data to learn how to scale beyond ticket holders.
- People want more meaningful and personal relationships with their sports teams. We need to learn what food they want, what environment they want in the venue, what relationships they want outside of the game. And we need to filter out the noise and deliver.
- We’ve all done the analogue research. It’s been done for 100 years and it’s not unique to sport. How do we use technology to do it better now? WHO – we need to stop guessing and start using more efficient research. This massive data we have will tell you many things like WHAT do they want. They might want NEW THINGS that you didn’t offer before, an app, an emoji. The data will also ASSIST your team with player recruitment and roster management. We’ve been doing all this for ages and now we want to do it better, more efficiently, more cost effectively.
- Big data is not free though. Not all stadiums have wifi to do wifi research and it’s expensive to invest in putting wifi in a stadium. We need to spread the cost among multiple agencies.
- This isn’t a technology project. Rather it’s a people project. For instance, a chef can do vastly more with ten ingredients than I can. We need to change the way we engage with fans and leverage partner relationships. Yes, we’re investing in technology but the focus is people. We need to translate tools for each part of the business, reimagine how we engage with fans, and how we make a profit. You can buy a beautiful car but you need to learn to drive to take full advantage of that new car.
Cognitive Analytics: Enabling assisted intelligence in human resources recruiting and hiring by Noel Webb, @CognitiveHR, Karen.ai, #BigDataTO #BigData #AI
- He realized that HR teams were spending too much time prescreening resumes before they could even meet with the best candidates
- Recruiters only spend 6 seconds reviewing a resume which means they end up accidentally discarding some of the best ones. Time crunches mean they may only be able to get through 20% of candidates. ML can solve these problems.
- 75% of candidates who apply to jobs do not hear back from the company because there are simply too many candidates and not enough time to do so. NLP and chatbots can solve this problem.
- AI will not steal all jobs but it will automate processes and allow you to engage with potential hires in a more meaningful way.
- Shortlisting is a huge challenge for HR as reducing a huge list of resumes into a screened list takes a lot of detailed attention. Technology such as direct keyword matching isn’t the best option as it eliminates people with relevant skills but not the exact words. For instance, knowing R is just as good as knowing SAS but a keyword search wouldn’t know that. NLP would work much better.
- Personality insights can also be collected using sentiment analysis to get a functional understanding of the Big 5 Personality traits. [Wow, I can’t imagine how valid it is to do personality assessments with resumes which are often written by third parties and without traditional grammar and style]
- Chatbots can take an applicant through hiring and onboarding processes by answering questions that would normally be asked of an operations officer. [imagine how many stupid questions the chatbot would be asked that new hires are too scared to ask people]
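To make the shortlisting point above concrete, here is a minimal sketch comparing an exact keyword screen with a screen that knows some skills are interchangeable. The `SKILL_GROUPS` table, the function names, and the sample resume are all made up for illustration. A hand-written lookup table is only a stand-in for the NLP approach mentioned in the talk, which would presumably learn these relationships from text.

```python
import re

# Hypothetical equivalence groups, written by hand for illustration only.
# A real NLP system would learn relationships like these from text.
SKILL_GROUPS = {
    "statistical programming": {"r", "sas", "spss", "stata"},
}

def tokens(text):
    """Return the set of lowercase word tokens found in free text."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def keyword_match(resume, required):
    """Exact keyword screen: the required word must appear verbatim."""
    return required.lower() in tokens(resume)

def group_match(resume, required):
    """Pass if the resume mentions any skill in the same group as the required one."""
    words = tokens(resume)
    req = required.lower()
    if req in words:
        return True
    for members in SKILL_GROUPS.values():
        if req in members and words & members:
            return True
    return False

resume = "Built churn models in R and presented results to executives."
print("keyword screen passes:", keyword_match(resume, "SAS"))
print("group screen passes:", group_match(resume, "SAS"))
```

The keyword screen rejects this candidate because the word SAS never appears, while the group screen lets them through because R sits in the same group.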
The ideology of data by Sasha Gryjicic, @SashaG, Dentsu Aegis Network, #BigDataTO #BigData #AI #Intelligence
- Data marketing and artificial intelligence are headed in the wrong direction
- Marketing is the pursuit of convincing someone they need something, marketing is a commercial outcome to propel the broader economy forward, marketing uses media and communications to convince, largely based on human language
- Data is a digital expression of something in the world, organized and stored in many ways. We are finally getting the external world to use a single language but we can’t read this language. Humans don’t read binary code or extrapolations of code.
- Data violates the notion of scarcity and data is almost always out of both time and space context for necessity. Data is necessarily incomplete and it is not that thing itself. Data has inherent biases, is super messy, and contradictory
- We use data to optimize things that have already happened, or we generalize what we learn from data to engineer more of those outcomes, e.g., when managing an online store, we optimize data to get optimal business outcomes but this doesn’t help us learn why or what drives the decisions
- Intelligence is the ability to gather, categorize, organize inputs, store, reflect, and respond to them. For humans, intelligence is innate, structured, organized, and process oriented. We have a fixed capacity of intelligence and are creative with it. It is not the result of external stimulus.
- Language is the best way for humans to get access to our intelligence. It’s the language we use when we think. We talk to ourselves more than we talk to others.
- The AI we’re building is like automated statistics. We brute force relationships and create a black box of intelligence. We don’t understand why a computer makes certain decisions because we cannot hold enough variables in our mind to understand. Are algorithms intelligence or optimization? We are drifting further from understanding what intelligence really is. It’s not AI at all.
- We’re accelerating the fatigue of positive reinforcement. We’re following bad after bad. We’re heading away from language which is the only way to understand ourselves.
- Intelligence seems to include morality, the ability to store and reflect and take decisions based on reflections.
- We need to back away from disorganized data. We need to pause and reflect on what we see in the data to understand ourselves better. We need to dive into our own intelligence better. Reflecting on something is more important than acting on something.
Future of the smart home by Emily Taylor and Manish Nargas, IDC Canada, #BigDataTO #SmartHome #ConnectedHome #AI
- By 2020, every home will have 40 connected devices – TV, appliances, health, assistance, security
- Wearables help consumers track and log their activities such as wellness goals, athletic training, weight loss monitoring, medication reminders, gamification of activities. 1 in 5 Canadians currently own a device such as a wristband or a watch and 70% of those owners have no plans to upgrade or replace. 60% of consumers are not interested in wearables at all. Designs will be less obvious, have improved battery life, and use new materials like smart fabrics. Medical devices will have better reliability and validity and this will help the healthcare sector and be relevant for insurance companies
- Security devices – smoke alarms, motion sensors, doorbells, security systems, remote home monitoring. These devices offer peace of mind. It’s no longer about emergency services but monitoring to see if the kids are home, a window is opened, the jewelry box is still there, perhaps even see if it’s a friend or foe at your front door.
- Home automation – these devices will help us reduce energy usage, increase safety including devices such as thermostats, light switches, outlets, appliances. IKEA has launched a smart home lighting system with wire-free lighting at a lower cost than their competitor. They will bring this technology into every piece of furniture and curtains [window blankets 🙂 ]
- Personal health devices – These devices will result in increased awareness of monitoring. Health monitoring will take place from the home not a hospital and will result in fewer trips to the doctor and hospital. Connected clothing will help with this. Gym equipment brands now sync with health monitoring devices so you can monitor treadmill and walking together and get more consistent results.
- Intelligent assistants/bots – more natural way to interact with machines, removes the complexity of interconnections, vocalizes thought and activity, uses real time machine learning. Low adoption rates in Canada but many bots aren’t available in Canada. Connecting a speaker to the internet isn’t revolutionary but it can improve personalization. 60% of Canadians don’t care about bots but bots are here to stay. It is Alexa and soon will be your butler. It will be ubiquitous.
- There are gaps. Many devices are siloed right now. They have limited conversations with other smart home devices. The market is too focused on DIY right now as people want to solve specific problems not do the entire home in one shot. There is little support across the solutions.
- Do you need a smart-fork that monitors how quickly you eat? Do you need this fork to connect to your lights and smoke alarm?
When will we drive autonomous vehicles, by Kashmir Zahid, Ericsson Digital Services (Great talk) #BigDataTO #BigData #AI #Automation
- 1996 GM introduced OnStar. It had a weak interface, few features, and was mainly designed to offer roadside assistance.
- 2010 saw in-car navigation but it still wasn’t user friendly nor easy to operate while you were driving.
- 2012 Tesla built an all electric car and people could finally see the possibilities of vehicles with electricity and connectivity. Now that vehicles had so much digital, manufacturers could no longer stay in the shadows and let dealerships handle all the consumer interactions.
- 2014 Apple CarPlay and Android Auto were introduced. Connectivity was embedded in the car from the time it was installed in the factory as opposed to being added by the consumer after the fact.
- 2015 remote diagnostics are now available, repairs can now be recommended by the vehicle rather than going to the dealership or following the manual.
- 2015 Tesla creates autopilot, a self guiding car but the user is still expected to take physical control when needed.
- 2017, the Google car is no longer a science project, it is a reality.
At CES, three trends were noted
1. cars will be integrated into your life and communicate with your personal device, e.g., your home will be ready to receive you when you arrive, the temperature is set appropriately, the lights are turned on, the garage door is opened, and the turkey is ready to be taken out of the oven
2. Automation will create a natural experience of talking to your car, Alexa is winning here [although it just accidentally bought Whole Foods so I don’t know about the quality at this point]
3. Car to car communication – this will allow vehicles to see and talk to each other, so they can maintain speed and safety among other cars on the road
- Now that everything talks to everything, our user experiences will be completely transformed.
- By 2020, 90% of cars will be connected
- 4 trends in the industry
1. Cars must be connected, software defined car
2. Electrification, ITS, infrastructure
3. Automation, connected automated mobility
4. New business models, multi industry ecosystems
- This is the largest change in transportation since Ford’s Model T
- Soon, we will have everything we need to travel but we won’t own the car. [Think of music, we no longer own the music we buy and we could lose it instantly if Apple decides to shut something down]
- Insurance will depend on how you drive, your telemetrics. And later on, insurance won’t be necessary as human drivers won’t be responsible for safety.
- Emergency assistance providers will be affected as cars will have embedded systems that alert first responders instantly to ask if you are safe.
- Government providers will need to reconsider what legislation is needed to take care of drivers and roads.
- 13 out of 14 of the big vehicle manufacturers plan to make an autonomous vehicle in the next couple of years
- Google, Apple, Intel, Microsoft and Amazon have focus and investment in self driving car projects. Telecom operators like AT&T, Verizon, Vodafone see the potential of new revenue in self driving cars. Uber, Lyft, DIDI and many other startups are trying to disrupt the traditional car ownership model.
- The passenger economy will be worth $7 trillion by 2050.
- We are about to see consumer mobility as a service – one stop shop for transportation for everyone who doesn’t own a car [this is amazing for people who don’t know how to drive, are too old to drive, too young to drive, not well enough to drive]
- This will save over half a million lives due to safety from fewer accidents. And, it will free up your time since you don’t have to physically drive.
- We are two years away from letting people sleep in a Tesla on long road trips where the car has not made the trip before – Elon Musk
1. Public safety – people need to trust the machine to work while they sleep.
2. Data privacy and security – who has, uses, and sells my data. It’s not transparent right now.
3. Rules and Regulations – Who is liable for an accident? Who owns the vehicle that caused the accident?
- Connected cars will open multiple innovative services.
- They will improve the efficiency and security of new value added services for both consumers and enterprises.
I’ll admit I didn’t have high hopes. How good could a free conference about big data and artificial intelligence be? Especially if the upgrade tickets, which I so frugally declined, were only $75? Well, I was pleasantly surprised.
Let’s deal with the negatives first. The morning registration line was long and it took some people 30 minutes to get through it. The exhibition hall was small with not nearly as many vendors as I am used to seeing at conferences. There was no free wifi in the main hall (um, admission was free so why do we deserve free wifi too?). Sometimes the sessions were so packed, there wasn’t even room to stand. And, some speakers didn’t even show up because, well, airplanes.
However, those negatives were completely washed aside with the positives. Some of the talks were quite good. Some of the speakers were quite good. The topics were quite good. They gave out free conference programs. And did I say free? Some free things are worth what you paid for them. This one was worth a lot more. I highly advise you to go and it’s definitely on my 2018 conference schedule as time well spent.
- Data science is often handled at the tail-end of a project. We only take the time to learn what happened after the fact and when it’s too late to do anything about the current situation. We need to do a better job of using our data for the future – for segmentation, targeting, to understand what our customers want, to uncover blind spots.
- Good data scientists care where the data came from, who created it, what it represents. They don’t just take the data and run it through stats programs and spit out reports. It’s not just about statistics and reporting. Data quality must come first.
- The real money is not in having the data but rather in knowing what questions to ask. Literally everyone has data but only the companies that hire the smart brains to ask the right questions will succeed with big data.
- You might think using artificial intelligence is very impersonal. On the contrary. It’s impossible for a human being to be personal with hundreds and thousands of people but AI allows you to be far MORE personal with thousands or millions of people.
- Computers and artificial intelligence need to learn the senses – for instance, they need to learn to see the types of moles on skin that will become cancer, learn to hear which wheels on a train are cracked and about to cause a train wreck.
- Algorithms are what make computers see and listen and as such algorithms are the future. Soon, companies will brag about their algorithms not their data.
- We need to let computers do the pattern recognition so that humans can do the strategizing and reasoning
- If you want to work with big data but can’t afford it, have no fears. So much software is free and open source. You can do anything you want with free tools so don’t let dollars hold you back from doing or learning.
- The danger with artificial intelligence is training it with bad, untrustworthy, biased data. We’ve all seen the reports of AI perpetuating racism because the training data contained racist data. You must choose good datasets that are clean and genuinely unbiased and only then will you find success.
Live note taking at #MRIA16 in Montreal. Any errors or bad jokes are my own.
Panel with Sean Copeland, Evan Lyons, Anil Saral, Ariel Charnin, Melanie Kaplan
- Timelines are very compressed now, instead of two or three months people are asking for hours to get answers
- It’s no longer 20 minute questions but quick questions
- Market research is often separate from data science and analytics but this team has put them together
- They don’t have to answer questions with surveys because they have the raw data and they know the surveys probably won’t be able to answer them accurately; they know when to use market research so that it is most effective
- When is MR the right solution and when do they partner with data scientists
- There is a divide between MR and data science which is strange because our goal of understanding consumers is the same
- We can see all the transactional data but without MR you miss the why, the motivator, one method doesn’t answer the entire question
- We need to train and mentor younger researchers [please join http://researchspeakersclub.com ]
- Some mistrust of quantitative data, are panels rep, why do the numbers change month to month, reexploring Qual to understand the needs and wants, clients remember specific comments from specific focus groups which helps the team to see the issues
- A doctor is still a doctor even when they use a robot, the same is true for consumer insights with surveys and data science
- Don’t be protective of your little world, if a project comes to you and is better answered by another method then you are wise to pass it to those people
- You need to appreciate what MR offers and what analytics offers, both have strengths and weaknesses you need to understand
- A new language may be morphing out of the combination of MR and data science
- Everyone believes they are providing insight, of course both sides can do this whether it’s projects and models and understanding the why, insights need to be both of these
- Still need to be an advocate for MR, can’t just go to data science every time even if it’s the new great toy
- Live Flow Data – is this a reality, it will happen, can already see 5 day forecast of weather and know about upcoming conferences and how many tickets were sold for a week from now; monthly assumptions from data could happen
- They can see the effects of ads immediately in live data
- They don’t want to hear what happened yesterday, need to know what’s happening now
- Future of our business is understanding people and solving problems, you always need more information to do this; if you learn new things, you can do more things and solve more problems
- Need more skills in strategy and merging with insights, don’t just hand off reports, help clients take insights and turn them into the next initiative
- Is it one story or multiple stories after you’ve got all the data put together
- Don’t just deliver a product and then leave it, our results are only as accurate as the people who interpret them; research can say a hamburger should look exactly like this but when the end product designers change all the tiny little things to be more convenient then you wind up with a completely wrong hamburger in the end
Enjoy my live note taking at AAPOR in Austin, Texas. Any bad jokes or errors are my own. Good jokes are especially mine.
Moderator: Masahiko Aida, Civis Analytics
Employing Machine Learning Approaches in Social Scientific Analyses; Arne Bethmann, Institute for Employment Research (IAB) Jonas F. Beste, Institute for Employment Research (IAB)
- [Good job on starting without a computer being ready. Because who needs computers for a talk about data science which uses computers:) ]
- Demonstration of chart of wages by age and gender which is far from linear, regression tree is fairly complex
- Why use machine learning? Models are flexible, automatic selection of features and interactions, large toolbox of modeling strategies; but risk is overfitting, not easily interpretable, etc
- Interesting that you can kind of see the model in the regression tree alone
- Start by setting every case in a sample to 0, e.g., male and female are both 0; then predict responses for every person; calculate AME/APE as mean difference between predictions for all cases
- Regression tree and linear model end up with very different results
- R package for average marginal effects – MLAME on github
- MLR package as well [please ask author for links to these packages]
- Want to add more functions to these – conditional AME, SE estimation, MLR wrapper
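The AME bullet above is terse, so here is a minimal sketch of how I understood the calculation: set the variable of interest to one value for every case and predict, set it to the other value for every case and predict again, then average the differences. The names `average_marginal_effect` and `toy_predict` are my own, and `toy_predict` is a made-up stand-in for whatever fitted model (a regression tree, say) you would really use. This is not the authors' R package.

```python
from statistics import mean

def average_marginal_effect(predict, rows, feature, low=0, high=1):
    """Mean difference in predictions when `feature` is set to `high`
    versus `low` for every case, holding everything else as observed."""
    diffs = []
    for row in rows:
        at_low = dict(row, **{feature: low})
        at_high = dict(row, **{feature: high})
        diffs.append(predict(at_high) - predict(at_low))
    return mean(diffs)

# Hypothetical stand-in for a fitted model: wage rises with age up to 50,
# and the gender gap is larger for people aged 40 and over.
def toy_predict(row):
    base = 20 + 0.5 * min(row["age"], 50)
    gap = 2 if row["age"] < 40 else 5
    return base - gap * row["female"]

# Made-up sample of four cases.
sample = [
    {"age": 25, "female": 0},
    {"age": 35, "female": 1},
    {"age": 45, "female": 1},
    {"age": 55, "female": 0},
]

ame = average_marginal_effect(toy_predict, sample, "female")
print("AME of female on predicted wage:", ame)
```

Because the function only needs something it can call to get a prediction, the same code works for a linear model or a tree, which is what lets you compare the two kinds of model on one number.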
Using Big Census Data to Better Understand a Large Community Well-being Study: More than Geography Divides Us; Donald P. Levy, Siena College Research Institute Meghann Crawford, Siena College Research Institute
- Interviewed 16000 people by phone, RDD
- Survey of quality of community, health, safety, financial security, civic engagement, personal well being
- Used factor analysis to group and test multiple indicators into factors, did the items really rest within each factor [I love factor analysis. It helps you see groupings that are invisible to the naked eye. ]
- Mapped out cities and boroughs, some changed over time
- Rural versus urban have more in common than neighbouring areas [is this not obvious?]
- 5 connections – wealthy, suburban, rural, urban periphery, urban core
- Can set goals for your city based on these scores
- Simple scoring method based on 111 indicators to help with planning and awareness campaigns, make the numbers public and they are shared in reports and on public transportation so the public knows what they are, helps to identify obstacles, help to enhance quality of life
Using Machine Learning to Infer Demographics for Respondents; Noble Kuriakose, SurveyMonkey; Tommy Nguyen, SurveyMonkey
- Best accuracy for gender inferring is 80%, Google has seen this
- Use mobile survey, but not everyone fills out the entire demographic survey
- Works to find twins, people you look like based on app usage
- Support vector machines try to split a scatter plot where male and female are as far apart as possible
- Give a lot of power to the edges to split the data
- Usually the data overlaps a ton, you don’t see men on the left and women on the right
- “Did this person use this app?” Split people based on gender, Pinterest is often the first node because it is the best differentiator right now, Grindr and emoticon use follow through to define the genders well, stop when a node is all one specific gender
- Men do use Pinterest though, ESPN is also a good indicator but it’s not perfect either, HotOrNot is more male
- Use time spent per app, app used, number of apps installed, websites visited, etc
- Random forest works the best
- Feature selection really matters, use a selected list not a random list
- Really big differences with tree depth
- Can’t apply the app model to the android model, the apps are different, the use of apps is different
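To illustrate why one app ends up as the first node in a tree, here is a minimal sketch that scores each "did this person use this app?" question by how much it reduces Gini impurity and picks the best one. The data is completely made up and the function names are my own. It shows a single split only, not the random forest the speakers said worked best.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_gain(rows, labels, app):
    """Impurity reduction from splitting on 'did this person use this app?'."""
    used = [y for row, y in zip(rows, labels) if app in row]
    not_used = [y for row, y in zip(rows, labels) if app not in row]
    n = len(labels)
    weighted = (len(used) / n) * gini(used) + (len(not_used) / n) * gini(not_used)
    return gini(labels) - weighted

def best_first_split(rows, labels, apps):
    """Return the app whose yes/no question separates the labels best."""
    return max(apps, key=lambda app: split_gain(rows, labels, app))

# Made-up toy data: each row is the set of apps one respondent uses.
rows = [
    {"pinterest", "weather"},
    {"pinterest", "espn"},
    {"pinterest"},
    {"espn", "weather"},
    {"espn"},
    {"weather"},
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = female, 0 = male (hypothetical)
apps = ["pinterest", "espn", "weather"]

for app in apps:
    print(app, round(split_gain(rows, labels, app), 3))
print("first node:", best_first_split(rows, labels, apps))
```

In this toy data one app separates the two groups perfectly, so the tree could stop after one node. Real data overlaps a ton, as the speakers said, which is why the tree keeps splitting and why depth matters so much.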
Dissonance and Harmony: Exploring How Data Science Helped Solve a Complex Social Science Problem; Michael L. Jugovich, NORC at the University of Chicago; Emily White, NORC at the University of Chicago
- [another speaker who marched on when the computer screens decided they didn’t want to work 🙂 ]
- Recidivism research, going back to prison
- Wanted a national perspective of recidivism
- Offences differ by state, unstructured text forms means a lot of text interpretation, historical data is included which messes up the data if it’s vertical or horizontal in different states
- Have to account for short forms and spelling errors (kinfe)
- Getting the data into a useable format takes the longest time and most work
- Big data is often blue in pictures with spirals [funny comments 🙂 ]
- Old data is changed and new data is added all the time
- 30 000 regular expressions to identify all the pieces of text
- They seek 100% accuracy rate [well that’s completely impossible]
- Added in supervised learning and used it to help improve the speed and efficiency of manual review process
- Wanted state specific and global economy models, over 300 models, used brute force model
- Want to improve with neural networks, auto make data base updates
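For a sense of what those regular expressions might look like, here is a minimal sketch that maps short forms and misspellings in free-text offence descriptions onto one canonical label. The three patterns and the canonical labels are invented for illustration. The real project reportedly used about 30 000 expressions and I have not seen any of them.

```python
import re

# Hypothetical patterns: each maps known variants to one canonical token.
PATTERNS = [
    (re.compile(r"\b(knife|kinfe|knfe)\b", re.IGNORECASE), "KNIFE"),
    (re.compile(r"\b(poss|possession)\b\.?", re.IGNORECASE), "POSSESSION"),
    (re.compile(r"\b(asslt|aslt|assault)\b\.?", re.IGNORECASE), "ASSAULT"),
]

def normalize_offence(text):
    """Replace known variants with canonical tokens, then tidy case and spacing."""
    for pattern, canonical in PATTERNS:
        text = pattern.sub(canonical, text)
    return " ".join(text.upper().split())

for raw in ["asslt w/ kinfe", "Poss. of   knife", "ASSAULT with a KNFE"]:
    print(repr(raw), "->", repr(normalize_offence(raw)))
```

Every new misspelling needs another alternative in a pattern, which is how you end up with tens of thousands of them and why adding supervised learning to speed up the manual review makes sense.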
Machine Learning Our Way to Happiness; Pablo Diego Rosell, The Gallup Organization
- Are machine learning models different/better than theory driven models
- Using Gallup daily tracking survey
- Measuring happiness using the ladder scale, best possible life to worst possible life, where do you fall along this continuum, Most people sit around 7 or 8
- 500 interviews every day, RDD of landlines and mobile, English and Spanish, weighted to national targets and phone lines
- Most models get an R squared of .29. Probably because they miss interactions we can’t even imagine
- Include variables that may not be justified in a theory driven model, include quadratic terms that you would never think of, expanded variables from 15 to 194
- [i feel like this isn’t necessarily machine learning but just traditional statistics with every available variable crossed with every other variable included in the process]
- For an 80% solution, needed only five variables
- This example didn’t uncover significant unmodeled variables
- [if machine learning is just as fast and just as predictive as a theory driven model, I’d take the theory driven model any day. If you don’t understand WHY a model is what it is, you can’t act on it as precisely.]
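Here is a minimal sketch of the kind of variable expansion described above: take every predictor, add its square, and add every pairwise interaction. The predictor names are made up, not the Gallup variables, and I don't know exactly how the talk got from 15 variables to 194. This recipe gives n + n + n(n-1)/2 columns, which is 135 for 15 variables, so the real expansion must have included other terms too.

```python
from itertools import combinations

def expand_features(row):
    """Add squared terms and all pairwise interaction terms to one record."""
    expanded = dict(row)
    for name, value in row.items():
        expanded[f"{name}^2"] = value * value
    for (a, va), (b, vb) in combinations(row.items(), 2):
        expanded[f"{a}*{b}"] = va * vb
    return expanded

# Hypothetical predictors for one respondent.
row = {"age": 40, "income": 3, "health": 2}
expanded = expand_features(row)

print(len(row), "variables become", len(expanded))
for name, value in expanded.items():
    print(name, value)
```

This is the "every variable crossed with every other variable" approach from my aside above. The expansion itself is plain bookkeeping. The machine learning part is whatever selects among the expanded terms afterwards.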
I recently debated big data with a worthy opponent in Marc Alley at the Corporate Research Conference. He stood firm in his belief that big data is the best type of data whereas I stood firm in my position that traditional research is the only way to go. You can read a summary of the debate written by Jeffrey Henning here.
The interesting thing is that, outside of the debate, Marc and I seemed to agree on most points. Neither of us think that big data is the be all and end all. Neither of us think that market research answers every problem. But both of us were determined to present our side as if it was the only side.
In reality, the best type of data is ALL data. If you can access survey data and big data, you will be better off and have an improved understanding of thoughts, opinions, emotions, attitudes AND validated actions. If you can also access eye tracking data or focus group data or behavioural data, you will be far better off and have data that can speak to reliability or validity. Each data type will present you with a different view and a different perspective on reality. You might even see what looks like completely different results.
Different is not wrong. It’s not misleading. It’s not frustrating. Different results are enlightening, and they are indeed valid. Why do people do different than what they say? Why do people present contradictory data? That’s what is so fascinating about people. There is no one reality. People are complex and have many contradictory motivations. No single dataset can describe the reality of people.
There is no debate about whether big data has anything to offer. Though Marc and I did our best to bring you to our dark side, we must remember that every dataset, regardless of the source, has fascinating insights ready for you to discover. Grab as much data as you can.
By now you’ve heard about the three Vs of big data. Whether your concern is millions of research panel records, billions of transactional records, or trillions of web tracking records, we all have the same problem. The volume of data increases exponentially, the variety of data keeps increasing, and the speed, well, let’s think lightspeed. These issues alone make big data a worthy opponent.
Big data is also rife with missing data. It’s incomplete data, and it’s complicated data. It needs specialized analytical tools and specialized analysts. But those problems are also not the reason we’re failing.
Why are we failing at big data? Well, let’s take a step back to the survey and focus group world that market researchers love so much. When I think back to the last survey I wrote, it too was quite the beast. For just twelve minutes of respondent time, I spent many hours dreaming of, writing, tweaking, rewriting, and retweaking every single question and answer. I pondered every the, or, if, they, you, and why. I argued with myself about the possible ramifications that every single word might have on my results. In every case, I settled on the best solution, not the right solution. In the end, I had a survey that would carefully address every single hypothesis and research objective on my list. This survey was a beauty and the analysis was quick and easy.
Let’s move forward to our big data project. You know, the one where someone dumped a giant SQL database with thousands of variables and billions of records on your plate and said, “Make our program better.” You weren’t really sure what the program was, you didn’t know what was currently good or bad about it, and none of the database variables matched up with any project plans or research objectives. Actually, there were no research objectives. Except for “make better.” I can assure you that is NOT a solid research objective.
Imagine if someone collected together a hundred surveys from a hundred projects and told you to “make better.” I can guarantee you would fail at that survey analysis regardless of how many years of survey analysis you had behind you.
The simple reason we continue to fail at big data is that we fail to create concrete and specific research plans and objectives as we do for every other research project. We know very well that a survey project will fail without carefully operationalized objectives but when we work with big data, we ignore this essential step. We don’t plan ahead with specific variables, we don’t list out potential hypotheses, we don’t have a game plan. “Find something cool” isn’t a game plan. Nor is “how can we improve?” Big data needs big brains to plan and organize and be specific.
Do you want to succeed at big data? Then stop treating it like a magical panacea and do the work. Do the hard work.