Harnessing text for human insights #IIeX 

Live note-taking at #IIeX in Atlanta. Any errors or bad jokes are my own.

Chaired by Seth Grimes

Automated text coding: humans and machines learning together by Stu Shulman

  • It is a 2,500-year-old problem; Plato argued it would be frustrating, and it still is.
  • Coders are expensive, it’s difficult at scale, some models are easier to validate than others, don’t replace humans, no one right way to do it, validation of humans and machines is essential
  • Want to efficiently code, annotate coding with shared memos, manage coding permissions, have unlimited collaborators, easily measure inter-rater reliability, adjudicate validity decisions
  • Wanted to take the mouse out of the process, so items load efficiently for coding
  • Computer science and HSF influence – measure everything
  • Measure how fast each annotator works, measure inter-rater reliability, reliability can change drastically by topic
  • Adjudication – sometimes it’s clear when an error has been made, allows you to create a gold standard training set, and give feedback to coders; can identify which coders are weak at even the simplest task, there is human aptitude and not everyone has it, there is a distribution of competencies 
  • 25% of codes are wrong so you need to train machines to trust the people who do a better job at coding
  • Pillars of text analytics – search; filtering; deduplication and clustering, which work well with surveys too; human coding, labelling, or tagging, which is where most of their work goes; and machine learning – this gives a high-quality training set
  • If humans can’t do the labelling, then the machines can’t either
  • Always good to keep humans in the loop
  • Word sense disambiguation is relevant – is bridge a game or a road, is smoking a cigarette or being awesome
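
To make the reliability and adjudication points above concrete, here is a minimal sketch. The notes don't say which statistic or weighting scheme the speaker's tool uses, so this is an assumption: Cohen's kappa stands in for inter-rater reliability, and a simple accuracy-weighted vote stands in for "trusting the people who do a better job". All function names, coder names, and numbers are hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def coder_accuracy(coder_labels, gold):
    """Share of gold-standard items (item id -> label) that a coder labelled correctly."""
    scored = [item for item in gold if item in coder_labels]
    if not scored:
        return 0.0
    return sum(coder_labels[item] == gold[item] for item in scored) / len(scored)


def weighted_vote(votes, weights):
    """Pick the label with the largest total weight; votes maps coder -> label."""
    totals = defaultdict(float)
    for coder, label in votes.items():
        totals[label] += weights.get(coder, 0.0)
    return max(totals, key=totals.get)


# Hypothetical example: two coders labelling four mentions of "bridge".
coder_1 = ["game", "road", "game", "road"]
coder_2 = ["game", "road", "road", "road"]
kappa = cohen_kappa(coder_1, coder_2)

# Hypothetical weights, e.g. each coder's accuracy on an adjudicated gold set.
weights = {"ann": 0.95, "bob": 0.40, "cy": 0.50}
winner = weighted_vote({"ann": "game", "bob": "road", "cy": "road"}, weights)
```

In this toy case the one strong coder outweighs two weaker coders who disagree with her, which is the behaviour the "25% of codes are wrong" point argues for. A real system would likely estimate the weights per topic, since reliability can change drastically by topic.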

Automated classification of interests, at scale and depth by Ian McCarty

  • Active data collection is specific and granular, as well as standardized; but it’s slow and difficult to scale, there is uncertainty, and there may be observer bias via social desirability, demand characteristics, Hawthorne effect [EVERY method has strengths and weaknesses]
  • Declared vs demonstrated interests – you can give 5 stars to a great movie and then watch Paul Blart Mall Cop 5 times in 6 months [Paul Blart is a great movie! Loved it 🙂 ]
  • They replicate the experience of a specific URL to generate more specific data
  • Closed network use case – examined search queries from members to recruit them into studies, segmentation was manual and company needed to automate and scale; lowered per person costs and increased accuracy, found more panelists in more specific clusters, normalized surveys if declared behaviors conflicted with demonstrated behaviors 
  • Open network use case: home improvement brand needed a modern shared meaning with customers, wanted to automate a manual process; distinguished brand followers and compared them to competitive followers, identified where brand values and consumer values aligned, delivered map for future content creation and path to audience connection
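
As a rough illustration of the declared vs demonstrated idea in the closed network use case, the sketch below flags panelists whose stated interests barely overlap with what their behavior shows. The notes don't describe how the speaker's company actually detects conflicts, so the Jaccard overlap, the 0.5 threshold, and all names and data here are assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of interest categories."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as fully consistent
    return len(a & b) / len(a | b)


def flag_conflicts(declared, demonstrated, threshold=0.5):
    """Return ids of panelists whose declared and demonstrated interests overlap below threshold."""
    return sorted(
        pid
        for pid in declared
        if jaccard(declared[pid], demonstrated.get(pid, ())) < threshold
    )


# Hypothetical data: survey answers vs categories inferred from search queries.
declared = {
    "p1": {"drama", "documentary"},
    "p2": {"comedy"},
}
demonstrated = {
    "p1": {"comedy"},
    "p2": {"comedy", "action"},
}
conflicts = flag_conflicts(declared, demonstrated)
```

Flagged panelists are the ones whose survey answers might need normalizing; what to do with them after flagging is a research decision the sketch doesn't try to make.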

Text analytics or social media insights by Michalis Michael

  • Next gen research is here now, listening, asking questions, tracking behavior, insights experts
  • Revenues don’t reflect expectations, yet.
  • We’re not doing a great job of integrating insights yet, social media listening analytics is not completely integrated in our industry yet 
  • Homonyms are major noise, eliminating them needs humans and machines
  • Machine learning is language agnostic, create a taxonomy with it, a dictionary of the product category using the words that people use in social media not marketing words
  • It is possible to have 80% agreement with text analytics and the human [I believe this when the language is reasonably simple and known]
  • Becks means beer and David Beckham, and Beck Hansen is a singer; you need training algorithms to tell them apart, and hundreds of clarifications to identify the exact Becks that is beer
  • Beer is related to appearance and occasions, break down occasions into in home or out of home, then at a BBQ or club
  • What do you say about a beer when they do a commercial that has nothing to do with the beer?
  • English has a lot of sarcasm, more than a lot of other languages [yeah right, sure, I believe you]
  • Break down sentiment into emotions – anger, desire, disgust, hate, joy, love, sadness – can benchmark brands in these categories as well
  • Can benchmark NPS with social media
  • Brand tracking questions can be matched to topics in a social media taxonomy, and there can be even more in the social media version than the survey version
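
The homonym problem above ("Becks" the beer vs Beckham vs Beck) can be sketched with a toy context-cue scorer. This is not the speaker's method: the talk describes trained algorithms with hundreds of clarifications, and this hand-written keyword overlap is a deliberately simple stand-in. The cue lists and function names are hypothetical.

```python
import re

# Hypothetical cue words per sense; a real system would learn these from labelled posts.
CONTEXT_CUES = {
    "beer": {"beer", "pint", "bottle", "drink", "lager", "bbq", "club"},
    "footballer": {"david", "football", "goal", "match", "victoria"},
    "singer": {"hansen", "album", "song", "concert", "loser"},
}


def tokenize(text):
    """Lowercase the post and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def disambiguate(post, cues=CONTEXT_CUES):
    """Return the sense whose cue words overlap most with the post, or None if none match."""
    tokens = tokenize(post)
    scores = {sense: len(tokens & words) for sense, words in cues.items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed in cues
    return best if scores[best] > 0 else None


beer_post = disambiguate("Cold Becks at the BBQ, best beer ever")
football_post = disambiguate("Becks scored a great goal in the match")
unknown_post = disambiguate("I saw Becks yesterday")
```

Posts that match no cue come back as None, which is where a human coder would step in. That fits the point that eliminating homonyms needs both humans and machines.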