The Power Of Big Data: How We Predicted the World’s Largest Music Poll Using Social Media by Nick Drewe #CASRO #MRX

Live blogging from the #CASRO tech conference in Chicago. Any errors or bad jokes are my own.

The Power Of Big Data: How We Predicted the World’s Largest Music Poll Using Social Media by Nick Drewe, Creative Technologist, The Data Pack

  • it used to be taboo to say your real name online
  • there is no line between what is signal and what is noise. what is gold to me is trash to you
  • when you know that hundreds and thousands of people have posted noise about a brand, it’s no longer noise
  • Radio station called Triple J, a nationally funded station, like NPR or BBC, and it’s aimed at a younger audience. They post the hottest 100 songs as voted by listeners through the year. it’s a national institution. 1.4 million votes cast last year.
  • results used to be closely guarded up until number 1 song
  • station had people share what they voted for in hopes of getting more people to vote. every page was hosted on a unique URL which suggests every vote has a page. other little bits of code with info were on the page too. if they could find and collect enough of these pages, they might be able to predict.
  • used the Twitter API and found 40 000 votes in a few minutes, a sample size of 3 or 4%
  • created a list that seemed realistic but didn’t know what to do with it yet
  • set up a website where people could see their predictions and play the songs
  • turned the website into a disclaimer, people had to scroll way way way down to get to the number one song
  • got a ton of traffic, more people saw it than people who voted
  • made the front page of one of the biggest newspapers
  • not sure yet how accurate they were
  • colleague ran a bootstrap on the 3.5% sample and concluded they’d get 90 songs with 100% accuracy or 95 songs with 90% accuracy, and the #1 song with 83% accuracy
  • the next year, the station closed all the social sharing features
  • found 400 votes that were posted to twitter as screencaps of people’s confirmation emails
  • but photos are also posted on instagram, found 20 000 votes there after searching for them
  • even if you really really really don’t want people to share something online, they will do it anyways
  • predicted 82 out of 100 songs in the second year with half the amount of data
  • it was an experiment in social data
  • most networks have free APIs to share and use data, most networks don’t really know what to do with the data
  • posts don’t have to sit in isolation online, we can turn these into insights
  • people don’t post the same things in social media that they post on surveys
  • 60 million posts on instagram every day, rich with metadata, a photo contains geolocation, 20 million photos a day have a location [i always turn off my geolocation, decline, decline, decline]
  • can search on username, hashtag, and location – but it must be part of a hashtag not a description
  • youtube is still the largest music sharing site
  • can use youtube, twitter, facebook data to predict music you will like [try me – rankin family, leahy, michelle branch, vanessa carlton]
  • a single message is rarely valuable but a group of messages is, particularly with all the metadata
  • every link tells you something about the person who shared it – what they like, don’t like, know and don’t know, cat gifs too
  • google’s PageRank looks at links to your website, more websites linking to you gets you a higher rank and a greater likelihood of appearing in a search result – this is a social graph and can be done on a personal level, not just what they’ve shared about a specific topic but everything else they’re doing
  • [Nick is wearing the same shirt today that is shown in his bio. LOVE that as I find matching people to photos very difficult]
  • everyone should try a social api, it’s not as difficult to use as you think it is, point isn’t to start writing code but to start thinking about big data and social data in a different way
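
For anyone curious what "running a bootstrap" on a sample of votes might look like, here is a minimal sketch in Python. It is not the code Nick's colleague used; the talk didn't show any. The function names, the synthetic vote data, and the simplification that each vote names exactly one song are all my assumptions for illustration. The idea: resample the collected votes with replacement many times and see how often each song stays in the top N.

```python
import random
from collections import Counter


def top_n(votes, n):
    """Return the n most-voted songs; ties are broken by song name so results are repeatable."""
    counts = Counter(votes)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:n]]


def bootstrap_top_n_stability(sample_votes, n, rounds=200, seed=0):
    """For each song in the sample's own top n, return the fraction of
    bootstrap resamples in which that song is still in the top n."""
    rng = random.Random(seed)
    baseline = top_n(sample_votes, n)
    baseline_set = set(baseline)
    hits = Counter()
    for _ in range(rounds):
        # Draw a resample of the same size, with replacement.
        resample = rng.choices(sample_votes, k=len(sample_votes))
        hits.update(set(top_n(resample, n)) & baseline_set)
    return {song: hits[song] / rounds for song in baseline}


if __name__ == "__main__":
    # Synthetic data only (an assumption): 30 made-up songs whose
    # popularity falls off with rank, and 2000 single-song votes.
    rng = random.Random(42)
    songs = [f"song_{i:03d}" for i in range(30)]
    weights = [1.0 / (i + 1) for i in range(30)]
    sample = rng.choices(songs, weights=weights, k=2000)
    for song, share in bootstrap_top_n_stability(sample, n=10).items():
        print(song, share)
```

Songs near the top of the list should come back in almost every resample, while songs near the cut-off will drop in and out, which is roughly why a prediction can be confident about most of the list but less so about the last few places.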

