This is a summary of an article by Sitaram Asur and Bernardo A. Huberman. The paper shows how social media content can be used to predict real-world outcomes. Chatter on Twitter.com was used to predict movie box-office revenues. The authors built a simple model based on the rate of tweets about particular movies, and it performed better than market-based predictors. Twitter sentiment was also shown to further improve the predictive power of social media. You can get the PDF of the behavioral targeting article here: Predicting the Future With Social Media.
Social media has made it possible for people to hold discussions at a speed and scale not seen before. As a result, social networking sites such as Facebook and Twitter have become record-setting trendsetters for a wide variety of topics, including technology, entertainment, politics and even the environment.
The large and varied amount of data found in big social networks allows organizations to gather it and use it to predict certain outcomes. It can also be used to build models of the general opinion of a large population and so yield important insights into that population’s behavior. Conversations among social media users about products can also be used to design and market ad campaigns.
Predicting Box Office Using Twitter
This study tries to determine whether Twitter chatter can be used to predict box-office revenues. Twitter is a very popular micro-blogging social networking service that limits each post to 140 characters. Twitter currently has more than 300 million members, and 8 new accounts are added every second.
Box-office revenues are a very suitable topic for this study because social media users like to talk about films, and box-office outcomes are fairly easy to measure.
A total of 2.89 million tweets from 1.2 million Twitter users about 24 movies were extracted using the Twitter Search API. These movies were released over a three-month span; on average, two new movies are released each week, so data collection also ran throughout that span.
The release date of each movie was recorded, along with its “critical period,” which runs from the week before release to two weeks after release. This period is critical because the buzz about a movie is strongest during these weeks. Movies studied include Avatar, The Blind Side, Dear John, Daybreakers, and Twilight: New Moon.
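As a rough illustration, the critical period described above can be computed from a release date with simple date arithmetic. This is only a sketch: the function name `critical_period` and the example date are hypothetical and do not come from the paper.

```python
from datetime import date, timedelta


def critical_period(release_date):
    """Return (start, end) of the critical period for a movie.

    Assumes the window described in the text: one week before
    release through two weeks after release.
    """
    start = release_date - timedelta(weeks=1)
    end = release_date + timedelta(weeks=2)
    return start, end


# Hypothetical release date, used only to show the call.
start, end = critical_period(date(2010, 1, 15))
print(start, end)
```

Only tweets whose timestamps fall inside this window would then be counted for that movie.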
Attention and Popularity
The study is interested in determining how movies get attention and become popular through Twitter.
Pre-release attention can be generated through tweets that pass along the promotional material producers publish online, such as posters and trailers. Twitter users can post URLs of trailers and retweet other users’ posts, and both are very important in spreading the information to others rapidly.
Results show that there are more URL tweets before release than afterwards, while the number of retweets stays roughly constant throughout the three-week period, probably because users want to share their own movie experiences.
The next question is whether movie tweets can be used to accurately predict box-office revenues. The authors defined a measure called tweet-rate: the number of tweets about a particular movie per hour. A linear regression model was built to make the predictions, which were then assessed against box-office revenue data from Box Office Mojo.
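The tweet-rate measure and a one-variable linear regression can be sketched as follows. This is a minimal illustration, not the authors' model: the summary does not give their exact features or coefficients, the helper names (`tweet_rate`, `fit_line`, `predict`) are hypothetical, and the numbers in the example are made up purely to show the mechanics.

```python
from datetime import datetime, timedelta


def tweet_rate(timestamps, start, end):
    """Tweets per hour for one movie within the window [start, end)."""
    hours = (end - start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in timestamps if start <= t < end)
    return count / hours


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted revenue for a given tweet-rate."""
    return slope * x + intercept


# Made-up training data: tweet-rates (tweets/hour) and revenues (millions).
rates = [10.0, 20.0, 30.0, 40.0]
revenues = [25.0, 45.0, 65.0, 85.0]
slope, intercept = fit_line(rates, revenues)
print(predict(slope, intercept, 50.0))
```

A plain least-squares fit is used here so the sketch stays dependency-free; in practice a statistics library would do the same job and also report goodness of fit.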
“Transylmania” had the lowest tweet-rate, at only 2.75 tweets per hour, and correspondingly had the lowest opening gross, at around 264 thousand dollars. In contrast, “Twilight: New Moon” grossed 142 million dollars and had a tweet-rate of 1365.8 tweets per hour.
The study’s Twitter model was compared with a model based on the Hollywood Stock Exchange (HSX), and the tweet-rate regression model outperformed the HSX-based model.
Sentiment analysis was used to determine whether the prevailing sentiment towards a movie can accurately predict its box-office performance. A sentiment classifier was trained with the help of a thousand workers from Amazon Mechanical Turk, with 3 people labeling each tweet as Positive, Negative or Neutral.
Two concepts are studied here: subjectivity and polarity. Subjectivity is defined as the ratio of positive and negative tweets combined to neutral tweets, while polarity is the ratio of positive tweets to negative tweets. As hypothesized, results showed that there are more subjective (positive or negative) tweets after release.
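The two ratios defined above can be sketched in a few lines. The function name `sentiment_ratios` and the sample labels are hypothetical; the label strings follow the Positive, Negative and Neutral categories mentioned in the text, and returning `None` for an undefined ratio is a choice made for this sketch.

```python
from collections import Counter


def sentiment_ratios(labels):
    """Return (subjectivity, polarity) for a list of tweet labels.

    subjectivity = (positive + negative) / neutral
    polarity     = positive / negative
    A ratio is None when its denominator is zero.
    """
    counts = Counter(labels)
    pos = counts["Positive"]
    neg = counts["Negative"]
    neu = counts["Neutral"]
    subjectivity = (pos + neg) / neu if neu else None
    polarity = pos / neg if neg else None
    return subjectivity, polarity


# Made-up labels for illustration only.
labels = ["Positive"] * 6 + ["Negative"] * 2 + ["Neutral"] * 4
print(sentiment_ratios(labels))
```

A polarity above 1 means positive tweets outnumber negative ones, which is how a rise such as the one reported for The Blind Side would be read.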
In addition, polarity is found to be a good measure for explaining the variance in revenues among movies. For example, the polarity of The Blind Side increased from 5.02 to 9.65, and this was reflected in an increase in box-office sales from the first week to the next (34 million to 40.1 million dollars).
The method used in this study can also be applied to topics other than box-office revenues, such as product ratings and election outcomes. This suggests that social media contains a great deal of collective wisdom that can be used to predict future outcomes.