Chapter 15. Sentiment Analysis

Sentiment means “a general thought, view, feeling, emotion, opinion, or sense,” and Wikipedia describes sentiment analysis (also known as opinion mining) as “the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in source materials.” Bo Pang and Lillian Lee[21] wrote that “sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as thumbs up or thumbs down.” To perform sentiment analysis on some event, we need to teach computers what a sentiment is (i.e., how to define “positive” or “negative” and “good” or “bad”). This is where machine learning comes in. The first step is to build a model from a set of training data. After the model is built, we use it to analyze new data.
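The two steps described above (build a model from training data, then apply it to new data) can be sketched in a few lines of Python. This is only an illustration, not the approach this chapter necessarily takes: it assumes a multinomial Naive Bayes classifier with add-one smoothing, and the names `train`, `classify`, and `TRAINING_DATA`, as well as the four example reviews, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-labeled training data (assumed for illustration only).
TRAINING_DATA = [
    ("a wonderful and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull and boring plot", "negative"),
    ("terrible acting and a weak story", "negative"),
]

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(examples):
    """Step 1: build a model (word counts per label) from labeled examples."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}

def classify(model, text):
    """Step 2: use the model to pick the most probable label for new text."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(model["words"][label].values())
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = model["words"][label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_DATA)
print(classify(model, "a great film"))     # positive
print(classify(model, "boring and dull"))  # negative
```

With only four training sentences the model is a toy; a realistic model would be trained on a much larger labeled corpus.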

So what is sentiment data? Typically, it is unstructured data that represents opinions and emotions contained in sources such as special news bulletins, customer support emails, social media posts (such as tweets and Facebook comments), and online product reviews.

To perform a good sentiment analysis, the sentiment analysis engine has to conduct some level of speech analysis and word-sense disambiguation. Therefore, a sentiment analysis of a text document involves more than tokenizing words and checking them against a list of “positive” ...
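To see why token matching alone falls short, consider a minimal sketch of the naive approach: tokenize the text and count hits against word lists. The word lists and the function name `lexicon_score` are assumptions made up for this illustration.

```python
import re

# Hypothetical word lists (assumed for illustration; real lexicons are far larger).
POSITIVE_WORDS = {"good", "great", "love", "wonderful"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "boring"}

def lexicon_score(text):
    """Count positive tokens minus negative tokens; > 0 suggests 'positive'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negatives = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    return positives - negatives

# The negation is invisible to a pure word-list lookup, so this
# clearly negative sentence is scored as positive.
print(lexicon_score("the movie was not good"))  # 1
print(lexicon_score("a terrible bad plot"))     # -2
```

The first sentence is scored as positive because the lookup sees “good” and has no notion of the “not” that precedes it, which is the kind of error that deeper linguistic analysis is meant to prevent.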
