Chapter 9. Natural Language Processing

In this chapter, you will learn the following recipes:

  • Reading raw text from the Web
  • Tokenizing and normalizing text
  • Identifying parts of speech, handling n-grams, and recognizing named entities
  • Identifying the topic of an article
  • Identifying the sentence structure
  • Classifying movies based on their reviews

Introduction

Modeling based on structured data gathered via a controlled experiment (as we did in the previous chapters) is relatively straightforward. However, in the real world, we rarely deal with structured data. This is especially true when it comes to understanding human-generated feedback or analyzing an article in a newspaper.

Natural Language Processing (NLP) is a discipline of computer science, statistics, ...
