Chapter 6. Analyzing Text Data

In this chapter, we will cover the following recipes:

  • Preprocessing data using tokenization
  • Stemming text data
  • Converting text to its base form using lemmatization
  • Dividing text using chunking
  • Building a bag-of-words model
  • Building a text classifier
  • Identifying the gender
  • Analyzing the sentiment of a sentence
  • Identifying patterns in text using topic modeling

Introduction

Text analysis and natural language processing (NLP) are integral parts of modern artificial intelligence systems. Computers are good at understanding rigidly structured data with limited variety. However, unstructured, free-form text is much harder for them to handle. Developing NLP applications is challenging because computers have a hard time ...
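To make the contrast between free-form text and structured data concrete, here is a minimal sketch that turns raw sentences into a fixed-length table of word counts. It uses only the Python standard library. The regular-expression tokenizer and the `tokenize` and `bag_of_words` helpers are hypothetical illustrations, not the code used in this chapter's recipes.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed, deliberately simple tokenizer: lowercase the text and
    # keep runs of letters and apostrophes, dropping punctuation.
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(docs):
    # Tokenize every document, then build a sorted vocabulary shared by all.
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    # One row per document; each column counts one vocabulary word.
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix

docs = ["The cat sat on the mat.", "The dog sat!"]
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
for row in matrix:
    print(row) # [1, 0, 1, 1, 1, 2] then [0, 1, 0, 0, 1, 1]
```

Each sentence, whatever its length or punctuation, becomes a row of the same width, which is the kind of rigid structure that standard machine learning algorithms expect.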

Get Python: Real World Machine Learning now with the O’Reilly learning platform.