Text Wrangling and Analysis

In this chapter, we will cover:

  • Installing NLTK
  • Performing sentence splitting
  • Performing tokenization
  • Performing stemming
  • Performing lemmatization
  • Identifying and removing stop words
  • Calculating the frequency distribution of words
  • Identifying and removing rare words
  • Identifying and removing short words
  • Removing punctuation marks
  • Piecing together n-grams
  • Scraping a job listing from StackOverflow
  • Reading and cleaning the description in the job listing
  • Creating a word cloud from a StackOverflow job listing
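
As a rough preview of how several of these steps fit together, here is a minimal sketch of tokenization, stop-word removal, frequency counting, and n-gram construction. It deliberately uses only the Python standard library in place of NLTK, so it is not the chapter's own code; the sample text, the tiny stop-word set, and the function names are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text and stop-word set, for illustration only.
# NLTK ships a much larger stop-word list; this sketch does not use it.
TEXT = "The cat sat on the mat. The dog sat on the log!"
STOP_WORDS = {"the", "on", "a", "an", "and", "of"}


def tokenize(text):
    """Lowercase the text and return runs of letters, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize(TEXT)
content_words = remove_stop_words(tokens)
frequencies = Counter(content_words)  # word -> number of occurrences
bigrams = ngrams(content_words, 2)

print(content_words)
print(frequencies.most_common(1))
print(bigrams)
```

A regular expression is a crude tokenizer: it splits contractions and discards numbers, which is one reason a dedicated library is usually preferred for real text.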
