Chapter 8. Text Mining and Social Network Analysis

In this chapter, we will cover the following recipes:

  • Creating a categorized corpus
  • Tokenizing news articles into sentences and words
  • Stemming, lemmatizing, filtering, and TF-IDF scores
  • Recognizing named entities
  • Extracting topics with non-negative matrix factorization
  • Implementing a basic terms database
  • Computing social network density
  • Calculating social network closeness centrality
  • Determining the betweenness centrality
  • Estimating the average clustering coefficient
  • Calculating the assortativity coefficient of a graph
  • Getting the clique number of a graph
  • Creating a document graph with cosine similarity

Introduction

Humans have communicated through language for thousands of years. Handwritten texts have been around ...
