Start by importing the necessary modules and defining the domain (the LJ community) of interest. This project uses Pandas and NumPy, the power tools of data science, and NLTK (the Natural Language Toolkit), along with a few other libraries for downloading, caching, and network analysis. (If you last used them a while ago, you might want to blow the dust off your skill set [Zin16].)
import urllib.request, os.path, pickle # Download and cache
import nltk # Convert text to terms
import networkx as nx, community # Build and analyze the network
import pandas as pd, numpy as np # Data science power tools
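Before running the imports above, it can help to confirm that the third-party libraries are actually installed. The following is a minimal sketch, not part of the original project: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here for illustration. One detail worth noting is that the `community` module is distributed on PyPI under the name `python-louvain`, so the import name and the install name differ.

```python
import importlib.util

# Hypothetical mapping: import name -> name to pass to "pip install".
# Note that the "community" module ships in the "python-louvain" package.
REQUIRED = {
    "nltk": "nltk",
    "networkx": "networkx",
    "community": "python-louvain",
    "pandas": "pandas",
    "numpy": "numpy",
}

def missing_packages(required=REQUIRED):
    """Return the install names of modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        # find_spec() returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None
    ]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks for the modules without importing them, so it runs quickly and has no side effects even when a library is absent.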
Your next step is to get and cache term lists. A term is a unit of CDA. It can be a word, a word group, a word ...