NLTK provides a class, nltk.probability.FreqDist, that allows us to easily calculate the frequency distribution of values in a list. Let's examine how to use this class (the code is in 07/freq_dist.py):
- To create a frequency distribution using NLTK, start by importing FreqDist from NLTK (along with a tokenizer and the stop words corpus):
from nltk.probability import FreqDist
from nltk.tokenize import regexp_tokenize
from nltk.corpus import stopwords
- Then we can use the FreqDist class to create a frequency distribution from a list of words. We will examine this by reading in the contents of wotw.txt (The War of the Worlds, courtesy of Project Gutenberg), tokenizing it, and removing stop words:
with open('wotw.txt', 'r') as file: data = file ...
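As a rough illustration of the tokenize, filter, and count steps described above, here is a minimal sketch that runs on a short in-memory string rather than wotw.txt. The sample text, the hand-written stop word set, and the word_frequencies helper are all assumptions made for illustration; they are not taken from 07/freq_dist.py, and the recipe itself uses nltk.corpus.stopwords. The fallback import is only there so the sketch still runs where NLTK is not installed.

```python
try:
    from nltk.probability import FreqDist
    from nltk.tokenize import regexp_tokenize
except ImportError:
    # Fallback so the sketch runs without NLTK installed.
    # FreqDist is a subclass of collections.Counter, so Counter behaves
    # the same for the calls used below.
    import re
    from collections import Counter as FreqDist

    def regexp_tokenize(text, pattern):
        return re.findall(pattern, text)

# Hypothetical sample text and stop words (assumptions for illustration,
# not the contents of wotw.txt or of nltk.corpus.stopwords).
SAMPLE_TEXT = "The Martians came and the people fled. The Martians had come to stay."
STOP_WORDS = {"the", "and", "had", "to"}


def word_frequencies(text, stop_words):
    """Tokenize text into lowercase words, drop stop words, and count the rest."""
    tokens = regexp_tokenize(text.lower(), r"\w+")
    words = [token for token in tokens if token not in stop_words]
    return FreqDist(words)


freq_dist = word_frequencies(SAMPLE_TEXT, STOP_WORDS)

# most_common(n) returns (word, count) pairs, highest count first.
print(freq_dist.most_common(1))
```

Lowercasing before filtering is a design choice in this sketch: it lets "The" and "the" match the same stop word entry, at the cost of losing capitalization in the counted words.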