Creating a wordlist corpus
The WordListCorpusReader class is one of the simplest CorpusReader classes. It provides access to a file containing a list of words, one word per line. In fact, you have already used it through the stopwords corpus in Chapter 1, Tokenizing Text and WordNet Basics, in the Filtering stopwords in a tokenized sentence and Discovering word collocations recipes.
Getting ready
We need to start by creating a wordlist file. This could be a single-column CSV file, or just a normal text file with one word per line. Let's create a file named wordlist that looks like this:
nltk
corpus
corpora
wordnet
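If you would rather create the file from Python than from a text editor, the following is a minimal sketch. The temporary directory and the variable names (corpus_dir, wordlist_path) are illustrative assumptions, not part of the recipe; any directory you can write to would work.

```python
import tempfile
from pathlib import Path

# Hypothetical location: a throwaway directory so the sketch has no side effects.
corpus_dir = Path(tempfile.mkdtemp())
wordlist_path = corpus_dir / "wordlist"

# One word per line, matching the file contents shown above.
words = ["nltk", "corpus", "corpora", "wordnet"]
wordlist_path.write_text("\n".join(words) + "\n", encoding="utf-8")

# Read the file back to confirm that each word sits on its own line.
lines = wordlist_path.read_text(encoding="utf-8").splitlines()
print(lines)
```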
How to do it...
Now we can instantiate the WordListCorpusReader class, which will produce a list of words from our file. It takes two arguments: ...
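As a rough illustration, the sketch below assumes the usual NLTK constructor form, where the first argument is the directory containing the file and the second is a list of file names. It recreates the wordlist file in a temporary directory (a hypothetical location chosen so the example is self-contained) rather than relying on the current working directory.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import WordListCorpusReader

# Recreate the wordlist file in a throwaway directory (hypothetical location).
root = Path(tempfile.mkdtemp())
(root / "wordlist").write_text("nltk\ncorpus\ncorpora\nwordnet\n", encoding="utf-8")

# Assumed argument order: corpus root directory, then a list of file names.
reader = WordListCorpusReader(str(root), ["wordlist"])

# words() returns the non-blank lines of the file as a list of strings.
words = reader.words()
# fileids() lists the files the reader was given.
file_ids = reader.fileids()
print(words)
print(file_ids)
```

If the wordlist file lives in the directory you run Python from, passing '.' as the root should work in the same way.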