Proceed with the recipe as follows (code is available in 07/06_freq_dist.py):
- The following demonstrates stop word removal using NLTK. Start by importing the stop words corpus:
>>> from nltk.corpus import stopwords
- Then select the stop words for your desired language. The following selects English:
>>> stoplist = stopwords.words('english')
- The English stop list has 153 words (the exact count can differ between NLTK versions):
>>> len(stoplist)
153
- That is few enough to list here:
>>> stoplist
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', ...
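To make the purpose of the list concrete, here is a minimal sketch of how a stop list like this can be used to filter tokens. It is an illustration under stated assumptions, not the recipe's own code: `sample_stoplist` is a hypothetical handful of entries copied from the output above so that the example runs without NLTK, and `remove_stop_words` is a helper name invented for this sketch. In practice you would pass the full `stoplist` instead.

```python
# Hypothetical sample: a few entries copied from the stop list printed above.
# With NLTK available, use stoplist = stopwords.words('english') instead.
sample_stoplist = ['i', 'me', 'my', 'we', 'our', 'you', 'he', 'she',
                   'it', 'its', 'they', 'them', 'their', 'what', 'which']


def remove_stop_words(tokens, stoplist):
    """Return the tokens whose lowercase form is not in stoplist."""
    stop_set = set(stoplist)  # a set gives fast membership checks
    return [token for token in tokens if token.lower() not in stop_set]


# Naive whitespace tokenization, enough for this illustration.
tokens = "They gave me their map which showed what road we wanted".split()
filtered = remove_stop_words(tokens, sample_stoplist)
print(filtered)  # ['gave', 'map', 'showed', 'road', 'wanted']
```

The comparison lowercases each token because the entries in the stop list are all lowercase, so "They" is removed just as "they" would be.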