How to do it

Proceed with the recipe as follows (code is available in 07/06_freq_dist.py): 

  1. The following demonstrates stop word removal using NLTK. First, import the stop words corpus:
>>> from nltk.corpus import stopwords
  2. Then select the stop words for your desired language. The following selects English:
>>> stoplist = stopwords.words('english')
  3. The English stop list has 153 words (the exact count depends on the version of the NLTK stopwords corpus you have installed):
>>> len(stoplist)
153
  4. That's short enough to print directly:
>>> stoplist
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', ...
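The steps above only load and inspect the stop list. A minimal sketch of actually removing stop words from text might look like the following. It assumes the text has already been split into word tokens. The names `load_stoplist`, `remove_stopwords` and `FALLBACK_STOPWORDS` are hypothetical, and the fallback set is only a small hand-picked subset of NLTK's English list so that the sketch still runs when NLTK or its stopwords corpus is not installed.

```python
# Minimal sketch: filter stop words out of a list of word tokens.
# Assumption: NLTK's English stop list is used when available; otherwise a
# small, hypothetical subset of it (FALLBACK_STOPWORDS) stands in.
FALLBACK_STOPWORDS = {
    'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
    'you', 'your', 'he', 'him', 'his', 'she', 'her', 'it', 'its',
    'they', 'them', 'their', 'what', 'which',
    'the', 'a', 'an', 'and', 'is', 'was', 'of', 'to', 'in',
}


def load_stoplist():
    """Return the English stop words as a set, preferring NLTK's corpus."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words('english'))
    except (ImportError, LookupError):
        # NLTK is not installed, or the corpus was never downloaded
        # (nltk.download('stopwords') fetches it).
        return set(FALLBACK_STOPWORDS)


def remove_stopwords(tokens, stoplist):
    """Keep tokens whose lowercase form is not a stop word, preserving order."""
    return [t for t in tokens if t.lower() not in stoplist]


if __name__ == '__main__':
    stoplist = load_stoplist()
    tokens = ['We', 'scraped', 'the', 'page', 'and', 'it', 'was', 'great']
    print(remove_stopwords(tokens, stoplist))
```

Converting the list to a `set` makes each membership test constant time on average, which matters when filtering large scraped documents. Lowercasing each token before the lookup is needed because NLTK's stop words are all lowercase, so a sentence-initial "We" would otherwise slip through.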
