How to do it...

  1. Import the following packages:

from nltk.tokenize import RegexpTokenizer
from nltk.stem.snowball import SnowballStemmer
from gensim import models, corpora
from nltk.corpus import stopwords
  2. Load the input data:

def load_words(in_file):
  element = []
  with open(in_file, 'r') as f:
    for line in f.readlines():
      element.append(line[:-1])
  return element
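As a quick check, the helper can be exercised on a small file; the path and file contents below are placeholders for illustration, not part of the recipe:

```python
import os
import tempfile

def load_words(in_file):
    # Read the file and strip the trailing newline from each line
    element = []
    with open(in_file, 'r') as f:
        for line in f.readlines():
            element.append(line[:-1])
    return element

# Hypothetical input file with one entry per line
path = os.path.join(tempfile.gettempdir(), 'demo_words.txt')
with open(path, 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

print(load_words(path))  # ['alpha', 'beta', 'gamma']
```

Note that `line[:-1]` assumes every line, including the last, ends with a newline; `line.rstrip('\n')` would be a safer alternative.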
  3. Define a class to pre-process the text:

class Preprocedure(object):
  def __init__(self):
    # Create a regular expression tokenizer
    self.tokenizer = RegexpTokenizer(r'\w+')
  4. Obtain a list of stop words, which will be filtered out of the text:

    self.english_stop_words = stopwords.words('english')
  5. Create a Snowball stemmer:

    self.snowball_stemmer = SnowballStemmer('english')
  6. Define a function to perform ...
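Putting the pieces above together, the preprocessing flow is: tokenize, drop stop words, then stem. The sketch below follows that flow, but substitutes a small inline stop-word set for nltk.corpus.stopwords so it runs without downloading the NLTK corpora, and the process method name is an assumption for illustration, not the book's code:

```python
from nltk.tokenize import RegexpTokenizer
from nltk.stem.snowball import SnowballStemmer

class Preprocedure(object):
    def __init__(self):
        # Regular expression tokenizer that keeps word characters only
        self.tokenizer = RegexpTokenizer(r'\w+')
        # Inline stop-word set for illustration; the recipe uses
        # stopwords.words('english') from nltk.corpus instead
        self.english_stop_words = {'the', 'is', 'a', 'and', 'of', 'are'}
        # Snowball stemmer for English
        self.snowball_stemmer = SnowballStemmer('english')

    def process(self, text):
        # Tokenize and lower-case, remove stop words, then stem each token
        tokens = self.tokenizer.tokenize(text.lower())
        tokens = [t for t in tokens if t not in self.english_stop_words]
        return [self.snowball_stemmer.stem(t) for t in tokens]

p = Preprocedure()
print(p.process('The cats are running'))  # ['cat', 'run']
```

The stop-word filter is applied before stemming so that unstemmed stop words are matched exactly against the list.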

Get Raspberry Pi 3 Cookbook for Python Programmers - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.