Getting ready

To illustrate how to use bag-of-words with a text dataset, we will use the SMS Spam Collection, a spam-ham dataset of phone text messages from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection). Each message in the collection is labeled as either spam or not spam (ham). We will download this data, store it for future use, and then apply the bag-of-words method to predict whether a text is spam. The model that operates on the bag-of-words vectors will be a logistic regression model with no hidden layers. We will train it stochastically, with a batch size of 1, and compute the accuracy on a held-out test set at the end.
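To make the pipeline concrete before turning to TensorFlow, here is a minimal, framework-free sketch in plain Python. It is not the recipe's TensorFlow code. It assumes simple lowercase whitespace tokenization, builds the vocabulary from the training texts only, turns each message into a word-count vector, and trains a logistic regression model (no hidden layers) with stochastic gradient descent using a batch size of 1. The function names and the toy messages are hypothetical placeholders, not taken from the SMS Spam Collection.

```python
import math
import random


def build_vocab(texts):
    """Map each distinct lowercase whitespace token to a column index (assumed tokenizer)."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}


def bag_of_words(text, vocab):
    """Count vector for one text; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for w in text.lower().split():
        idx = vocab.get(w)
        if idx is not None:
            vec[idx] += 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_sgd(X, y, lr=0.1, epochs=50, seed=0):
    """Logistic regression trained by SGD, one example per update (batch size 1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(z) - y[i]  # d(log loss)/dz for a single example
            for j, xj in enumerate(X[i]):
                w[j] -= lr * err * xj
            b -= lr * err
    return w, b


def predict(x, w, b):
    """Return 1 (spam) if P(spam | x) >= 0.5, else 0 (ham)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(X, y, w, b):
    return sum(predict(x, w, b) == t for x, t in zip(X, y)) / len(y)


# Hypothetical toy messages (1 = spam, 0 = ham); NOT from the real dataset.
train_texts = [
    "win a free prize now",
    "free cash claim now",
    "claim your free prize",
    "see you at lunch",
    "are you coming home",
    "call me when you get home",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["claim your free cash prize", "are you coming home at lunch"]
test_labels = [1, 0]

vocab = build_vocab(train_texts)  # vocabulary from training data only
X_train = [bag_of_words(t, vocab) for t in train_texts]
X_test = [bag_of_words(t, vocab) for t in test_texts]

w, b = train_logistic_sgd(X_train, train_labels)
train_acc = accuracy(X_train, train_labels, w, b)
test_acc = accuracy(X_test, test_labels, w, b)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

Building the vocabulary from the training split alone mirrors the held-out evaluation described above: words that appear only in test messages simply contribute nothing to their vectors.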
