Acquiring the example data

In this chapter, we will explore the Ling-Spam email dataset (the original dataset is described at http://csmining.org/index.php/ling-spam-datasets.html). Download the dataset from http://data.scala4datascience.com/ling-spam.tar.gz (or ling-spam.zip, if you prefer that archive format), and unpack the contents into the directory containing the code examples for this chapter. The archive contains two directories, spam/ and ham/, which hold the spam and legitimate emails, respectively.
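Once the archive is unpacked, it can be worth checking that both directories are where the later examples expect them. The following is a minimal sketch rather than part of the chapter's code: the object name LingSpamCheck and the helper countFiles are hypothetical, and it assumes the email files sit directly inside spam/ and ham/ under the directory passed as the first argument (the current directory by default).

```scala
import java.io.File

// Hypothetical helper, not part of the chapter's code: reports how many
// files the unpacked archive placed in spam/ and ham/.
object LingSpamCheck {

  // Count the regular files directly inside `dir`.
  // `listFiles` returns null when `dir` is not a directory, hence the Option.
  def countFiles(dir: File): Int =
    Option(dir.listFiles).map(_.count(_.isFile)).getOrElse(0)

  def main(args: Array[String]): Unit = {
    // Assumption: the archive was unpacked into the directory given as the
    // first argument, or into the current directory if none is given.
    val root = new File(args.headOption.getOrElse("."))
    for (label <- Seq("spam", "ham")) {
      val n = countFiles(new File(root, label))
      println(s"$label/: $n files")
    }
  }
}
```

A count of zero for either directory suggests that the archive was unpacked somewhere else, or that the emails are nested one level deeper than this sketch assumes.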
