Machine Learning for Email

Book Description

This compact book explores standard tools for text classification and teaches the reader how to use machine learning to decide whether an email is spam or ham (binary classification), based on raw data from the SpamAssassin Public Corpus. Of course, sometimes the items in one class are not created equal, or we want to distinguish among them in some meaningful way. The second part of the book looks at how not only to filter spam from our email, but also to place "more important" messages at the top of the queue. This is a curated excerpt from the upcoming book "Machine Learning for Hackers."

Table of Contents

  1. Special Upgrade Offer
  2. Preface
    1. Machine Learning for Hackers: Email
    2. How This Book is Organized
    3. Conventions Used in This Book
    4. Using Code Examples
    5. Safari® Books Online
    6. How to Contact Us
  3. 1. Using R
    1. R for Machine Learning
      1. Downloading and Installing R
        1. Windows
        2. Mac OS X
        3. Linux
      2. IDEs and Text Editors
      3. Loading and Installing R Packages
      4. R Basics for Machine Learning
        1. Loading libraries and the data
        2. Converting date strings, and dealing with malformed data
        3. Organizing location data
        4. Dealing with data outside our scope
        5. Aggregating and organizing the data
        6. Analyzing the data
    2. Further Reading on R
  4. 2. Data Exploration
    1. Exploration vs. Confirmation
    2. What is Data?
    3. Inferring the Types of Columns in Your Data
    4. Inferring Meaning
    5. Numeric Summaries
    6. Means, Medians, and Modes
    7. Quantiles
    8. Standard Deviations and Variances
    9. Exploratory Data Visualization
      1. Modes
      2. Skewness
      3. Thin Tails vs. Heavy Tails
    10. Visualizing the Relationships between Columns
  5. 3. Classification: Spam Filtering
    1. This or That: Binary Classification
    2. Moving Gently into Conditional Probability
    3. Writing Our First Bayesian Spam Classifier
      1. Defining the Classifier and Testing It with Hard Ham
      2. Testing the Classifier Against All Email Types
      3. Improving the Results
  6. 4. Ranking: Priority Inbox
    1. How Do You Sort Something When You Don’t Know the Order?
    2. Ordering Email Messages by Priority
      1. Priority Features of Email
    3. Writing a Priority Inbox
      1. Functions for Extracting the Feature Set
      2. Creating a Weighting Scheme for Ranking
        1. A Log-Weighting Scheme
      3. Weighting from Email Thread Activity
      4. Training and Testing the Ranker
  7. Works Cited
  8. About the Authors
  9. Special Upgrade Offer
  10. Copyright