Machine Learning and Security

Book Description

Can machine learning techniques solve our computer security problems and finally put an end to the cat-and-mouse game between attackers and defenders? Or is this hope merely hype? Now you can dive into the science and answer this question for yourself. With this practical guide, you’ll explore ways to apply machine learning to security issues such as intrusion detection, malware classification, and network analysis.

Machine learning and security specialists Clarence Chio and David Freeman provide a framework for discussing the marriage of these two fields, as well as a toolkit of machine-learning algorithms that you can apply to an array of security problems. This book is ideal for security engineers and data scientists alike.

  • Learn how machine learning has contributed to the success of modern spam filters
  • Quickly detect anomalies, including breaches, fraud, and impending system failure
  • Conduct malware analysis by extracting useful information from computer binaries
  • Uncover attackers within the network by finding patterns inside datasets
  • Examine how attackers exploit consumer-facing websites and app functionality
  • Translate your machine learning algorithms from the lab to production
  • Understand the threat attackers pose to machine learning solutions

Table of Contents

  1. Preface
    1. What’s In This Book?
    2. Who Is This Book For?
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Safari
    6. How to Contact Us
    7. Acknowledgments
  2. 1. Why Machine Learning and Security?
    1. Cyber Threat Landscape
    2. The Cyber Attacker’s Economy
      1. A Marketplace for Hacking Skills
      2. Indirect Monetization
      3. The Upshot
    3. What Is Machine Learning?
      1. What Machine Learning Is Not
      2. Adversaries Using Machine Learning
    4. Real-World Uses of Machine Learning in Security
    5. Spam Fighting: An Iterative Approach
    6. Limitations of Machine Learning in Security
  3. 2. Classifying and Clustering
    1. Machine Learning: Problems and Approaches
    2. Machine Learning in Practice: A Worked Example
    3. Training Algorithms to Learn
      1. Model Families
      2. Loss Functions
      3. Optimization
    4. Supervised Classification Algorithms
      1. Logistic Regression
      2. Decision Trees
      3. Decision Forests
      4. Support Vector Machines
      5. Naive Bayes
      6. k-Nearest Neighbors
      7. Neural Networks
    5. Practical Considerations in Classification
      1. Selecting a Model Family
      2. Training Data Construction
      3. Feature Selection
      4. Overfitting and Underfitting
      5. Choosing Thresholds and Comparing Models
    6. Clustering
      1. Clustering Algorithms
      2. Evaluating Clustering Results
    7. Conclusion
  4. 3. Anomaly Detection
    1. When to Use Anomaly Detection Versus Supervised Learning
    2. Intrusion Detection with Heuristics
    3. Data-Driven Methods
    4. Feature Engineering for Anomaly Detection
      1. Host Intrusion Detection
      2. Network Intrusion Detection
      3. Web Application Intrusion Detection
      4. In Summary
    5. Anomaly Detection with Data and Algorithms
      1. Forecasting (Supervised Machine Learning)
      2. Statistical Metrics
      3. Goodness-of-Fit
      4. Unsupervised Machine Learning Algorithms
      5. Density-Based Methods
      6. In Summary
    6. Challenges of Using Machine Learning in Anomaly Detection
    7. Response and Mitigation
    8. Practical System Design Concerns
      1. Optimizing for Explainability
      2. Maintainability of Anomaly Detection Systems
      3. Integrating Human Feedback
      4. Mitigating Adversarial Effects
    9. Conclusion
  5. 4. Malware Analysis
    1. Understanding Malware
      1. Defining Malware Classification
      2. Malware: Behind the Scenes
    2. Feature Generation
      1. Data Collection
      2. Generating Features
      3. Feature Selection
    3. From Features to Classification
      1. How to Get Malware Samples and Labels
    4. Conclusion
  6. 5. Network Traffic Analysis
    1. Theory of Network Defense
      1. Access Control and Authentication
      2. Intrusion Detection
      3. Detecting In-Network Attackers
      4. Data-Centric Security
      5. Honeypots
      6. Summary
    2. Machine Learning and Network Security
      1. From Captures to Features
      2. Threats in the Network
      3. Botnets and You
    3. Building a Predictive Model to Classify Network Attacks
      1. Exploring the Data
      2. Data Preparation
      3. Classification
      4. Supervised Learning
      5. Semi-Supervised Learning
      6. Unsupervised Learning
      7. Advanced Ensembling
    4. Conclusion
  7. 6. Protecting the Consumer Web
    1. Monetizing the Consumer Web
    2. Types of Abuse and the Data That Can Stop Them
      1. Authentication and Account Takeover
      2. Account Creation
      3. Financial Fraud
      4. Bot Activity
    3. Supervised Learning for Abuse Problems
      1. Labeling Data
      2. Cold Start Versus Warm Start
      3. False Positives and False Negatives
      4. Multiple Responses
      5. Large Attacks
    4. Clustering Abuse
      1. Example: Clustering Spam Domains
      2. Generating Clusters
      3. Scoring Clusters
    5. Further Directions in Clustering
    6. Conclusion
  8. 7. Production Systems
    1. Defining Machine Learning System Maturity and Scalability
      1. What’s Important for Security Machine Learning Systems?
    2. Data Quality
      1. Problem: Bias in Datasets
      2. Problem: Label Inaccuracy
      3. Solutions: Data Quality
      4. Problem: Missing Data
      5. Solutions: Missing Data
    3. Model Quality
      1. Problem: Hyperparameter Optimization
      2. Solutions: Hyperparameter Optimization
      3. Feature: Feedback Loops, A/B Testing of Models
      4. Feature: Repeatable and Explainable Results
    4. Performance
      1. Goal: Low Latency, High Scalability
      2. Performance Optimization
      3. Horizontal Scaling with Distributed Computing Frameworks
      4. Using Cloud Services
    5. Maintainability
      1. Problem: Checkpointing, Versioning, and Deploying Models
      2. Goal: Graceful Degradation
      3. Goal: Easily Tunable and Configurable
    6. Monitoring and Alerting
    7. Security and Reliability
      1. Feature: Robustness in Adversarial Contexts
      2. Feature: Data Privacy Safeguards and Guarantees
    8. Feedback and Usability
    9. Conclusion
  9. 8. Adversarial Machine Learning
    1. Terminology
    2. The Importance of Adversarial ML
    3. Security Vulnerabilities in Machine Learning Algorithms
      1. Attack Transferability
    4. Attack Technique: Model Poisoning
      1. Example: Binary Classifier Poisoning Attack
      2. Attacker Knowledge
      3. Defense Against Poisoning Attacks
    5. Attack Technique: Evasion Attack
      1. Example: Binary Classifier Evasion Attack
      2. Defense Against Evasion Attacks
    6. Conclusion
  10. A. Supplemental Material for Chapter 2
    1. More About Metrics
    2. Size of Logistic Regression Models
    3. Implementing the Logistic Regression Cost Function
    4. Minimizing the Cost Function
  11. B. Integrating Open Source Intelligence
    1. Security Intelligence Feeds
    2. Geolocation
  12. Index