Chapter 4. Clustering, Classification, and Regression

In this chapter, we will cover the following recipes:

  • Introduction
  • Applying regression analysis for sales data
    • Variable identification
    • Data exploration
    • Feature engineering
    • Applying linear regression
  • Applying logistic regression on bank marketing data
    • Variable identification
    • Data exploration
    • Feature engineering
    • Applying logistic regression
  • Real-time intrusion detection using streaming k-means
    • Variable identification
    • Producer code generating real-time data
    • Applying streaming k-means

Introduction

Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. Many successful applications of machine learning exist already, including systems that analyse past sales ...

