Machine Learning with scikit-learn Quick Start Guide

Book description

Deploy supervised and unsupervised machine learning algorithms using scikit-learn to perform classification, regression, and clustering.

Key Features

  • Build your first machine learning model using scikit-learn
  • Train supervised and unsupervised models using popular techniques such as classification, regression, and clustering
  • Understand how scikit-learn can be applied to different types of machine learning problems

Book Description

Scikit-learn is a robust machine learning library for the Python programming language. It provides a set of supervised and unsupervised learning algorithms. This book shows you how to deploy, optimize, and evaluate the important machine learning algorithms that scikit-learn provides.

This book teaches you how to use scikit-learn for machine learning. You will start by setting up and configuring your machine learning environment with scikit-learn. To put scikit-learn to use, you will learn how to implement various supervised and unsupervised machine learning models. You will learn classification, regression, and clustering techniques to work with different types of datasets and train your models.

Finally, you will learn about an effective pipeline to help you build a machine learning project from scratch. By the end of this book, you will be confident in building your own machine learning models for accurate predictions.

What you will learn

  • Learn how to work with all of scikit-learn's machine learning algorithms
  • Install and set up scikit-learn to build your first machine learning model
  • Employ unsupervised machine learning algorithms to cluster unlabeled data into groups
  • Perform classification and regression with supervised machine learning
  • Use an effective pipeline to build a machine learning project from scratch

Who this book is for

This book is for aspiring machine learning developers who want to get started with scikit-learn. Intermediate knowledge of Python programming and some fundamental knowledge of linear algebra and probability will help.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Machine Learning with scikit-learn Quick Start Guide
  3. Dedication
  4. About Packt
    1. Why subscribe?
    2. Packt.com
  5. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Code in action
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Introducing Machine Learning with scikit-learn
    1. A brief introduction to machine learning
      1. Supervised learning
      2. Unsupervised learning
    2. What is scikit-learn?
    3. Installing scikit-learn
      1. The pip method
      2. The Anaconda method
      3. Additional packages
        1. Pandas
        2. Matplotlib
        3. Tree
        4. Pydotplus
        5. Image
    4. Algorithms that you will learn to implement using scikit-learn
      1. Supervised learning algorithms
      2. Unsupervised learning algorithms
    5. Summary
  8. Predicting Categories with K-Nearest Neighbors
    1. Technical requirements
    2. Preparing a dataset for machine learning with scikit-learn
      1. Dropping features that are redundant
      2. Reducing the size of the data
      3. Encoding the categorical variables
      4. Missing values
    3. The k-NN algorithm
    4. Implementing the k-NN algorithm using scikit-learn
      1. Splitting the data into training and test sets
      2. Implementation and evaluation of your model
    5. Fine-tuning the parameters of the k-NN algorithm
    6. Scaling for optimized performance
    7. Summary
  9. Predicting Categories with Logistic Regression
    1. Technical requirements
    2. Understanding logistic regression mathematically
    3. Implementing logistic regression using scikit-learn
      1. Splitting the data into training and test sets
    4. Fine-tuning the hyperparameters
    5. Scaling the data
    6. Interpreting the logistic regression model
    7. Summary
  10. Predicting Categories with Naive Bayes and SVMs
    1. Technical requirements
    2. The Naive Bayes algorithm
      1. Implementing the Naive Bayes algorithm in scikit-learn
    3. Support vector machines
      1. Implementing the linear support vector machine algorithm in scikit-learn
      2. Hyperparameter optimization for the linear SVMs
        1. Graphical hyperparameter optimization
        2. Hyperparameter optimization using GridSearchCV
      3. Scaling the data for performance improvement
    4. Summary
  11. Predicting Numeric Outcomes with Linear Regression
    1. Technical requirements
    2. The inner mechanics of the linear regression algorithm
    3. Implementing linear regression in scikit-learn
      1. Linear regression in two dimensions
      2. Using linear regression to predict mobile transaction amount
      3. Scaling your data
    4. Model optimization
      1. Ridge regression
      2. Lasso regression
    5. Summary
  12. Classification and Regression with Trees
    1. Technical requirements
    2. Classification trees
      1. The decision tree classifier
        1. Picking the best feature
        2. The Gini coefficient
        3. Implementing the decision tree classifier in scikit-learn
        4. Hyperparameter tuning for the decision tree
        5. Visualizing the decision tree
      2. The random forests classifier
        1. Implementing the random forest classifier in scikit-learn
        2. Hyperparameter tuning for random forest algorithms
      3. The AdaBoost classifier
        1. Implementing the AdaBoost classifier in scikit-learn
        2. Hyperparameter tuning for the AdaBoost classifier
    3. Regression trees
      1. The decision tree regressor
        1. Implementing the decision tree regressor in scikit-learn
        2. Visualizing the decision tree regressor
      2. The random forest regressor
        1. Implementing the random forest regressor in scikit-learn
      3. The gradient boosted tree
        1. Implementing the gradient boosted tree in scikit-learn
    4. Ensemble classifier
      1. Implementing the voting classifier in scikit-learn
    5. Summary
  13. Clustering Data with Unsupervised Machine Learning
    1. Technical requirements
    2. The k-means algorithm
      1. Assignment of centroids
      2. When does the algorithm stop iterating?
    3. Implementing the k-means algorithm in scikit-learn
      1. Creating the base k-means model
      2. The optimal number of clusters
    4. Feature engineering for optimization
      1. Scaling
      2. Principal component analysis
    5. Cluster visualization
      1. t-SNE
      2. Hierarchical clustering
        1. Step 1 – Individual features as individual clusters
        2. Step 2 – The merge
        3. Step 3 – Iteration
        4. Implementing hierarchical clustering
    6. Going from unsupervised to supervised learning
      1. Creating a labeled dataset
      2. Building the decision tree
    7. Summary
  14. Performance Evaluation Methods
    1. Technical requirements
    2. Why is performance evaluation critical?
    3. Performance evaluation for classification algorithms
      1. The confusion matrix
      2. The normalized confusion matrix
      3. Area under the curve
      4. Cumulative gains curve
      5. Lift curve
      6. K-S statistic plot
      7. Calibration plot
      8. Learning curve
      9. Cross-validated box plot
    4. Performance evaluation for regression algorithms
      1. Mean absolute error
      2. Mean squared error
      3. Root mean squared error
    5. Performance evaluation for unsupervised algorithms
      1. Elbow plot
    6. Summary
  15. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Machine Learning with scikit-learn Quick Start Guide
  • Author(s): Kevin Jolly
  • Release date: October 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789343700