Hands-On Automated Machine Learning

Book description

Automate data and model pipelines for faster machine learning applications

About This Book
  • Build automated modules for different machine learning components
  • Understand each component of a machine learning pipeline in depth
  • Learn to use different open-source AutoML and feature engineering platforms

Who This Book Is For

If you're a budding data scientist, data analyst, or machine learning enthusiast who is new to automated machine learning, this book is ideal for you. You'll also find it useful if you're an ML engineer or data professional interested in quickly developing machine learning pipelines for your projects. Prior exposure to Python programming will help you get the most out of this book.

What You Will Learn
  • Understand the fundamentals of Automated Machine Learning systems
  • Explore auto-sklearn and MLBox for AutoML tasks
  • Automate your preprocessing methods along with feature transformation
  • Enhance feature selection and generation using the Python stack
  • Assemble individual components of ML into a complete AutoML framework
  • Demystify hyperparameter tuning to optimize your ML models
  • Dive into Machine Learning concepts such as neural networks and autoencoders
  • Understand the information costs and trade-offs associated with AutoML

In Detail

AutoML is designed to automate parts of machine learning. Readily available AutoML tools are making data science practitioners' work easier and have been well received in the advanced analytics community. Hands-On Automated Machine Learning covers the foundation needed to create automated machine learning modules and helps you get up to speed with them in a practical way.

In this book, you'll learn how to automate different tasks in the machine learning pipeline, such as data preprocessing, feature selection, model training, and model optimization. The book also demonstrates how to use available automation libraries, such as auto-sklearn and MLBox, and how to create and extend your own custom AutoML components for machine learning.

By the end of this book, you will have a clearer understanding of the different aspects of automated Machine Learning, and you'll be able to incorporate automation tasks using practical datasets. You can leverage your learning from this book to implement Machine Learning in your projects and get a step closer to winning various machine learning competitions.

Style and approach

A step-by-step approach to understanding how to automate your machine learning tasks.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Automated Machine Learning
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Introduction to AutoML
    1. Scope of machine learning
    2. What is AutoML?
    3. Why use AutoML and how does it help?
    4. When do you automate ML?
    5. What will you learn?
      1. Core components of AutoML systems
        1. Automated feature preprocessing
        2. Automated algorithm selection
        3. Hyperparameter optimization
      2. Building prototype subsystems for each component
      3. Putting it all together as an end-to-end AutoML system
    6. Overview of AutoML libraries
      1. Featuretools
      2. Auto-sklearn
      3. MLBox
      4. TPOT
    7. Summary
  7. Introduction to Machine Learning Using Python
    1. Technical requirements
    2. Machine learning
      1. Machine learning process
      2. Supervised learning
      3. Unsupervised learning
    3. Linear regression
      1. What is linear regression?
        1. Working of OLS regression
        2. Assumptions of OLS
      2. Where is linear regression used?
      3. By which method can linear regression be implemented?
    4. Important evaluation metrics – regression algorithms
    5. Logistic regression
      1. What is logistic regression?
      2. Where is logistic regression used?
      3. By which method can logistic regression be implemented?
    6. Important evaluation metrics – classification algorithms
    7. Decision trees
      1. What are decision trees?
      2. Where are decision trees used?
      3. By which method can decision trees be implemented?
    8. Support Vector Machines
      1. What is SVM?
      2. Where is SVM used?
      3. By which method can SVM be implemented?
    9. k-Nearest Neighbors
      1. What is k-Nearest Neighbors?
      2. Where is KNN used?
      3. By which method can KNN be implemented?
    10. Ensemble methods
      1. What are ensemble models?
        1. Bagging
        2. Boosting
        3. Stacking/blending
    11. Comparing the results of classifiers
    12. Cross-validation
    13. Clustering
      1. What is clustering?
      2. Where is clustering used?
      3. By which method can clustering be implemented?
      4. Hierarchical clustering
      5. Partitioning clustering (KMeans)
    14. Summary
  8. Data Preprocessing
    1. Technical requirements
    2. Data transformation
      1. Numerical data transformation
        1. Scaling
        2. Missing values
        3. Outliers
          1. Detecting and treating univariate outliers
          2. Inter-quartile range
          3. Filtering values
          4. Winsorizing
          5. Trimming
          6. Detecting and treating multivariate outliers
        4. Binning
        5. Log and power transformations
      2. Categorical data transformation
        1. Encoding
        2. Missing values for categorical data transformation
      3. Text preprocessing
    3. Feature selection
      1. Excluding features with low variance
      2. Univariate feature selection
      3. Recursive feature elimination
      4. Feature selection using random forest
      5. Feature selection using dimensionality reduction
        1. Principal Component Analysis
    4. Feature generation
    5. Summary
  9. Automated Algorithm Selection
    1. Technical requirements
    2. Computational complexity
      1. Big O notation
    3. Differences in training and scoring time
      1. Simple measure of training and scoring time
      2. Code profiling in Python
      3. Visualizing performance statistics
      4. Implementing k-NN from scratch
      5. Profiling your Python script line by line
    4. Linearity versus non-linearity
      1. Drawing decision boundaries
      2. Decision boundary of logistic regression
      3. The decision boundary of random forest
      4. Commonly used machine learning algorithms
    5. Necessary feature transformations
    6. Supervised ML
      1. Default configuration of auto-sklearn
      2. Finding the best ML pipeline for product line prediction
      3. Finding the best machine learning pipeline for network anomaly detection
    7. Unsupervised AutoML
      1. Commonly used clustering algorithms
      2. Creating sample datasets with sklearn
      3. K-means algorithm in action
      4. The DBSCAN algorithm in action
      5. Agglomerative clustering algorithm in action
      6. Simple automation of unsupervised learning
      7. Visualizing high-dimensional datasets
      8. Principal Component Analysis in action
      9. t-SNE in action
      10. Adding simple components together to improve the pipeline
    8. Summary
  10. Hyperparameter Optimization
    1. Technical requirements
    2. Hyperparameters
    3. Warm start
    4. Bayesian-based hyperparameter tuning
    5. An example system
    6. Summary
  11. Creating AutoML Pipelines
    1. Technical requirements
    2. An introduction to machine learning pipelines
    3. A simple pipeline
    4. FunctionTransformer
    5. A complex pipeline
    6. Summary
  12. Dive into Deep Learning
    1. Technical requirements
    2. Overview of neural networks
      1. Neuron
      2. Activation functions
        1. The step function
        2. The sigmoid function
        3. The ReLU function
        4. The tanh function
    3. A feed-forward neural network using Keras
    4. Autoencoders
    5. Convolutional Neural Networks
      1. Why CNN?
      2. What is convolution?
      3. What are filters?
      4. The convolution layer
      5. The ReLU layer
      6. The pooling layer
      7. The fully connected layer
    6. Summary
  13. Critical Aspects of ML and Data Science Projects
    1. Machine learning as a search
    2. Trade-offs in machine learning
    3. Engagement model for a typical data science project
    4. The phases of an engagement model
      1. Business understanding
      2. Data understanding
      3. Data preparation
      4. Modeling
      5. Evaluation
      6. Deployment
    5. Summary
  14. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Automated Machine Learning
  • Author(s): Sibanjan Das, Umit Mert Cakmak
  • Release date: April 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788629898