Machine Learning with Python Cookbook

Book Description

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You’ll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.

The Python programming language and its libraries, including pandas and scikit-learn, provide a production-grade environment to help you accomplish a broad range of machine-learning tasks. With this cookbook, data scientists and software engineers familiar with Python will benefit from almost 200 practical recipes for building a comprehensive machine-learning pipeline, covering everything from data preprocessing and feature engineering to model evaluation and deep learning.

Learn from author Chris Albon, a data scientist who has written more than 500 tutorials on Python, data science, and machine learning. Each recipe in this practical cookbook includes code solutions that you can put to work right away, along with a discussion of how and why they work—making it ideal as a learning tool and reference book.

Table of Contents

  1. Preface
    1. Who This Book Is For
    2. Who This Book Is Not For
    3. Terminology Used In This Book
    4. Acknowledgements
  2. Chapter 1
    1. 1.0. Introduction
    2. 1.1. Creating A Vector
    3. 1.2. Creating A Matrix
    4. 1.3. Creating A Sparse Matrix
    5. 1.4. Selecting Elements
    6. 1.5. Describing A Matrix
    7. 1.6. Applying Operations To Elements
    8. 1.7. Finding The Maximum And Minimum Values
    9. 1.8. Calculating The Average, Variance, And Standard Deviation
    10. 1.9. Reshaping Arrays
    11. 1.10. Transposing A Vector Or Matrix
    12. 1.11. Flattening A Matrix
    13. 1.12. Finding The Rank Of A Matrix
    14. 1.13. Calculating The Determinant
    15. 1.14. Getting The Diagonal Of A Matrix
    16. 1.15. Calculating The Trace Of A Matrix
    17. 1.16. Finding Eigenvalues And Eigenvectors
    18. 1.17. Calculating Dot Products
    19. 1.18. Adding And Subtracting Matrices
    20. 1.19. Multiplying Matrices
    21. 1.20. Inverting A Matrix
    22. 1.21. Generating Random Values
  3. Chapter 2
    1. 2.0. Introduction
    2. 2.1. Loading A Sample Dataset
    3. 2.2. Creating A Simulated Dataset
    4. 2.3. Loading A CSV File
    5. 2.4. Loading An Excel File
    6. 2.5. Loading A JSON File
    7. 2.6. Querying A SQL Database
  4. Chapter 3
    1. 3.0. Introduction
    2. 3.1. Creating A Data Frame
    3. 3.2. Describing The Data
    4. 3.3. Navigating DataFrames
    5. 3.4. Selecting Rows Based On Conditionals
    6. 3.5. Replacing Values
    7. 3.6. Renaming Columns
    8. 3.7. Finding The Minimum, Maximum, Sum, Average, And Count
    9. 3.8. Finding Unique Values
    10. 3.9. Handling Missing Values
    11. 3.10. Deleting A Column
    12. 3.11. Deleting A Row
    13. 3.12. Dropping Duplicate Rows
    14. 3.13. Grouping Rows By Values
    15. 3.14. Grouping Rows By Time
    16. 3.15. Looping Over A Column
    17. 3.16. Applying A Function Over All Elements In A Column
    18. 3.17. Applying A Function To Groups
    19. 3.18. Concatenating DataFrames
    20. 3.19. Merging DataFrames
  5. Chapter 4
    1. 4.0. Introduction
    2. 4.1. Rescaling A Feature
    3. 4.2. Normalizing Observations
    4. 4.3. Generating Polynomial And Interaction Features
    5. 4.4. Transforming Features
    6. 4.5. Detecting Outliers
    7. 4.6. Handling Outliers
    8. 4.7. Discretizing Features
    9. 4.8. Grouping Observations Using Clustering
    10. 4.9. Deleting Observations With Missing Values
    11. 4.10. Imputing Missing Values
  6. Chapter 5
    1. 5.0. Introduction
    2. 5.1. Encoding Nominal Categorical Features
    3. 5.2. Encoding Ordinal Categorical Features
    4. 5.3. Encoding Dictionaries Of Features
    5. 5.4. Imputing Missing Class Values
    6. 5.5. Handling Imbalanced Classes
  7. Chapter 6
    1. 6.0. Introduction
    2. 6.1. Cleaning Text
    3. 6.2. Parsing And Cleaning HTML
    4. 6.3. Removing Punctuation
    5. 6.4. Tokenizing Text
    6. 6.5. Removing Stop Words
    7. 6.6. Stemming Words
    8. 6.7. Tagging Parts Of Speech
    9. 6.8. Encoding Text As A Bag Of Words
    10. 6.9. Weighting Word Importance
  8. Chapter 7
    1. 7.0. Introduction
    2. 7.1. Converting Strings To Dates
    3. 7.2. Handling Time Zones
    4. 7.3. Selecting Dates And Times
    5. 7.4. Breaking Up Date Data Into Multiple Features
    6. 7.5. Calculating The Difference Between Dates
    7. 7.6. Encoding Days Of The Week
    8. 7.7. Creating A Lagged Feature
    9. 7.8. Using Rolling Time Windows
    10. 7.9. Handling Missing Data In Time Series
  9. Chapter 8
    1. 8.0. Introduction
    2. 8.1. Loading Images
    3. 8.2. Saving Images
    4. 8.3. Resizing Images
    5. 8.4. Cropping Images
    6. 8.5. Blurring Images
    7. 8.6. Sharpening Images
    8. 8.7. Enhancing Contrast
    9. 8.8. Isolating Colors
    10. 8.9. Binarizing Images
    11. 8.10. Removing Backgrounds
    12. 8.11. Detecting Edges
    13. 8.12. Detecting Corners
    14. 8.13. Creating Features For Machine Learning
    15. 8.14. Encoding Mean Color As A Feature
    16. 8.15. Encoding Color Histograms As Features
  10. Chapter 9
    1. 9.0. Introduction
    2. 9.1. Reducing Features Using Principal Components
    3. 9.2. Reducing Features When Data Is Linearly Inseparable
    4. 9.3. Reducing Features By Maximizing Class Separability
    5. 9.4. Reducing Features Using Matrix Factorization
    6. 9.5. Reducing Features On Sparse Data
  11. Chapter 10
    1. 10.0. Introduction
    2. 10.1. Thresholding Numerical Feature Variance
    3. 10.2. Thresholding Binary Feature Variance
    4. 10.3. Handling Highly Correlated Features
    5. 10.4. Removing Irrelevant Features For Classification
    6. 10.5. Recursively Eliminating Features
  12. Chapter 11
    1. 11.0. Introduction
    2. 11.1. Cross-validating Models
    3. 11.2. Creating A Baseline Regression Model
    4. 11.3. Creating A Baseline Classification Model
    5. 11.4. Evaluating Binary Classifier Predictions
    6. 11.5. Evaluating Binary Classifier Thresholds
    7. 11.6. Evaluating Multi-Class Classifier Predictions
    8. 11.7. Visualizing A Classifier’s Performance
    9. 11.8. Evaluating Regression Models
    10. 11.9. Evaluating Clustering Models
    11. 11.10. Creating A Custom Evaluation Metric
    12. 11.11. Visualizing The Effect Of Training Set Size
    13. 11.12. Visualizing The Effect Of Hyperparameter Values
  13. Chapter 12
    1. 12.0. Introduction
    2. 12.1. Selecting Best Models Using Exhaustive Search
    3. 12.2. Selecting Best Models From Multiple Learning Algorithms
    4. 12.3. Selecting Best Models When Preprocessing
    5. 12.4. Speeding Up Model Selection With Parallelization
    6. 12.5. Speeding Up Model Selection With Algorithm-Specific Methods
    7. 12.6. Evaluating Performance After Model Selection
  14. Chapter 13
    1. 13.0. Introduction
    2. 13.1. Fitting A Line
    3. 13.2. Handling Interactive Effects
    4. 13.3. Fitting A Non-Linear Relationship
    5. 13.4. Reducing Variance With Regularization
    6. 13.5. Reducing Features With Lasso Regression
  15. Chapter 14
    1. 14.0. Introduction
    2. 14.1. Training A Decision Tree Classifier
    3. 14.2. Training A Decision Tree Regressor
    4. 14.3. Visualizing A Decision Tree Model
    5. 14.4. Training A Random Forest Classifier
    6. 14.5. Training A Random Forest Regressor
    7. 14.6. Identifying Important Features In Random Forests
    8. 14.7. Selecting Important Features In Random Forests
    9. 14.8. Handling Imbalanced Classes
    10. 14.9. Controlling Tree Size
    11. 14.10. Improving Performance Through Boosting
    12. 14.11. Evaluating Random Forests With Out-Of-Bag Errors
  16. Chapter 15
    1. 15.0. Introduction
    2. 15.1. Finding An Observation’s Nearest Neighbors
    3. 15.2. Creating A K-Nearest Neighbor Classifier
    4. 15.3. Identifying The Best Neighborhood Size
    5. 15.4. Creating A Radius-Based Nearest Neighbor Classifier
  17. Chapter 16
    1. 16.0. Introduction
    2. 16.1. Training A Binary Classifier
    3. 16.2. Training A Multi-Class Classifier
    4. 16.3. Reducing Variance Through Regularization
    5. 16.4. Training A Classifier On Very Large Data
    6. 16.5. Handling Imbalanced Classes
  18. Chapter 17
    1. 17.0. Introduction
    2. 17.1. Training A Linear Classifier
    3. 17.2. Handling Linearly Inseparable Classes Using Kernels
    4. 17.3. Creating Predicted Probabilities
    5. 17.4. Identifying Support Vectors
    6. 17.5. Handling Imbalanced Classes
  19. Chapter 18
    1. 18.0. Introduction
    2. 18.1. Training A Classifier For Continuous Features
    3. 18.2. Training A Classifier For Discrete And Count Features
    4. 18.3. Training A Naive Bayes Classifier For Binary Features
    5. 18.4. Calibrating Predicted Probabilities
  20. Chapter 19
    1. 19.0. Introduction
    2. 19.1. Clustering Using K-Means
    3. 19.2. Speeding Up K-Means Clustering
    4. 19.3. Clustering Using Meanshift
    5. 19.4. Clustering Using DBSCAN
    6. 19.5. Clustering Using Hierarchical Merging
  21. Chapter 20
    1. 20.0. Introduction
    2. 20.1. Preprocessing Data For Neural Networks
    3. 20.2. Designing A Neural Network
    4. 20.3. Training A Binary Classifier
    5. 20.4. Training A Multi-Class Classifier
    6. 20.5. Training A Regressor
    7. 20.6. Making Predictions
    8. 20.7. Visualizing Training History
    9. 20.8. Reducing Overfitting With Weight Regularization
    10. 20.9. Reducing Overfitting With Early Stopping
    11. 20.10. Reducing Overfitting With Dropout
    12. 20.11. Saving Model Training Progress
    13. 20.12. k-Fold Cross-validating Neural Networks
    14. 20.13. Tuning Neural Networks
    15. 20.14. Visualizing Neural Networks
    16. 20.15. Classifying Images
    17. 20.16. Improving Performance With Image Augmentation
    18. 20.17. Classifying Text
  22. Chapter 21
    1. 21.0. Introduction
    2. 21.1. Saving And Loading A scikit-learn Model
    3. 21.2. Saving And Loading A Keras Model
  23. Index