Mastering Machine Learning with scikit-learn

Book Description

Apply effective learning algorithms to real-world problems using scikit-learn

In Detail

This book examines machine learning models including logistic regression, decision trees, and support vector machines, and applies them to common problems such as categorizing documents and classifying images. It begins with the fundamentals of machine learning, introducing you to the supervised-unsupervised spectrum, the uses of training and test data, and model evaluation. You will learn how to use generalized linear models in regression problems and how to solve problems with text and categorical features.

You will become acquainted with logistic regression, regularization, and the various loss functions used by generalized linear models. The book also walks you through an example project that prompts you to label the most uncertain training examples. You will also use an unsupervised Hidden Markov Model to predict stock prices.

By the end of the book, you will be an expert in scikit-learn and will be well versed in machine learning.

What You Will Learn

  • Review fundamental concepts including supervised and unsupervised experiences, common tasks, and performance metrics
  • Predict the values of continuous variables using linear regression
  • Create representations of documents and images that can be used in machine learning models
  • Categorize documents and text messages using logistic regression and support vector machines
  • Classify images by their subjects
  • Discover hidden structures in data using clustering and visualize complex data using decomposition
  • Evaluate the performance of machine learning systems in common tasks
  • Diagnose and redress problems with models due to bias and variance

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Mastering Machine Learning with scikit-learn
      1. Table of Contents
      2. Mastering Machine Learning with scikit-learn
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
        7. Downloading the example code
        8. Errata
        9. Piracy
        10. Questions
      8. 1. The Fundamentals of Machine Learning
        1. Learning from experience
        2. Machine learning tasks
        3. Training data and test data
        4. Performance measures, bias, and variance
        5. An introduction to scikit-learn
        6. Installing scikit-learn
          1. Installing scikit-learn on Windows
          2. Installing scikit-learn on Linux
          3. Installing scikit-learn on OS X
          4. Verifying the installation
        7. Installing pandas and matplotlib
        8. Summary
      9. 2. Linear Regression
        1. Simple linear regression
          1. Evaluating the fitness of a model with a cost function
          2. Solving ordinary least squares for simple linear regression
        2. Evaluating the model
        3. Multiple linear regression
        4. Polynomial regression
        5. Regularization
        6. Applying linear regression
          1. Exploring the data
          2. Fitting and evaluating the model
        7. Fitting models with gradient descent
        8. Summary
      10. 3. Feature Extraction and Preprocessing
        1. Extracting features from categorical variables
        2. Extracting features from text
          1. The bag-of-words representation
          2. Stop-word filtering
          3. Stemming and lemmatization
          4. Extending bag-of-words with TF-IDF weights
          5. Space-efficient feature vectorizing with the hashing trick
        3. Extracting features from images
          1. Extracting features from pixel intensities
          2. Extracting points of interest as features
          3. SIFT and SURF
        4. Data standardization
        5. Summary
      11. 4. From Linear Regression to Logistic Regression
        1. Binary classification with logistic regression
        2. Spam filtering
        3. Binary classification performance metrics
          1. Accuracy
          2. Precision and recall
        4. Calculating the F1 measure
        5. ROC AUC
        6. Tuning models with grid search
        7. Multi-class classification
          1. Multi-class classification performance metrics
        8. Multi-label classification and problem transformation
          1. Multi-label classification performance metrics
        9. Summary
      12. 5. Nonlinear Classification and Regression with Decision Trees
        1. Decision trees
        2. Training decision trees
          1. Selecting the questions
          2. Information gain
          3. Gini impurity
        3. Decision trees with scikit-learn
          1. Tree ensembles
          2. The advantages and disadvantages of decision trees
        4. Summary
      13. 6. Clustering with K-Means
        1. Clustering with the K-Means algorithm
          1. Local optima
          2. The elbow method
        2. Evaluating clusters
        3. Image quantization
        4. Clustering to learn features
        5. Summary
      14. 7. Dimensionality Reduction with PCA
        1. An overview of PCA
        2. Performing Principal Component Analysis
          1. Variance, Covariance, and Covariance Matrices
          2. Eigenvectors and eigenvalues
          3. Dimensionality reduction with Principal Component Analysis
        3. Using PCA to visualize high-dimensional data
        4. Face recognition with PCA
        5. Summary
      15. 8. The Perceptron
        1. Activation functions
          1. The perceptron learning algorithm
        2. Binary classification with the perceptron
          1. Document classification with the perceptron
        3. Limitations of the perceptron
        4. Summary
      16. 9. From the Perceptron to Support Vector Machines
        1. Kernels and the kernel trick
        2. Maximum margin classification and support vectors
        3. Classifying characters in scikit-learn
          1. Classifying handwritten digits
          2. Classifying characters in natural images
        4. Summary
      17. 10. From the Perceptron to Artificial Neural Networks
        1. Nonlinear decision boundaries
        2. Feedforward and feedback artificial neural networks
          1. Multilayer perceptrons
          2. Minimizing the cost function
          3. Forward propagation
          4. Backpropagation
        3. Approximating XOR with Multilayer perceptrons
        4. Classifying handwritten digits
        5. Summary
      18. Index