Mastering Predictive Analytics with Python

Book Description

Exploit the power of data in your business by building advanced predictive modeling applications with Python

About This Book

  • Master open source Python tools to build sophisticated predictive models

  • Learn to identify the right machine learning algorithm for your problem with this forward-thinking guide

  • Grasp the major methods of predictive modeling and move beyond the basics to a deeper level of understanding

Who This Book Is For

    This book is designed for business analysts, BI analysts, data scientists, and junior-level data analysts who are ready to move from a conceptual understanding of advanced analytics to expertise in designing and building advanced analytics solutions using Python. You’re expected to have basic development experience with Python.

What You Will Learn

  • Gain an insight into components and design decisions for an analytical application

  • Master the use of Python notebooks for exploratory data analysis and rapid prototyping

  • Get to grips with applying regression, classification, clustering, and deep learning algorithms

  • Discover the advanced methods to analyze structured and unstructured data

  • Find out how to deploy a machine learning model in a production environment

  • Visualize the performance of models and the insights they produce

  • Scale your solutions as your data grows using Python

  • Ensure the robustness of your analytic applications by mastering the best practices of predictive analysis

In Detail

    The volume, diversity, and speed of available data have never been greater. Powerful machine learning methods can unlock the value in this information by finding complex relationships and unanticipated trends. Using the Python programming language, analysts can use these sophisticated methods to build scalable analytic applications that deliver insights of tremendous value to their organizations.

    In Mastering Predictive Analytics with Python, you will learn the process of turning raw data into powerful insights. Through case studies and code examples using popular open-source Python libraries, this book illustrates the complete development process for analytic applications and how to quickly apply these methods to your own data to create robust and scalable prediction services.

    Covering a wide range of algorithms for classification, regression, and clustering, as well as cutting-edge techniques such as deep learning, this book illustrates not only how these methods work, but also how to implement them in practice. You will learn to choose the right approach for your problem and how to develop engaging visualizations to bring the insights of predictive modeling to life.

Style and Approach

    This book emphasizes explaining methods through example data and code, showing you templates that you can quickly adapt to your own use cases. It focuses on both the practical application of sophisticated algorithms and the intuitive understanding necessary to apply the correct method to the problem at hand. Through visual examples, it also demonstrates how to convey insights through effective charts and reporting.

    Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at If you purchased this book elsewhere, you can visit and register to have the code file.

Table of Contents

    1. Mastering Predictive Analytics with Python
      1. Table of Contents
      2. Mastering Predictive Analytics with Python
      3. Credits
      4. About the Author
      5. About the Reviewer
        1. eBooks, discount offers, and more
          1. Why subscribe?
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. From Data to Decisions – Getting Started with Analytic Applications
        1. Designing an advanced analytic solution
          1. Data layer: warehouses, lakes, and streams
          2. Modeling layer
          3. Deployment layer
          4. Reporting layer
        2. Case study: sentiment analysis of social media feeds
          1. Data input and transformation
          2. Sanity checking
          3. Model development
          4. Scoring
          5. Visualization and reporting
        3. Case study: targeted e-mail campaigns
          1. Data input and transformation
          2. Sanity checking
          3. Model development
          4. Scoring
          5. Visualization and reporting
        4. Summary
      9. 2. Exploratory Data Analysis and Visualization in Python
        1. Exploring categorical and numerical data in IPython
          1. Installing IPython notebook
          2. The notebook interface
          3. Loading and inspecting data
          4. Basic manipulations – grouping, filtering, mapping, and pivoting
          5. Charting with Matplotlib
        2. Time series analysis
          1. Cleaning and converting
          2. Time series diagnostics
          3. Joining signals and correlation
        3. Working with geospatial data
          1. Loading geospatial data
          2. Working in the cloud
        4. Introduction to PySpark
          1. Creating the SparkContext
          2. Creating an RDD
          3. Creating a Spark DataFrame
        5. Summary
      10. 3. Finding Patterns in the Noise – Clustering and Unsupervised Learning
        1. Similarity and distance metrics
          1. Numerical distance metrics
          2. Correlation similarity metrics and time series
          3. Similarity metrics for categorical data
          4. K-means clustering
        2. Affinity propagation – automatically choosing cluster numbers
        3. k-medoids
        4. Agglomerative clustering
          1. Where agglomerative clustering fails
        5. Streaming clustering in Spark
        6. Summary
      11. 4. Connecting the Dots with Models – Regression Methods
        1. Linear regression
          1. Data preparation
          2. Model fitting and evaluation
          3. Statistical significance of regression outputs
          4. Generalized estimating equations
          5. Mixed effects models
          6. Time series data
          7. Generalized linear models
          8. Applying regularization to linear models
        2. Tree methods
          1. Decision trees
          2. Random forest
        3. Scaling out with PySpark – predicting year of song release
        4. Summary
      12. 5. Putting Data in its Place – Classification Methods and Analysis
        1. Logistic regression
          1. Multiclass logistic classifiers: multinomial regression
          2. Formatting a dataset for classification problems
          3. Learning pointwise updates with stochastic gradient descent
          4. Jointly optimizing all parameters with second-order methods
        2. Fitting the model
        3. Evaluating classification models
          1. Strategies for improving classification models
        4. Separating nonlinear boundaries with support vector machines
          1. Fitting an SVM to the census data
          2. Boosting – combining small models to improve accuracy
          3. Gradient boosted decision trees
        5. Comparing classification methods
        6. Case study: fitting classifier models in PySpark
        7. Summary
      13. 6. Words and Pixels – Working with Unstructured Data
        1. Working with textual data
          1. Cleaning textual data
          2. Extracting features from textual data
          3. Using dimensionality reduction to simplify datasets
        2. Principal component analysis
          1. Latent Dirichlet Allocation
          2. Using dimensionality reduction in predictive modeling
        3. Images
          1. Cleaning image data
          2. Thresholding images to highlight objects
          3. Dimensionality reduction for image analysis
        4. Case Study: Training a Recommender System in PySpark
        5. Summary
      14. 7. Learning from the Bottom Up – Deep Networks and Unsupervised Features
        1. Learning patterns with neural networks
          1. A network of one – the perceptron
          2. Combining perceptrons – a single-layer neural network
          3. Parameter fitting with back-propagation
          4. Discriminative versus generative models
          5. Vanishing gradients and explaining away
          6. Pretraining belief networks
          7. Using dropout to regularize networks
          8. Convolutional networks and rectified units
          9. Compressing data with autoencoder networks
          10. Optimizing the learning rate
        2. The TensorFlow library and digit recognition
          1. The MNIST data
          2. Constructing the network
        3. Summary
      15. 8. Sharing Models with Prediction Services
        1. The architecture of a prediction service
        2. Clients and making requests
          1. The GET request
          2. The POST request
          3. The HEAD request
          4. The PUT request
          5. The DELETE request
        3. Server – the web traffic controller
          1. Application – the engine of the predictive services
        4. Persisting information with database systems
        5. Case study – logistic regression service
          1. Setting up the database
          2. The web server
          3. The web application
            1. The flow of a prediction service – training a model
            2. On-demand and bulk prediction
        6. Summary
      16. 9. Reporting and Testing – Iterating on Analytic Systems
        1. Checking the health of models with diagnostics
          1. Evaluating changes in model performance
          2. Changes in feature importance
          3. Changes in unsupervised model performance
        2. Iterating on models through A/B testing
          1. Experimental allocation – assigning customers to experiments
          2. Deciding a sample size
          3. Multiple hypothesis testing
        3. Guidelines for communication
          1. Translate terms to business values
          2. Visualizing results
            1. Case Study: building a reporting service
          3. The report server
          4. The report application
          5. The visualization layer
        4. Summary
      17. Index