
Bayesian Analysis with Python

Book Description

Unleash the power and flexibility of the Bayesian framework

About This Book

  • Simplify the Bayesian process for solving complex statistical problems using Python

  • A tutorial guide that takes you through the journey of Bayesian analysis with the help of sample problems and practice exercises

  • Learn how and when to use Bayesian analysis in your applications with this guide

Who This Book Is For

Students, researchers, and data scientists who wish to learn Bayesian data analysis with Python and implement probabilistic models in their day-to-day projects. Programming experience with Python is essential. No previous statistical knowledge is assumed.

What You Will Learn

  • Understand the essential Bayesian concepts from a practical point of view

  • Learn how to build probabilistic models using the Python library PyMC3

  • Acquire the skills to sanity-check your models and modify them if necessary

  • Add structure to your models and get the advantages of hierarchical models

  • Find out how different models can be used to answer different data analysis questions

  • When in doubt, learn to choose between alternative models.

  • Predict continuous target outcomes using regression analysis or assign classes using logistic and softmax regression.

  • Learn how to think probabilistically and unleash the power and flexibility of the Bayesian framework

In Detail

The purpose of this book is to teach the main concepts of Bayesian data analysis. We will learn how to effectively use PyMC3, a Python library for probabilistic programming, to perform Bayesian parameter estimation, check models, and validate them. This book begins by presenting the key concepts of the Bayesian framework and the main advantages of this approach from a practical point of view. Moving on, we will explore the power and flexibility of generalized linear models and how to adapt them to a wide array of problems, including regression and classification. We will also look into mixture models and clustering data, and we will finish with advanced topics such as non-parametric models and Gaussian processes. With the help of Python and PyMC3 you will learn to implement, check, and expand Bayesian models to solve data analysis problems.
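
As a small taste of the Bayesian parameter estimation described above, here is a minimal sketch of the coin-flipping problem covered in Chapter 1, using a grid approximation and only the Python standard library. The data values are hypothetical, and this is not the book's own approach: the book performs inference with PyMC3 rather than by hand.

```python
import math

# Coin-flipping: infer theta, the probability of heads, from observed tosses.
heads, tosses = 6, 9                    # hypothetical observed data

n = 1001                                # grid resolution
grid = [i / (n - 1) for i in range(n)]  # candidate theta values in [0, 1]
prior = [1.0] * n                       # flat (uniform) prior over theta

# Binomial likelihood of the observed data at each grid point
like = [math.comb(tosses, heads) * t**heads * (1 - t)**(tosses - heads)
        for t in grid]

# Posterior is proportional to prior * likelihood; normalize over the grid
unnorm = [p * l for p, l in zip(prior, like)]
total = sum(unnorm)
post = [u / total for u in unnorm]

# Posterior mean of theta; with a flat prior this approximates the mean
# of a Beta(heads + 1, tosses - heads + 1) distribution, i.e. 7/11
theta_mean = sum(t * w for t, w in zip(grid, post))
print(round(theta_mean, 3))  # → 0.636
```

PyMC3 automates exactly this kind of computation for models far too complex to evaluate on a grid, which is the subject of Chapter 2.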

Style and approach

Bayesian methods are widely used in statistics, machine learning, artificial intelligence, and data mining. This is a practical guide that allows readers to use Bayesian methods for statistical modelling and analysis using Python.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. Bayesian Analysis with Python
      1. Table of Contents
      2. Bayesian Analysis with Python
      3. Credits
      4. About the Author
      5. About the Reviewer
      6. www.PacktPub.com
        1. eBooks, discount offers, and more
          1. Why subscribe?
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Thinking Probabilistically - A Bayesian Inference Primer
        1. Statistics as a form of modeling
          1. Exploratory data analysis
          2. Inferential statistics
        2. Probabilities and uncertainty
          1. Probability distributions
          2. Bayes' theorem and statistical inference
        3. Single parameter inference
          1. The coin-flipping problem
            1. The general model
            2. Choosing the likelihood
            3. Choosing the prior
            4. Getting the posterior
            5. Computing and plotting the posterior
            6. Influence of the prior and how to choose one
          2. Communicating a Bayesian analysis
          3. Model notation and visualization
          4. Summarizing the posterior
            1. Highest posterior density
        4. Posterior predictive checks
        5. Installing the necessary Python packages
        6. Summary
        7. Exercises
      9. 2. Programming Probabilistically – A PyMC3 Primer
        1. Probabilistic programming
          1. Inference engines
            1. Non-Markovian methods
              1. Grid computing
              2. Quadratic method
              3. Variational methods
            2. Markovian methods
              1. Monte Carlo
              2. Markov chain
              3. Metropolis-Hastings
              4. Hamiltonian Monte Carlo/NUTS
              5. Other MCMC methods
        2. PyMC3 introduction
          1. Coin-flipping, the computational approach
            1. Model specification
            2. Pushing the inference button
            3. Diagnosing the sampling process
              1. Convergence
              2. Autocorrelation
              3. Effective size
        3. Summarizing the posterior
          1. Posterior-based decisions
            1. ROPE
            2. Loss functions
        4. Summary
        5. Keep reading
        6. Exercises
      10. 3. Juggling with Multi-Parametric and Hierarchical Models
        1. Nuisance parameters and marginalized distributions
        2. Gaussians, Gaussians, Gaussians everywhere
          1. Gaussian inferences
          2. Robust inferences
            1. Student's t-distribution
        3. Comparing groups
          1. The tips dataset
          2. Cohen's d
          3. Probability of superiority
        4. Hierarchical models
          1. Shrinkage
        5. Summary
        6. Keep reading
        7. Exercises
      11. 4. Understanding and Predicting Data with Linear Regression Models
        1. Simple linear regression
          1. The machine learning connection
          2. The core of linear regression models
          3. Linear models and high autocorrelation
            1. Modifying the data before running
            2. Changing the sampling method
          4. Interpreting and visualizing the posterior
          5. Pearson correlation coefficient
            1. Pearson coefficient from a multivariate Gaussian
        2. Robust linear regression
        3. Hierarchical linear regression
          1. Correlation, causation, and the messiness of life
        4. Polynomial regression
          1. Interpreting the parameters of a polynomial regression
          2. Polynomial regression – the ultimate model?
        5. Multiple linear regression
          1. Confounding variables and redundant variables
          2. Multicollinearity or when the correlation is too high
          3. Masking effect variables
          4. Adding interactions
        6. The GLM module
        7. Summary
        8. Keep reading
        9. Exercises
      12. 5. Classifying Outcomes with Logistic Regression
        1. Logistic regression
          1. The logistic model
          2. The iris dataset
          3. The logistic model applied to the iris dataset
            1. Making predictions
        2. Multiple logistic regression
          1. The boundary decision
          2. Implementing the model
          3. Dealing with correlated variables
          4. Dealing with unbalanced classes
          5. How do we solve this problem?
          6. Interpreting the coefficients of a logistic regression
          7. Generalized linear models
          8. Softmax regression or multinomial logistic regression
        3. Discriminative and generative models
        4. Summary
        5. Keep reading
        6. Exercises
      13. 6. Model Comparison
        1. Occam's razor – simplicity and accuracy
          1. Too many parameters leads to overfitting
          2. Too few parameters leads to underfitting
          3. The balance between simplicity and accuracy
        2. Regularizing priors
          1. Regularizing priors and hierarchical models
        3. Predictive accuracy measures
          1. Cross-validation
          2. Information criteria
            1. The log-likelihood and the deviance
            2. Akaike information criterion
            3. Deviance information criterion
              4. Widely applicable information criterion
            5. Pareto smoothed importance sampling leave-one-out cross-validation
            6. Bayesian information criterion
          3. Computing information criteria with PyMC3
            1. A note on the reliability of WAIC and LOO computations
          4. Interpreting and using information criteria measures
          5. Posterior predictive checks
        4. Bayes factors
          1. Analogy with information criteria
          2. Computing Bayes factors
            1. Common problems computing Bayes factors
        5. Bayes factors and information criteria
        6. Summary
        7. Keep reading
        8. Exercises
      14. 7. Mixture Models
        1. Mixture models
          1. How to build mixture models
          2. Marginalized Gaussian mixture model
          3. Mixture models and count data
            1. The Poisson distribution
            2. The Zero-Inflated Poisson model
            3. Poisson regression and ZIP regression
          4. Robust logistic regression
        2. Model-based clustering
          1. Fixed component clustering
            1. Non-fixed component clustering
        3. Continuous mixtures
          1. Beta-binomial and negative binomial
          2. The Student's t-distribution
        4. Summary
        5. Keep reading
        6. Exercises
      15. 8. Gaussian Processes
        1. Non-parametric statistics
        2. Kernel-based models
          1. The Gaussian kernel
          2. Kernelized linear regression
          3. Overfitting and priors
        3. Gaussian processes
          1. Building the covariance matrix
            1. Sampling from a GP prior
            2. Using a parameterized kernel
          2. Making predictions from a GP
          3. Implementing a GP using PyMC3
            1. Posterior predictive checks
            2. Periodic kernel
        4. Summary
        5. Keep reading
        6. Exercises
      16. Index