Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference

Book Description

Master Bayesian Inference through Practical Examples and Computation–Without Advanced Mathematical Analysis

Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making the subject inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice and freeing you to get results using computing power.

Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC library and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.

Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
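To give a flavor of the Markov Chain Monte Carlo ideas the book opens up, here is a minimal sketch of the Metropolis algorithm in pure Python, inferring a coin's bias from invented flip data (7 heads, 3 tails) under a uniform prior. This is an illustrative toy, not the book's own code; the book works through these models with PyMC.

```python
import math
import random

def log_posterior(p, heads=7, tails=3):
    # Uniform prior on p, so the log-posterior is just the
    # log-likelihood of the observed flips (up to a constant).
    if not 0 < p < 1:
        return float("-inf")
    return heads * math.log(p) + tails * math.log(1 - p)

def metropolis(n_samples=20000, burn=5000, step=0.1, seed=42):
    random.seed(seed)
    p = 0.5                                   # starting value
    samples = []
    for _ in range(n_samples):
        proposal = p + random.gauss(0, step)  # symmetric random-walk proposal
        # Accept with probability min(1, posterior ratio).
        if log_posterior(proposal) - log_posterior(p) > math.log(random.random()):
            p = proposal
        samples.append(p)
    return samples[burn:]                     # discard burn-in

samples = metropolis()
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)   # ≈ 0.67, near the exact Beta(8, 4) posterior mean
```

Because the exact posterior here is Beta(8, 4), you can check the sampler against the analytic answer, which is exactly the kind of sanity check Chapter 3 encourages when diagnosing convergence.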

Coverage includes

• Learning the Bayesian “state of mind” and its practical implications

• Understanding how computers perform Bayesian inference

• Using the PyMC Python library to program Bayesian analyses

• Building and debugging models with PyMC

• Testing your model’s “goodness of fit”

• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works

• Leveraging the power of the “Law of Large Numbers”

• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning

• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes

• Selecting appropriate priors and understanding how their influence changes with dataset size

• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough

• Using Bayesian inference to improve A/B testing

• Solving data science problems when only small amounts of data are available
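As a taste of the Bayesian A/B testing covered above, here is a minimal sketch in pure Python using the standard library's `random.betavariate`. The conversion counts are invented for illustration; the book itself builds these comparisons with PyMC.

```python
import random

random.seed(0)

# Hypothetical conversion data for two page variants.
visitors_a, conversions_a = 1000, 30
visitors_b, conversions_b = 1000, 60

# With a uniform Beta(1, 1) prior, each conversion rate's posterior
# is Beta(1 + conversions, 1 + non-conversions).
def posterior_draws(conversions, visitors, n=20000):
    return [random.betavariate(1 + conversions, 1 + visitors - conversions)
            for _ in range(n)]

draws_a = posterior_draws(conversions_a, visitors_a)
draws_b = posterior_draws(conversions_b, visitors_b)

# Estimate the probability that variant B truly converts better than A
# by comparing paired posterior draws.
p_b_better = sum(b > a for a, b in zip(draws_a, draws_b)) / len(draws_a)
print(p_b_better)
```

Unlike a frequentist p-value, the result is a direct probability statement ("B is better than A with probability p"), which is what makes the Bayesian framing of A/B tests so natural.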

Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.

Table of Contents

  1. About This eBook
  2. Title Page
  3. Copyright Page
  4. Dedication Page
  5. Contents
  6. Foreword
  7. Preface
  8. Acknowledgments
  9. About the Author
  10. 1. The Philosophy of Bayesian Inference
    1. 1.1 Introduction
      1. 1.1.1 The Bayesian State of Mind
      2. 1.1.2 Bayesian Inference in Practice
      3. 1.1.3 Are Frequentist Methods Incorrect?
      4. 1.1.4 A Note on “Big Data”
    2. 1.2 Our Bayesian Framework
      1. 1.2.1 Example: Mandatory Coin-Flip
      2. 1.2.2 Example: Librarian or Farmer?
    3. 1.3 Probability Distributions
      1. 1.3.1 Discrete Case
      2. 1.3.2 Continuous Case
      3. 1.3.3 But What Is λ?
    4. 1.4 Using Computers to Perform Bayesian Inference for Us
      1. 1.4.1 Example: Inferring Behavior from Text-Message Data
      2. 1.4.2 Introducing Our First Hammer: PyMC
      3. 1.4.3 Interpretation
      4. 1.4.4 What Good Are Samples from the Posterior, Anyway?
    5. 1.5 Conclusion
    6. 1.6 Appendix
      1. 1.6.1 Determining Statistically if the Two λs Are Indeed Different?
      2. 1.6.2 Extending to Two Switchpoints
    7. 1.7 Exercises
      1. 1.7.1 Answers
    8. 1.8 References
  11. 2. A Little More on PyMC
    1. 2.1 Introduction
      1. 2.1.1 Parent and Child Relationships
      2. 2.1.2 PyMC Variables
      3. 2.1.3 Including Observations in the Model
      4. 2.1.4 Finally...
    2. 2.2 Modeling Approaches
      1. 2.2.1 Same Story, Different Ending
      2. 2.2.2 Example: Bayesian A/B Testing
      3. 2.2.3 A Simple Case
      4. 2.2.4 A and B Together
      5. 2.2.5 Example: An Algorithm for Human Deceit
      6. 2.2.6 The Binomial Distribution
      7. 2.2.7 Example: Cheating Among Students
      8. 2.2.8 Alternative PyMC Model
      9. 2.2.9 More PyMC Tricks
      10. 2.2.10 Example: Challenger Space Shuttle Disaster
      11. 2.2.11 The Normal Distribution
      12. 2.2.12 What Happened the Day of the Challenger Disaster?
    3. 2.3 Is Our Model Appropriate?
      1. 2.3.1 Separation Plots
    4. 2.4 Conclusion
    5. 2.5 Appendix
    6. 2.6 Exercises
      1. 2.6.1 Answers
    7. 2.7 References
  12. 3. Opening the Black Box of MCMC
    1. 3.1 The Bayesian Landscape
      1. 3.1.1 Exploring the Landscape Using MCMC
      2. 3.1.2 Algorithms to Perform MCMC
      3. 3.1.3 Other Approximation Solutions to the Posterior
      4. 3.1.4 Example: Unsupervised Clustering Using a Mixture Model
      5. 3.1.5 Don’t Mix Posterior Samples
      6. 3.1.6 Using MAP to Improve Convergence
    2. 3.2 Diagnosing Convergence
      1. 3.2.1 Autocorrelation
      2. 3.2.2 Thinning
      3. 3.2.3 pymc.Matplot.plot()
    3. 3.3 Useful Tips for MCMC
      1. 3.3.1 Intelligent Starting Values
      2. 3.3.2 Priors
      3. 3.3.3 The Folk Theorem of Statistical Computing
    4. 3.4 Conclusion
    5. 3.5 References
  13. 4. The Greatest Theorem Never Told
    1. 4.1 Introduction
    2. 4.2 The Law of Large Numbers
      1. 4.2.1 Intuition
      2. 4.2.2 Example: Convergence of Poisson Random Variables
      3. 4.2.3 How Do We Compute Var(Z)?
      4. 4.2.4 Expected Values and Probabilities
      5. 4.2.5 What Does All This Have to Do with Bayesian Statistics?
    3. 4.3 The Disorder of Small Numbers
      1. 4.3.1 Example: Aggregated Geographic Data
      2. 4.3.2 Example: Kaggle’s U.S. Census Return Rate Challenge
      3. 4.3.3 Example: How to Sort Reddit Comments
      4. 4.3.4 Sorting!
      5. 4.3.5 But This Is Too Slow for Real-Time!
      6. 4.3.6 Extension to Starred Rating Systems
    4. 4.4 Conclusion
    5. 4.5 Appendix
      1. 4.5.1 Derivation of Sorting Comments Formula
    6. 4.6 Exercises
      1. 4.6.1 Answers
    7. 4.7 References
  14. 5. Would You Rather Lose an Arm or a Leg?
    1. 5.1 Introduction
    2. 5.2 Loss Functions
      1. 5.2.1 Loss Functions in the Real World
      2. 5.2.2 Example: Optimizing for the Showcase on The Price Is Right
    3. 5.3 Machine Learning via Bayesian Methods
      1. 5.3.1 Example: Financial Prediction
      2. 5.3.2 Example: Kaggle Contest on Observing Dark Worlds
      3. 5.3.3 The Data
      4. 5.3.4 Priors
      5. 5.3.5 Training and PyMC Implementation
    4. 5.4 Conclusion
    5. 5.5 References
  15. 6. Getting Our Priorities Straight
    1. 6.1 Introduction
    2. 6.2 Subjective versus Objective Priors
      1. 6.2.1 Objective Priors
      2. 6.2.2 Subjective Priors
      3. 6.2.3 Decisions, Decisions...
      4. 6.2.4 Empirical Bayes
    3. 6.3 Useful Priors to Know About
      1. 6.3.1 The Gamma Distribution
      2. 6.3.2 The Wishart Distribution
      3. 6.3.3 The Beta Distribution
    4. 6.4 Example: Bayesian Multi-Armed Bandits
      1. 6.4.1 Applications
      2. 6.4.2 A Proposed Solution
      3. 6.4.3 A Measure of Good
      4. 6.4.4 Extending the Algorithm
    5. 6.5 Eliciting Prior Distributions from Domain Experts
      1. 6.5.1 Trial Roulette Method
      2. 6.5.2 Example: Stock Returns
      3. 6.5.3 Pro Tips for the Wishart Distribution
    6. 6.6 Conjugate Priors
    7. 6.7 Jeffreys Priors
    8. 6.8 Effect of the Prior as N Increases
    9. 6.9 Conclusion
    10. 6.10 Appendix
      1. 6.10.1 Bayesian Perspective of Penalized Linear Regressions
      2. 6.10.2 Picking a Degenerate Prior
    11. 6.11 References
  16. 7. Bayesian A/B Testing
    1. 7.1 Introduction
    2. 7.2 Conversion Testing Recap
    3. 7.3 Adding a Linear Loss Function
      1. 7.3.1 Expected Revenue Analysis
      2. 7.3.2 Extending to an A/B Experiment
    4. 7.4 Going Beyond Conversions: t-test
      1. 7.4.1 The Setup of the t-test
    5. 7.5 Estimating the Increase
      1. 7.5.1 Creating Point Estimates
    6. 7.6 Conclusion
    7. 7.7 References
  17. Glossary
  18. Index
  19. Code Snippets