## Book Description

Machine learning methods extract value from vast data sets quickly and with modest resources. They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly. People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models. Students learn more than a menu of techniques; they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer-based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online.

1. Cover
2. Half Title
3. Title Page
5. Contents
6. Preface
7. List of notation
8. BRMLTOOLBOX
9. I: Inference in probabilistic models
    1. 1. Probabilistic reasoning
        1. 1.1 Probability refresher
        2. 1.2 Probabilistic reasoning
        3. 1.3 Prior, likelihood and posterior
        4. 1.4 Summary
        5. 1.5 Code
        6. 1.6 Exercises
    2. 2. Basic graph concepts
        1. 2.1 Graphs
        2. 2.2 Numerically encoding graphs
        3. 2.3 Summary
        4. 2.4 Code
        5. 2.5 Exercises
    3. 3. Belief networks
        1. 3.1 The benefits of structure
        2. 3.2 Uncertain and unreliable evidence
        3. 3.3 Belief networks
        4. 3.4 Causality
        5. 3.5 Summary
        6. 3.6 Code
        7. 3.7 Exercises
    4. 4. Graphical models
        1. 4.1 Graphical models
        2. 4.2 Markov networks
        3. 4.3 Chain graphical models
        4. 4.4 Factor graphs
        5. 4.5 Expressiveness of graphical models
        6. 4.6 Summary
        7. 4.7 Code
        8. 4.8 Exercises
    5. 5. Efficient inference in trees
        1. 5.1 Marginal inference
        2. 5.2 Other forms of inference
        3. 5.3 Inference in multiply connected graphs
        4. 5.4 Message passing for continuous distributions
        5. 5.5 Summary
        6. 5.6 Code
        7. 5.7 Exercises
    6. 6. The junction tree algorithm
        1. 6.1 Clustering variables
        2. 6.2 Clique graphs
        3. 6.3 Junction trees
        4. 6.4 Constructing a junction tree for singly connected distributions
        5. 6.5 Junction trees for multiply connected distributions
        6. 6.6 The junction tree algorithm
        7. 6.7 Finding the most likely state
        8. 6.8 Reabsorption: converting a junction tree to a directed network
        9. 6.9 The need for approximations
        10. 6.10 Summary
        11. 6.11 Code
        12. 6.12 Exercises
    7. 7. Making decisions
        1. 7.1 Expected utility
        2. 7.2 Decision trees
        3. 7.3 Extending Bayesian networks for decisions
        4. 7.4 Solving influence diagrams
        5. 7.5 Markov decision processes
        6. 7.6 Temporally unbounded MDPs
        7. 7.7 Variational inference and planning
        8. 7.8 Financial matters
        9. 7.9 Further topics
        10. 7.10 Summary
        11. 7.11 Code
        12. 7.12 Exercises
10. II: Learning in probabilistic models
    1. 8. Statistics for machine learning
        1. 8.1 Representing data
        2. 8.2 Distributions
        3. 8.3 Classical distributions
        4. 8.4 Multivariate Gaussian
        5. 8.5 Exponential family
        6. 8.6 Learning distributions
        7. 8.7 Properties of maximum likelihood
        8. 8.8 Learning a Gaussian
        9. 8.9 Summary
        10. 8.10 Code
        11. 8.11 Exercises
    2. 9. Learning as inference
        1. 9.1 Learning as inference
        2. 9.2 Bayesian methods and ML-II
        3. 9.3 Maximum likelihood training of belief networks
        4. 9.4 Bayesian belief network training
        5. 9.5 Structure learning
        6. 9.6 Maximum likelihood for undirected models
        7. 9.7 Summary
        8. 9.8 Code
        9. 9.9 Exercises
    3. 10. Naive Bayes
        1. 10.1 Naive Bayes and conditional independence
        2. 10.2 Estimation using maximum likelihood
        3. 10.3 Bayesian naive Bayes
        4. 10.4 Tree augmented naive Bayes
        5. 10.5 Summary
        6. 10.6 Code
        7. 10.7 Exercises
    4. 11. Learning with hidden variables
        1. 11.1 Hidden variables and missing data
        2. 11.2 Expectation maximisation
        3. 11.3 Extensions of EM
        4. 11.4 A failure case for EM
        5. 11.5 Variational Bayes
        6. 11.6 Optimising the likelihood by gradient methods
        7. 11.7 Summary
        8. 11.8 Code
        9. 11.9 Exercises
    5. 12. Bayesian model selection
        1. 12.1 Comparing models the Bayesian way
        2. 12.2 Illustrations: coin tossing
        3. 12.3 Occam’s razor and Bayesian complexity penalisation
        4. 12.4 A continuous example: curve fitting
        5. 12.5 Approximating the model likelihood
        6. 12.6 Bayesian hypothesis testing for outcome analysis
        7. 12.7 Summary
        8. 12.8 Code
        9. 12.9 Exercises
11. III: Machine learning
    1. 13. Machine learning concepts
        1. 13.1 Styles of learning
        2. 13.2 Supervised learning
        3. 13.3 Bayes versus empirical decisions
        4. 13.4 Summary
        5. 13.5 Exercises
    2. 14. Nearest neighbour classification
        1. 14.1 Do as your neighbour does
        2. 14.2 K-nearest neighbours
        3. 14.3 A probabilistic interpretation of nearest neighbours
        4. 14.4 Summary
        5. 14.5 Code
        6. 14.6 Exercises
    3. 15. Unsupervised linear dimension reduction
        1. 15.1 High-dimensional spaces – low-dimensional manifolds
        2. 15.2 Principal components analysis
        3. 15.3 High-dimensional data
        4. 15.4 Latent semantic analysis
        5. 15.5 PCA with missing data
        6. 15.6 Matrix decomposition methods
        7. 15.7 Kernel PCA
        8. 15.8 Canonical correlation analysis
        9. 15.9 Summary
        10. 15.10 Code
        11. 15.11 Exercises
    4. Plate Section
    5. 16. Supervised linear dimension reduction
        1. 16.1 Supervised linear projections
        2. 16.2 Fisher’s linear discriminant
        3. 16.3 Canonical variates
        4. 16.4 Summary
        5. 16.5 Code
        6. 16.6 Exercises
    6. 17. Linear models
        1. 17.1 Introduction: fitting a straight line
        2. 17.2 Linear parameter models for regression
        3. 17.3 The dual representation and kernels
        4. 17.4 Linear parameter models for classification
        5. 17.5 Support vector machines
        6. 17.6 Soft zero-one loss for outlier robustness
        7. 17.7 Summary
        8. 17.8 Code
        9. 17.9 Exercises
    7. 18. Bayesian linear models
        1. 18.1 Regression with additive Gaussian noise
        2. 18.2 Classification
        3. 18.3 Summary
        4. 18.4 Code
        5. 18.5 Exercises
    8. 19. Gaussian processes
        1. 19.1 Non-parametric prediction
        2. 19.2 Gaussian process prediction
        3. 19.3 Covariance functions
        4. 19.4 Analysis of covariance functions
        5. 19.5 Gaussian processes for classification
        6. 19.6 Summary
        7. 19.7 Code
        8. 19.8 Exercises
    9. 20. Mixture models
        1. 20.1 Density estimation using mixtures
        2. 20.2 Expectation maximisation for mixture models
        3. 20.3 The Gaussian mixture model
        4. 20.4 Mixture of experts
        5. 20.5 Indicator models
        6. 20.6 Mixed membership models
        7. 20.7 Summary
        8. 20.8 Code
        9. 20.9 Exercises
    10. 21. Latent linear models
        1. 21.1 Factor analysis
        2. 21.2 Factor analysis: maximum likelihood
        3. 21.3 Interlude: modelling faces
        4. 21.4 Probabilistic principal components analysis
        5. 21.5 Canonical correlation analysis and factor analysis
        6. 21.6 Independent components analysis
        7. 21.7 Summary
        8. 21.8 Code
        9. 21.9 Exercises
    11. 22. Latent ability models
        1. 22.1 The Rasch model
        2. 22.2 Competition models
        3. 22.3 Summary
        4. 22.4 Code
        5. 22.5 Exercises
12. IV: Dynamical models
    1. 23. Discrete-state Markov models
        1. 23.1 Markov models
        2. 23.2 Hidden Markov models
        3. 23.3 Learning HMMs
        4. 23.4 Related models
        5. 23.5 Applications
        6. 23.6 Summary
        7. 23.7 Code
        8. 23.8 Exercises
    2. 24. Continuous-state Markov models
        1. 24.1 Observed linear dynamical systems
        2. 24.2 Auto-regressive models
        3. 24.3 Latent linear dynamical systems
        4. 24.4 Inference
        5. 24.5 Learning linear dynamical systems
        6. 24.6 Switching auto-regressive models
        7. 24.7 Summary
        8. 24.8 Code
        9. 24.9 Exercises
    3. 25. Switching linear dynamical systems
        1. 25.1 Introduction
        2. 25.2 The switching LDS
        3. 25.3 Gaussian sum filtering
        4. 25.4 Gaussian sum smoothing
        5. 25.5 Reset models
        6. 25.6 Summary
        7. 25.7 Code
        8. 25.8 Exercises
    4. 26. Distributed computation
        1. 26.1 Introduction
        2. 26.2 Stochastic Hopfield networks
        3. 26.3 Learning sequences
        4. 26.4 Tractable continuous latent variable models
        5. 26.5 Neural models
        6. 26.6 Summary
        7. 26.7 Code
        8. 26.8 Exercises
13. V: Approximate inference
    1. 27. Sampling
        1. 27.1 Introduction
        2. 27.2 Ancestral sampling
        3. 27.3 Gibbs sampling
        4. 27.4 Markov chain Monte Carlo (MCMC)
        5. 27.5 Auxiliary variable methods
        6. 27.6 Importance sampling
        7. 27.7 Summary
        8. 27.8 Code
        9. 27.9 Exercises
    2. 28. Deterministic approximate inference
        1. 28.1 Introduction
        2. 28.2 The Laplace approximation
        3. 28.3 Properties of Kullback–Leibler variational inference
        4. 28.4 Variational bounding using KL(q|p)
        5. 28.5 Local and KL variational approximations
        6. 28.6 Mutual information maximisation: a KL variational approach
        7. 28.7 Loopy belief propagation
        8. 28.8 Expectation propagation
        9. 28.9 MAP for Markov networks