Machine Learning

Book Description

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (the latter based on optimization techniques), together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth and supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts.

The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, and statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.



  • All major classical techniques: Mean/Least-Squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression and boosting methods.
  • The latest trends: Sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning and latent variables modeling.
  • Case studies (protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization and echo cancellation) show how the theory can be applied.
  • MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code.

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Preface
  6. Acknowledgments
  7. Notation
  8. Dedication
  9. Chapter 1: Introduction
    1. Abstract
    2. 1.1 What Machine Learning is About
    3. 1.2 Structure and a Road Map of the Book
  10. Chapter 2: Probability and Stochastic Processes
    1. Abstract
    2. 2.1 Introduction
    3. 2.2 Probability and Random Variables
    4. 2.3 Examples of Distributions
    5. 2.4 Stochastic Processes
    6. 2.5 Information Theory
    7. 2.6 Stochastic Convergence
    8. Problems
  11. Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions
    1. Abstract
    2. 3.1 Introduction
    3. 3.2 Parameter Estimation: The Deterministic Point of View
    4. 3.3 Linear Regression
    5. 3.4 Classification
    6. 3.5 Biased Versus Unbiased Estimation
    7. 3.6 The Cramér-Rao Lower Bound
    8. 3.7 Sufficient Statistic
    9. 3.8 Regularization
    10. 3.9 The Bias-Variance Dilemma
    11. 3.10 Maximum Likelihood Method
    12. 3.11 Bayesian Inference
    13. 3.12 Curse of Dimensionality
    14. 3.13 Validation
    15. 3.14 Expected and Empirical Loss Functions
    16. 3.15 Nonparametric Modeling and Estimation
    17. Problems
  12. Chapter 4: Mean-Square Error Linear Estimation
    1. Abstract
    2. 4.1 Introduction
    3. 4.2 Mean-Square Error Linear Estimation: The Normal Equations
  13. Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family
    1. Abstract
    2. 5.1 Introduction
    3. 5.2 The Steepest Descent Method
    4. 5.3 Application to the Mean-Square Error Cost Function
    5. 5.4 Stochastic Approximation
    6. 5.5 The Least-Mean-Squares Adaptive Algorithm
    7. 5.6 The Affine Projection Algorithm
    8. 5.7 The Complex-Valued Case
    9. 5.8 Relatives of the LMS
    10. 5.9 Simulation Examples
    11. 5.10 Adaptive Decision Feedback Equalization
    12. 5.11 The Linearly Constrained LMS
    13. 5.12 Tracking Performance of the LMS in Nonstationary Environments
    14. 5.13 Distributed Learning: The Distributed LMS
    15. 5.14 A Case Study: Target Localization
    16. 5.15 Some Concluding Remarks: Consensus Matrix
    17. Problems
    18. MATLAB Exercises
  14. Chapter 6: The Least-Squares Family
    1. Abstract
    2. 6.1 Introduction
    3. 6.2 Least-Squares Linear Regression: A Geometric Perspective
    4. 6.3 Statistical Properties of the LS Estimator
    5. 6.4 Orthogonalizing the Column Space of X: The SVD Method
    6. 6.5 Ridge Regression
    7. 6.6 The Recursive Least-Squares Algorithm
    8. 6.7 Newton’s Iterative Minimization Method
    9. 6.8 Steady-State Performance of the RLS
    10. 6.9 Complex-Valued Data: The Widely Linear RLS
    11. 6.10 Computational Aspects of the LS Solution
    12. 6.11 The Coordinate and Cyclic Coordinate Descent Methods
    13. 6.12 Simulation Examples
    14. 6.13 Total-Least-Squares
    15. Problems
  15. Chapter 7: Classification: A Tour of the Classics
    1. Abstract
    2. 7.1 Introduction
    3. 7.2 Bayesian Classification
    4. 7.3 Decision (Hyper)Surfaces
    5. 7.4 The Naive Bayes Classifier
    6. 7.5 The Nearest Neighbor Rule
    7. 7.6 Logistic Regression
    8. 7.7 Fisher’s Linear Discriminant
    9. 7.8 Classification Trees
    10. 7.9 Combining Classifiers
    11. 7.10 The Boosting Approach
    12. 7.11 Boosting Trees
    13. 7.12 A Case Study: Protein Folding Prediction
    14. Problems
  16. Chapter 8: Parameter Learning: A Convex Analytic Path
    1. Abstract
    2. 8.1 Introduction
    3. 8.2 Convex Sets and Functions
    4. 8.3 Projections onto Convex Sets
    5. 8.4 Fundamental Theorem of Projections onto Convex Sets
    6. 8.5 A Parallel Version of POCS
    7. 8.6 From Convex Sets to Parameter Estimation and Machine Learning
    8. 8.7 Infinite Many Closed Convex Sets: The Online Learning Case
    9. 8.8 Constrained Learning
    10. 8.9 The Distributed APSM
    11. 8.10 Optimizing Nonsmooth Convex Cost Functions
    12. 8.11 Regret Analysis
    13. 8.12 Online Learning and Big Data Applications: A Discussion
    14. 8.13 Proximal Operators
    15. 8.14 Proximal Splitting Methods for Optimization
    16. Problems
    17. MATLAB Exercises
    18. 8.15 Appendix to Chapter 8
  17. Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations
    1. Abstract
    2. 9.1 Introduction
    3. 9.2 Searching for a Norm
    4. 9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)
    5. 9.4 Sparse Signal Representation
    6. 9.5 In Search of the Sparsest Solution
    7. 9.6 Uniqueness of the ℓ0 Minimizer
    8. 9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions
    9. 9.8 Robust Sparse Signal Recovery from Noisy Measurements
    10. 9.9 Compressed Sensing: The Glory of Randomness
    11. 9.10 A Case Study: Image De-Noising
    12. Problems
  18. Chapter 10: Sparsity-Aware Learning: Algorithms and Applications
    1. Abstract
    2. 10.1 Introduction
    3. 10.2 Sparsity-Promoting Algorithms
    4. 10.3 Variations on the Sparsity-Aware Theme
    5. 10.4 Online Sparsity-Promoting Algorithms
    6. 10.5 Learning Sparse Analysis Models
    7. 10.6 A Case Study: Time-Frequency Analysis
    8. 10.7 Appendix to Chapter 10: Some Hints from the Theory of Frames
    9. Problems
  19. Chapter 11: Learning in Reproducing Kernel Hilbert Spaces
    1. Abstract
    2. 11.1 Introduction
    3. 11.2 Generalized Linear Models
    4. 11.3 Volterra, Wiener, and Hammerstein Models
    5. 11.4 Cover’s Theorem: Capacity of a Space in Linear Dichotomies
    6. 11.5 Reproducing Kernel Hilbert Spaces
    7. 11.6 Representer Theorem
    8. 11.7 Kernel Ridge Regression
    9. 11.8 Support Vector Regression
    10. 11.9 Kernel Ridge Regression Revisited
    11. 11.10 Optimal Margin Classification: Support Vector Machines
    12. 11.11 Computational Considerations
    13. 11.12 Online Learning in RKHS
    14. 11.13 Multiple Kernel Learning
    15. 11.14 Nonparametric Sparsity-Aware Learning: Additive Models
    16. 11.15 A Case Study: Authorship Identification
    17. Problems
  20. Chapter 12: Bayesian Learning: Inference and the EM Algorithm
    1. Abstract
    2. 12.1 Introduction
    3. 12.2 Regression: A Bayesian Perspective
    4. 12.3 The Evidence Function and Occam’s Razor Rule
    5. 12.4 Exponential Family of Probability Distributions
    6. 12.5 Latent Variables and the EM Algorithm
    7. 12.6 Linear Regression and the EM Algorithm
    8. 12.7 Gaussian Mixture Models
    9. 12.8 Combining Learning Models: A Probabilistic Point of View
    10. Problems
    11. MATLAB Exercises
    12. 12.9 Appendix to Chapter 12
  21. Chapter 13: Bayesian Learning: Approximate Inference and Nonparametric Models
    1. Abstract
    2. 13.1 Introduction
    3. 13.2 Variational Approximation in Bayesian Learning
    4. 13.3 A Variational Bayesian Approach to Linear Regression
    5. 13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling
    6. 13.5 When Bayesian Inference Meets Sparsity
    7. 13.6 Sparse Bayesian Learning (SBL)
    8. 13.7 The Relevance Vector Machine Framework
    9. 13.8 Convex Duality and Variational Bounds
    10. 13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path
    11. 13.10 Sparsity-Aware Learning: Some Concluding Remarks
    12. 13.11 Expectation Propagation
    13. 13.12 Nonparametric Bayesian Modeling
    14. 13.13 Gaussian Processes
    15. 13.14 A Case Study: Hyperspectral Image Unmixing
    16. Problems
  22. Chapter 14: Monte Carlo Methods
    1. Abstract
    2. 14.1 Introduction
    3. 14.2 Monte Carlo Methods: The Main Concept
    4. 14.3 Random Sampling Based on Function Transformation
    5. 14.4 Rejection Sampling
    6. 14.5 Importance Sampling
    7. 14.6 Monte Carlo Methods and the EM Algorithm
    8. 14.7 Markov Chain Monte Carlo Methods
    9. 14.8 The Metropolis Method
    10. 14.9 Gibbs Sampling
    11. 14.10 In Search of More Efficient Methods: A Discussion
    12. 14.11 A Case Study: Change-Point Detection
    13. Problems
  23. Chapter 15: Probabilistic Graphical Models: Part I
    1. Abstract
    2. 15.1 Introduction
    3. 15.2 The Need for Graphical Models
    4. 15.3 Bayesian Networks and the Markov Condition
    5. 15.4 Undirected Graphical Models
    6. 15.5 Factor Graphs
    7. 15.6 Moralization of Directed Graphs
    8. 15.7 Exact Inference Methods: Message-Passing Algorithms
    9. Problems
  24. Chapter 16: Probabilistic Graphical Models: Part II
    1. Abstract
    2. 16.1 Introduction
    3. 16.2 Triangulated Graphs and Junction Trees
    4. 16.3 Approximate Inference Methods
    5. 16.4 Dynamic Graphical Models
    6. 16.5 Hidden Markov Models
    7. 16.6 Beyond HMMs: A Discussion
    8. 16.7 Learning Graphical Models
    9. Problems
  25. Chapter 17: Particle Filtering
    1. Abstract
    2. 17.1 Introduction
    3. 17.2 Sequential Importance Sampling
    4. 17.3 Kalman and Particle Filtering
    5. 17.4 Particle Filtering
    6. Problems
  26. Chapter 18: Neural Networks and Deep Learning
    1. Abstract
    2. 18.1 Introduction
    3. 18.2 The Perceptron
    4. 18.3 Feed-Forward Multilayer Neural Networks
    5. 18.4 The Backpropagation Algorithm
    6. 18.5 Pruning the Network
    7. 18.6 Universal Approximation Property of Feed-Forward Neural Networks
    8. 18.7 Neural Networks: A Bayesian Flavor
    9. 18.8 Learning Deep Networks
    10. 18.9 Deep Belief Networks
    11. 18.10 Variations on the Deep Learning Theme
    12. 18.11 Case Study: A Deep Network for Optical Character Recognition
    13. 18.12 Case Study: A Deep Autoencoder
    14. 18.13 Example: Generating Data via a DBN
    15. Problems
    16. MATLAB Exercises
  27. Chapter 19: Dimensionality Reduction and Latent Variables Modeling
    1. Abstract
    2. 19.1 Introduction
    3. 19.2 Intrinsic Dimensionality
    4. 19.3 Principal Component Analysis
    5. 19.4 Canonical Correlation Analysis
    6. 19.5 Independent Component Analysis
    7. 19.6 Dictionary Learning: The k-SVD Algorithm
    8. 19.7 Nonnegative Matrix Factorization
    9. 19.8 Learning Low-Dimensional Models: A Probabilistic Perspective
    10. 19.9 Nonlinear Dimensionality Reduction
    11. 19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path
    12. 19.11 A Case Study: fMRI Data Analysis
    13. Problems
  28. Appendix A: Linear Algebra
    1. A.1 Properties of Matrices
    2. A.2 Positive Definite and Symmetric Matrices
    3. A.3 Wirtinger Calculus
  29. Appendix B: Probability Theory and Statistics
    1. B.1 Cramér-Rao Bound
    2. B.2 Characteristic Functions
    3. B.3 Moments and Cumulants
    4. B.4 Edgeworth Expansion of a pdf
  30. Appendix C: Hints on Constrained Optimization
    1. C.1 Equality Constraints
    2. C.2 Inequality Constraints
  31. Index