Scaling Up Machine Learning

Book Description

This book presents an integrated collection of representative approaches to scaling up machine learning and data mining methods on parallel and distributed computing platforms. The demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs and constraints of the available options. The solutions presented in the book span a range of parallelization platforms (from FPGAs and GPUs to multicore systems and commodity clusters), concurrent programming frameworks (including CUDA, MPI, MapReduce and DryadLINQ) and learning settings (supervised, unsupervised, semi-supervised and online). Extensive coverage of the parallelization of boosted trees, SVMs, spectral clustering, belief propagation and other popular learning algorithms, together with deep dives into several applications, makes the book equally useful for researchers, students and practitioners.

Table of Contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright
  5. Contents
  6. Contributors
  7. Preface
  8. 1. Scaling Up Machine Learning: Introduction
    1. 1.1 Machine Learning Basics
    2. 1.2 Reasons for Scaling Up Machine Learning
    3. 1.3 Key Concepts in Parallel and Distributed Computing
    4. 1.4 Platform Choices and Trade-Offs
    5. 1.5 Thinking about Performance
    6. 1.6 Organization of the Book
    7. 1.7 Bibliographic Notes
    8. References
  9. Part One: Frameworks for Scaling Up Machine Learning
    1. 2. MapReduce and Its Application to Massively Parallel Learning of Decision Tree Ensembles
      1. 2.1 Preliminaries
      2. 2.2 Example of PLANET
      3. 2.3 Technical Details
      4. 2.4 Learning Ensembles
      5. 2.5 Engineering Issues
      6. 2.6 Experiments
      7. 2.7 Related Work
      8. 2.8 Conclusions
      9. Acknowledgments
      10. References
    2. 3. Large-Scale Machine Learning Using DryadLINQ
      1. 3.1 Manipulating Datasets with LINQ
      2. 3.2 k-Means in LINQ
      3. 3.3 Running LINQ on a Cluster with DryadLINQ
      4. 3.4 Lessons Learned
      5. References
    3. 4. IBM Parallel Machine Learning Toolbox
      1. 4.1 Data-Parallel Associative-Commutative Computation
      2. 4.2 API and Control Layer
      3. 4.3 API Extensions for Distributed-State Algorithms
      4. 4.4 Control Layer Implementation and Optimizations
      5. 4.5 Parallel Kernel k-Means
      6. 4.6 Parallel Decision Tree
      7. 4.7 Parallel Frequent Pattern Mining
      8. 4.8 Summary
      9. References
    4. 5. Uniformly Fine-Grained Data-Parallel Computing for Machine Learning Algorithms
      1. 5.1 Overview of a GP-GPU
      2. 5.2 Uniformly Fine-Grained Data-Parallel Computing on a GPU
      3. 5.3 The k-Means Clustering Algorithm
      4. 5.4 The k-Means Regression Clustering Algorithm
      5. 5.5 Implementations and Performance Comparisons
      6. 5.6 Conclusions
      7. References
  10. Part Two: Supervised and Unsupervised Learning Algorithms
    1. 6. PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization
      1. 6.1 Interior Point Method with Incomplete Cholesky Factorization
      2. 6.2 PSVM Algorithm
      3. 6.3 Experiments
      4. 6.4 Conclusion
      5. Acknowledgments
      6. References
    2. 7. Massive SVM Parallelization Using Hardware Accelerators
      1. 7.1 Problem Formulation
      2. 7.2 Implementation of the SMO Algorithm
      3. 7.3 Micro Parallelization: Related Work
      4. 7.4 Previous Parallelizations on Multicore Systems
      5. 7.5 Micro Parallelization: Revisited
      6. 7.6 Massively Parallel Hardware Accelerator
      7. 7.7 Results
      8. 7.8 Conclusion
      9. References
    3. 8. Large-Scale Learning to Rank Using Boosted Decision Trees
      1. 8.1 Related Work
      2. 8.2 LambdaMART
      3. 8.3 Approaches to Distributing LambdaMART
      4. 8.4 Experiments
      5. 8.5 Conclusions and Future Work
      6. 8.6 Acknowledgments
      7. References
    4. 9. The Transform Regression Algorithm
      1. 9.1 Classification, Regression, and Loss Functions
      2. 9.2 Background
      3. 9.3 Motivation and Algorithm Description
      4. 9.4 TReg Expansion: Initialization and Termination
      5. 9.5 Model Accuracy Results
      6. 9.6 Parallel Performance Results
      7. 9.7 Summary
      8. References
    5. 10. Parallel Belief Propagation in Factor Graphs
      1. 10.1 Belief Propagation in Factor Graphs
      2. 10.2 Shared Memory Parallel Belief Propagation
      3. 10.3 Multicore Performance Comparison
      4. 10.4 Parallel Belief Propagation on Clusters
      5. 10.5 Conclusion
      6. Acknowledgments
      7. References
    6. 11. Distributed Gibbs Sampling for Latent Variable Models
      1. 11.1 Latent Variable Models
      2. 11.2 Distributed Inference Algorithms
      3. 11.3 Experimental Analysis of Distributed Topic Modeling
      4. 11.4 Practical Guidelines for Implementation
      5. 11.5 A Foray into Distributed Inference for Bayesian Networks
      6. 11.6 Conclusion
      7. Acknowledgments
      8. References
    7. 12. Large-Scale Spectral Clustering with MapReduce and MPI
      1. 12.1 Spectral Clustering
      2. 12.2 Spectral Clustering Using a Sparse Similarity Matrix
      3. 12.3 Parallel Spectral Clustering (PSC) Using a Sparse Similarity Matrix
      4. 12.4 Experiments
      5. 12.5 Conclusions
      6. References
    8. 13. Parallelizing Information-Theoretic Clustering Methods
      1. 13.1 Information-Theoretic Clustering
      2. 13.2 Parallel Clustering
      3. 13.3 Sequential Co-clustering
      4. 13.4 The DataLoom Algorithm
      5. 13.5 Implementation and Experimentation
      6. 13.6 Conclusion
      7. References
  11. Part Three: Alternative Learning Settings
    1. 14. Parallel Online Learning
      1. 14.1 Limits Due to Bandwidth and Latency
      2. 14.2 Parallelization Strategies
      3. 14.3 Delayed Update Analysis
      4. 14.4 Parallel Learning Algorithms
      5. 14.5 Global Update Rules
      6. 14.6 Experiments
      7. 14.7 Conclusion
      8. References
    2. 15. Parallel Graph-Based Semi-Supervised Learning
      1. 15.1 Scaling SSL to Large Datasets
      2. 15.2 Graph-Based SSL
      3. 15.3 Dataset: A 120-Million-Node Graph
      4. 15.4 Large-Scale Parallel Processing
      5. 15.5 Discussion
      6. References
    3. 16. Distributed Transfer Learning via Cooperative Matrix Factorization
      1. 16.1 Distributed Coalitional Learning
      2. 16.2 Extension of DisCo to Classification Tasks
      3. 16.3 Conclusion
      4. References
    4. 17. Parallel Large-Scale Feature Selection
      1. 17.1 Logistic Regression
      2. 17.2 Feature Selection
      3. 17.3 Parallelizing Feature Selection Algorithms
      4. 17.4 Experimental Results
      5. 17.5 Conclusions
      6. References
  12. Part Four: Applications
    1. 18. Large-Scale Learning for Vision with GPUs
      1. 18.1 A Standard Pipeline
      2. 18.2 Introduction to GPUs
      3. 18.3 A Standard Approach Scaled Up
      4. 18.4 Feature Learning with Deep Belief Networks
      5. 18.5 Conclusion
      6. References
    2. 19. Large-Scale FPGA-Based Convolutional Networks
      1. 19.1 Learning Internal Representations
      2. 19.2 A Dedicated Digital Hardware Architecture
      3. 19.3 Summary
      4. References
    3. 20. Mining Tree-Structured Data on Multicore Systems
      1. 20.1 The Multicore Challenge
      2. 20.2 Background
      3. 20.3 Memory Optimizations
      4. 20.4 Adaptive Parallelization
      5. 20.5 Empirical Evaluation
      6. 20.6 Discussion
      7. Acknowledgments
      8. References
    4. 21. Scalable Parallelization of Automatic Speech Recognition
      1. 21.1 Concurrency Identification
      2. 21.2 Software Architecture and Implementation Challenges
      3. 21.3 Multicore and Manycore Parallel Platforms
      4. 21.4 Multicore Infrastructure and Mapping
      5. 21.5 The Manycore Implementation
      6. 21.6 Implementation Profiling and Sensitivity Analysis
      7. 21.7 Application-Level Optimization
      8. 21.8 Conclusion and Key Lessons
      9. References
  13. Subject Index