Practical Machine Learning with H2O

Book Description

Machine learning has finally come of age. With H2O software, you can perform machine learning and data analysis using a simple open source framework that’s easy to use, supports a wide range of operating systems and languages, and scales to big data. This hands-on guide teaches you how to use H2O with only minimal math and theory behind the learning algorithms. If you’re familiar with R or Python, know a bit of statistics, and have some experience manipulating data, author Darren Cook will take you through H2O basics and help you conduct machine-learning experiments on different sample data sets.

Table of Contents

  1. Preface
    1. Who Uses It and Why?
    2. About You
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Safari
    6. How to Contact Us
    7. Acknowledgments
  2. 1. Installation and Quick-Start
    1. Preparing to Install
      1. Installing R
      2. Installing Python
      3. Privacy
      4. Installing Java
    2. Install H2O with R (CRAN)
    3. Install H2O with Python (pip)
    4. Our First Learning
      1. Training and Predictions, with Python
      2. Training and Predictions, with R
      3. Performance Versus Predictions
      4. On Being Unlucky
    5. Flow
      1. Data
      2. Models
      3. Predictions
      4. Other Things in Flow
    6. Summary
  3. 2. Data Import, Data Export
    1. Memory Requirements
    2. Preparing the Data
    3. Getting Data into H2O
      1. Load CSV Files
      2. Load Other File Formats
      3. Load Directly from R
      4. Load Directly from Python
    4. Data Manipulation
      1. Laziness, Naming, Deleting
      2. Data Summaries
      3. Operations on Columns
      4. Aggregating Rows
      5. Indexing
      6. Split Data Already in H2O
      7. Rows and Columns
    5. Getting Data Out of H2O
      1. Exporting Data Frames
      2. POJOs
      3. Model Files
      4. Save All Models
    6. Summary
  4. 3. The Data Sets
    1. Data Set: Building Energy Efficiency
      1. Setup and Load
      2. The Data Columns
      3. Splitting the Data
      4. Let’s Take a Look!
      5. About the Data Set
    2. Data Set: Handwritten Digits
      1. Setup and Load
      2. Taking a Look
      3. Helping the Models
      4. About the Data Set
    3. Data Set: Football Scores
      1. Correlations
      2. Missing Data… And Yet More Columns
      3. How to Train and Test?
      4. Setup and Load
      5. The Other Third
      6. Missing Data (Again)
      7. Setup and Load (Again)
      8. About the Data Set
    4. Summary
  5. 4. Common Model Parameters
    1. Supported Metrics
      1. Regression Metrics
      2. Classification Metrics
      3. Binomial Classification
    2. The Essentials
    3. Effort
    4. Scoring and Validation
    5. Early Stopping
    6. Checkpoints
    7. Cross-Validation (aka k-folds)
    8. Data Weighting
    9. Sampling, Generalizing
    10. Regression
    11. Output Control
    12. Summary
  6. 5. Random Forest
    1. Decision Trees
    2. Random Forest
    3. Parameters
    4. Building Energy Efficiency: Default Random Forest
    5. Grid Search
      1. Cartesian
      2. RandomDiscrete
      3. High-Level Strategy
    6. Building Energy Efficiency: Tuned Random Forest
    7. MNIST: Default Random Forest
    8. MNIST: Tuned Random Forest
      1. Enhanced Data
    9. Football: Default Random Forest
    10. Football: Tuned Random Forest
    11. Summary
  7. 6. Gradient Boosting Machines
    1. Boosting
    2. The Good, the Bad, and… the Mysterious
    3. Parameters
    4. Building Energy Efficiency: Default GBM
    5. Building Energy Efficiency: Tuned GBM
    6. MNIST: Default GBM
    7. MNIST: Tuned GBM
    8. Football: Default GBM
    9. Football: Tuned GBM
    10. Summary
  8. 7. Linear Models
    1. GLM Parameters
    2. Building Energy Efficiency: Default GLM
    3. Building Energy Efficiency: Tuned GLM
    4. MNIST: Default GLM
    5. MNIST: Tuned GLM
    6. Football: Default GLM
    7. Football: Tuned GLM
    8. Summary
  9. 8. Deep Learning (Neural Nets)
    1. What Are Neural Nets?
      1. Numbers Versus Categories
      2. Network Layers
      3. Activation Functions
    2. Parameters
      1. Deep Learning Regularization
      2. Deep Learning Scoring
    3. Building Energy Efficiency: Default Deep Learning
    4. Building Energy Efficiency: Tuned Deep Learning
    5. MNIST: Default Deep Learning
    6. MNIST: Tuned Deep Learning
    7. Football: Default Deep Learning
    8. Football: Tuned Deep Learning
    9. Summary
    10. Appendix: More Deep Learning Parameters
  10. 9. Unsupervised Learning
    1. K-Means Clustering
    2. Deep Learning Auto-Encoder
      1. Stacked Auto-Encoder
    3. Principal Component Analysis
    4. GLRM
    5. Missing Data
      1. GLRM
      2. Lose the R!
    6. Summary
  11. 10. Everything Else
    1. Staying on Top of and Poking into Things
    2. Installing the Latest Version
      1. Building from Source
    3. Running from the Command Line
    4. Clusters
      1. EC2
      2. Other Cloud Providers
      3. Hadoop
    5. Spark / Sparkling Water
    6. Naive Bayes
    7. Ensembles
      1. Stacking: h2o.ensemble
      2. Categorical Ensembles
    8. Summary
  12. 11. Epilogue: Didn’t They All Do Well!
    1. Building Energy Results
    2. MNIST Results
    3. Football Data
    4. How Low Can You Go?
      1. The More the Merrier
      2. Still Desperate for More
      3. Filtering for Hardness
      4. Auto-Encoder
      5. Convolute and Shrink
      6. Ensembles
      7. That Was as Low as I Go…
    5. Summary
  13. Index