Deep Learning

Book Description

Looking for a single central source for the key ideas in machine learning? Deep Learning: A Practitioner's Approach provides developers and data scientists with the most practical information available on the subject, including deep learning theory, best practices, and real-world use cases.

Table of Contents

  1. Preface
    1. What’s in This Book?
    2. Who is “The Practitioner”?
    3. Who Should Read This Book?
      1. The Enterprise Machine Learning Practitioner
      2. The Enterprise Executive
      3. The Academic
    4. About Early Release books from O’Reilly
    5. Conventions Used in This Book
    6. Using Code Examples
    7. Administrative Notes
    8. Safari® Books Online
    9. How to Contact Us
    10. Acknowledgements
      1. Josh’s Acknowledgements
      2. Adam’s Acknowledgements
  2. 1. A Review of Machine Learning
    1. The Learning Machines
      1. How Can Machines Learn?
      2. Biological Inspiration
      3. What is Deep Learning?
      4. Going Down the Rabbit Hole 
      5. Organization of This Book
    2. Framing the Questions
    3. Linear Algebra
      1. Scalars
      2. Vectors
      3. Matrices
      4. Tensors
      5. Hyperplanes
      6. Relevant Mathematical Operations
      7. Converting Data Into Vectors
      8. Solving Systems of Equations
    4. Some Basic Statistics
      1. Probability
      2. Conditional Probabilities
      3. Posterior Probability
      4. Distributions
      5. Samples vs Population
      6. Resampling Methods
      7. Selection Bias
      8. Likelihood
    5. How Does Machine Learning Work?
      1. Optimization
      2. Convex Optimization
      3. Gradient Descent
      4. Stochastic Gradient Descent
      5. Quasi-Newton Optimization Methods
      6. Underfitting and Overfitting
      7. Regression
      8. Classification
      9. Recommendation
      10. Clustering
    6. Logistic Regression
      1. The Logistic Function
      2. Understanding Logistic Regression Output
    7. Evaluating Models
      1. The Confusion Matrix
  3. 2. Foundations of Neural Networks
    1. Neural Networks
      1. The Biological Neuron
      2. The Perceptron
      3. Multi-Layer Feed-Forward Networks
    2. Training Neural Networks
      1. Backpropagation Learning
    3. Activation Functions
      1. Linear
      2. Sigmoid
      3. Tanh
      4. Hard Tanh
      5. Softmax
      6. Rectified Linear
    4. Loss Functions
      1. Loss Function Notation
      2. Loss Functions for Regression
      3. Loss Functions for Classification
      4. Loss Functions for Reconstruction
    5. Hyperparameters
      1. Learning Rate
      2. Regularization
      3. Momentum
      4. Sparsity
  4. 3. Fundamentals of Deep Networks
    1. Defining Deep Learning
      1. What is Deep Learning?
      2. Organization of This Chapter
    2. Common Architectural Principles of Deep Networks
      1. Parameters
      2. Layers
      3. Activation Functions
      4. Loss Functions
      5. Optimization Algorithms
      6. Hyperparameters
      7. Summary
    3. Building Blocks of Deep Networks
      1. Restricted Boltzmann Machines
      2. Autoencoders
  5. 4. Major Architectures of Deep Networks
    1. Unsupervised Pre-Trained Networks
      1. Deep Belief Networks
    2. Convolutional Neural Networks
      1. Biological Inspiration
      2. Intuition
      3. Convolutional Network Architecture Overview
      4. Input Layers
      5. Convolutional Layers
      6. Pooling Layers
      7. Fully-Connected Layers
      8. Other Popular Convolutional Network Architectures
      9. Summary
    3. Recurrent Neural Networks
      1. Modeling the Time Dimension
      2. 3D Volumetric Input
      3. Why Not Markov Models?
      4. Network Architecture
      5. Domain Specific Applications
    4. Recursive Neural Networks
      1. Network Architecture
      2. Varieties of Recursive Neural Networks
      3. Applications of Recursive Neural Networks
    5. Summary and Discussion
      1. Will Deep Learning Make Other Algorithms Obsolete?
      2. Different Problems Have Different Best Methods
      3. When Do I Need Deep Learning?
  6. 5. Building Deep Networks
    1. Matching Deep Networks to the Right Problem
      1. Columnar Data and Multi-Layer Perceptrons
      2. Images and Convolutional Neural Networks
      3. Timeseries Sequences and Recurrent Neural Networks
      4. Using Hybrid Networks
    2. The DL4J Suite of Tools
      1. Vectorization and DataVec
      2. Runtimes and ND4J
    3. Starting a New DL4J Project
      1. Java
      2. Working with Maven
      3. Integrated Development Environments
    4. Basic Concepts of DL4J API
      1. Loading and Saving Models
      2. Getting Input For the Model
      3. Setting Up Model Architecture
      4. Training and Evaluation
    5. Modeling CSV Data with Multi-Layer Perceptron Networks
      1. Setting Up Input Data
      2. Determining Network Architecture
      3. Training the Model
      4. Evaluating the Model
    6. Modeling Hand-Written Images with Convolutional Neural Networks
      1. High-Level Workflow for Modeling MNIST with LeNet
      2. Java Code Listing for LeNet Convolutional Network
      3. Loading and Vectorizing the Input Images
      4. Network Architecture for LeNet in DL4J
      5. Training the Convolutional Network
    7. Modeling Sequence Data with Recurrent Neural Networks
      1. Generating Shakespeare with LSTMs
      2. Classifying Sensor Timeseries Sequences with LSTMs
    8. Using AutoEncoders for Anomaly Detection
      1. Reconstruction Error as Anomaly Indicator
      2. High-Level Workflow for Training the AutoEncoder
      3. Java Code Listing for Example
      4. Setting Up Input Data
      5. AutoEncoder Network Architecture
      6. Training the AutoEncoder Network
      7. Evaluating the Model
    9. Applications of Deep Learning in Natural Language Processing
      1. Learning Word Embeddings with Word2Vec
      2. Other Semantic Embedding Variants
  7. 6. Tuning Deep Networks
    1. Basic Concepts in Tuning Deep Networks
      1. An Intuition for Building Deep Networks
      2. Building the Intuition as a Step-by-Step Process
    2. Matching Input Data and Network Architectures
      1. Columnar Data
      2. Image Data
      3. Sequential Data
      4. Audio Data
      5. Video Data
      6. Summary
    3. Relating Model Goal and Output Layers
      1. Regression Model Output Layer
      2. Classification Model Output Layer
    4. Working with Layer Count and Number of Neurons
      1. Feed-Forward Multi-Layer Networks
      2. Controlling Layer and Neuron Counts
    5. Weight Initialization Strategies
      1. Basic Rule of Thumb
      2. Weights Connecting to TanH Units
      3. Weights Connecting to ReLU Units
    6. Using Activation Functions
      1. Target Distributions
      2. Linear
      3. Sigmoid
      4. Tanh
      5. Rectified Linear Unit (ReLU)
      6. Softmax
      7. Softplus
      8. Hard Tanh
      9. Maxout
      10. Summary Table for Activation Functions
      11. Summary of Activation Function Usage
    7. Applying Loss Functions
      1. Intuition
      2. Loss Functions for Regressions
      3. Loss Functions for Classification
      4. Summary
    8. Understanding Learning Rates
      1. Intuition
      2. Summary
    9. How Sparsity Affects Learning
      1. Leveraging Histograms When Setting Sparsity
      2. Sparsity and Other Hyperparameters
    10. Applying Methods of Optimization
      1. Tuning Intuition
      2. Stochastic Gradient Descent
      3. L-BFGS
      4. Conjugate Gradient
      5. Hessian Free
      6. Comparing the Methods
      7. Choosing An Optimization Algorithm
      8. Optimization Methods Seen as Impractical
      9. Summary
    11. Leveraging Parallelization and GPUs for Faster Training
      1. From Batch to Online Learning
      2. The Cost of Moving Data
      3. Parallel Iterative Algorithms
      4. Parallelizing Stochastic Gradient Descent
      5. Parallelization Effects on Training
      6. GPUs
    12. Controlling Epochs and Mini-Batch Size
      1. Terminology For Passes Over Training Data
      2. Determining Mini-Batch Size
      3. Mini-Batch Size and Learning Rate
    13. How to Use Regularization
      1. Priors as Regularizers
      2. Max-Norm Regularization
      3. Random Targets
      4. Dropout
      5. Stochastic Pooling
      6. Adversarial Training
      7. Curriculum Learning
    14. An Introduction to Hyperparameter Optimization
      1. Manual Search
      2. Random Search
    15. Other Tuning-Related Topics
      1. Visualizing Learning in Deep Belief Networks
      2. Evaluating a Neural Network Model
      3. Overfitting
      4. Dealing with Class Imbalance
  8. 7. Tuning Specific Deep Network Architectures
    1. Restricted Boltzmann Machines
      1. Intuition
      2. Hidden Units and Modeling Available Information
    2. Deep Belief Networks
      1. Pre-Training with Restricted Boltzmann Machines
      2. Initializing Weights
      3. Setting the Learning Rate
      4. Using Momentum
      5. Using Regularization
      6. Sparsity
      7. Determining Hidden Unit Count
    3. Convolutional Neural Networks
      1. Intuition
      2. Working with Spatial Arrangement
      3. Configuring Filters
      4. Common Convolutional Architectural Patterns
      5. Configuring Layers
    4. Recurrent Neural Networks
      1. Intuition
      2. Network Input Data and Input Layers
      3. Output Layer
      4. Padding and Masking
      5. Training the Network
      6. Evaluation and Scoring With Masking
      7. Regularization
      8. Variants of Recurrent Network Architectures
  9. 8. Vectorization
    1. Introduction to Vectorization in Machine Learning
      1. A Missing Link
      2. Why do We Need to Vectorize Data?
    2. Converting Raw Data Into Vectors
      1. Understanding Raw Data
      2. Strategies For Dealing with Raw Data Attributes
      3. Transforming Raw Source Data
    3. Traditional Feature Engineering Techniques
      1. Feature Copying
      2. Feature Scaling
      3. Standardization
      4. Binarization
      5. Normalization
      6. Zero Mean, Unit Variance
      7. Dimensionality Reduction
      8. Deep Learning and Feature Learning
    4. Well Known Vector File Formats
      1. SVMLight
      2. libSVM
      3. ARFF
    5. Working with Text in Vectorization
      1. Free Text and Vector Space Model
      2. Bag of Words
      3. Term Frequency Inverse Document Frequency (TF-IDF)
      4. N-grams
      5. Kernel Hashing
    6. Image Vectorization
    7. Working with Timeseries in Vectorization
    8. DataVec and Vectorization
  10. 9. Using Deep Learning and DL4J on Spark
    1. Introduction to Using Spark and Hadoop
      1. What is Spark?
      2. What is Hadoop?
      3. Operating Spark from the Command Line
    2. Configuring and Tuning Spark Execution
      1. Understanding Spark Execution Modes
      2. General Spark Tuning Guide
      3. Understanding Spark, the JVM, and Garbage Collection
      4. Tuning DL4J Jobs on Spark
      5. Understanding Parallel Performance
    3. Setting Up a Maven POM for Spark and DL4J
      1. Major Dependencies for a DL4J Spark Job
      2. Understanding Versions
      3. A Pom.xml File Dependency Template
      4. Setting Up a POM File for CDH 5.X
      5. Setting Up a POM file for HDP 2.4
      6. Controlling Jar Size
    4. Troubleshooting Spark and Hadoop
      1. Common Issues with ND4J
      2. Common Issues with Spark
      3. List of Key Ports for Spark and YARN
    5. DL4J Parallel Execution on Spark
      1. Distributed Network Training
      2. A Minimal Spark Training Example
      3. Working with the TrainingMaster
    6. DL4J API Best Practices for Spark
      1. Slim Down the Jar
      2. A Well-Tuned Cluster
      3. Efficient Vectorization Pipelines
      4. A Well-Tuned JVM Goes a Long Way
    7. Multi-Layer Perceptron Spark Example
      1. Since We Last Spoke
      2. Spark Code
      3. Building the Spark Job Jar
      4. Setting Up Network Architecture
      5. Tracking Progress and Understanding Results
    8. Recurrent Neural Network Spark Example
      1. Loading and Vectorizing Data
      2. Setting Up Network Architecture
      3. Tracking Progress and Understanding Results
    9. Modeling MNIST with a Convolutional Neural Network on Spark Local
      1. Java Code Listing for Spark Local LeNet Example
      2. Configuring the Spark Job in Code
      3. Loading and Vectorizing MNIST Data
      4. Setting Up the Convolutional Network Architecture
      5. Training the Convolutional Network on Spark
      6. Building the Spark Job Jar
      7. Executing the Spark Job
  11. A. What is Artificial Intelligence?
    1. The Story So Far
      1. Defining Deep Learning
      2. Defining Artificial Intelligence
    2. What is Driving Interest Today in Artificial Intelligence?
      1. The Big Jump in Computer Vision
      2. Advancement in Applications of Deep Learning
      3. The Wave of Big Data
      4. A Confluence of Coverage
      5. Tidal Forces
    3. Winter Is Coming
      1. A Repeating Season
  12. B. Numbers Everyone Should Know
  13. C. Setting Up DL4J Projects in Maven
    1. ND4J
  14. D. Setting Up GPUs for DL4J Projects
    1. Switching Backends to GPU
      1. Picking a GPU
  15. E. Using the ND4J API
    1. Design and Basic Usage
      1. Understanding NDArrays
      2. ND4J General Syntax
      3. The Basics of Working with NDArrays
      4. DataSet
    2. Creating Input Vectors
      1. Basics of Vector Creation
    3. Using MLLibUtil
      1. Converting From INDArray to MLLib Vector
      2. Converting from MLLib Vector to INDArray
    4. Making Model Predictions with DL4J
      1. Using DL4J and ND4J Together
      2. Evaluating Regular Predictions
      3. Evaluating Timeseries Predictions
    5. Building Confusion Matrices
  16. F. Using DataVec
    1. Loading Data for Machine Learning
    2. Loading CSV Data for Multi-Layer Perceptrons
    3. Loading Image Data for Convolutional Neural Networks
    4. Loading Sequence Data for Recurrent Neural Networks
    5. Transforming Data: Data Wrangling with DataVec
      1. DataVec Transforms: Key Concepts
      2. DataVec Transform Functionality: An Example
  17. G. RL4J and Reinforcement Learning
    1. Preliminaries
      1. Markov Decision Process
    2. Different Settings
      1. Model-Free
      2. Observation Setting
      3. Single Player and Adversarial Games
    3. Q-Learning
      1. From Policy to Neural Network
      2. Policy Iteration
      3. Exploration vs Exploitation
      4. Bellman Equation
      5. Offline and Online Reinforcement Learning
      6. Initial State Sampling
      7. Q-Learning Implementation
      8. Modeling Q(s, a)
      9. Experience Replay
      10. Convolutional Layers and Image Preprocessing
      11. Skip Frame
      12. Double DQN
      13. Clipping
      14. Policy Gradient
      15. Asynchronous Methods for Deep Reinforcement Learning
    4. Example in RL4J
  18. H. Other Deep Learning Libraries
    1. TensorFlow 
    2. Caffe
    3. Keras
    4. Torch
    5. Theano
    6. Other Machine Learning Libraries
  19. I. Evaluating Deep Learning Platforms
    1. Licensing
    2. Speed
    3. Arguments for the JVM
      1. Widely Deployed
      2. The JVM Ecosystem
      3. Java and Speed
      4. Solvable Issues
      5. Security as a First Class Citizen
  20. J. Working with DL4J From Source
    1. Verifying Git is Installed
    2. Cloning Key DL4J Github Projects
    3. Downloading Source Via Zip File
    4. Using Maven to Build Source Code
  21. K. Troubleshooting DL4J Installations
    1. Previous Installation
    2. Memory Errors When Installing From Source
    3. Older Versions of Maven
    4. Maven and Path Variables
    5. Bad JDK Versions
    6. C++ and Other Development Tools
    7. Windows and Include Paths
    8. Monitoring GPUs
    9. Using the JVisualVM
    10. Working with Clojure
    11. OSX and Float Support
    12. Fork-Join Bug in Java 7
    13. Precautions
      1. Other Local Repositories
      2. Check Maven Dependencies
      3. Reinstall Dependencies
      4. If All Else Fails
    14. Different Platforms
      1. OSX
      2. Windows
      3. Linux
  22. L. References
    1. Reference Papers