Machine Learning Projects for .NET Developers

Book Description

Machine Learning Projects for .NET Developers shows you how to build smarter .NET applications that learn from data, using simple algorithms and techniques that can be applied to a wide range of real-world problems. You’ll code each project in the familiar setting of Visual Studio, while the machine learning logic uses F#, a language ideally suited to machine learning applications in .NET. If you’re new to F#, this book will give you everything you need to get started. If you’re already familiar with F#, this is your chance to put the language into action in an exciting new context.

In a series of fascinating projects, you’ll learn how to:

  • Build an optical character recognition (OCR) system from scratch
  • Code a spam filter that learns by example
  • Use F#’s powerful type providers to interface with external resources (in this case, data analysis tools from the R programming language)
  • Transform your data into informative features, and use them to make accurate predictions
  • Find patterns in data when you don’t know what you’re looking for
  • Predict numerical values using regression models
  • Implement an intelligent game that learns how to play from experience

Along the way, you’ll learn fundamental ideas that can be applied in all kinds of real-world contexts and industries, from advertising to finance, medicine, and scientific research. While some machine learning algorithms use fairly advanced mathematics, this book focuses on simple but effective approaches. If you enjoy hacking code and data, this book is for you.

    Table of Contents

    1. Cover
    2. Title
    3. Copyright
    4. Contents at a Glance
    5. Contents
    6. About the Author
    7. About the Technical Reviewer
    8. Acknowledgments
    9. Introduction
    10. Chapter 1: 256 Shades of Gray
      1. What Is Machine Learning?
      2. A Classic Machine Learning Problem: Classifying Images
        1. Our Challenge: Build a Digit Recognizer
        2. Distance Functions in Machine Learning
        3. Start with Something Simple
      3. Our First Model, C# Version
        1. Dataset Organization
        2. Reading the Data
        3. Computing Distance between Images
        4. Writing a Classifier
      4. So, How Do We Know It Works?
        1. Cross-Validation
        2. Evaluating the Quality of Our Model
        3. Improving Your Model
      5. Introducing F# for Machine Learning
        1. Live Scripting and Data Exploration with F# Interactive
        2. Creating Our First F# Script
        3. Dissecting Our First F# Script
        4. Creating Pipelines of Functions
        5. Manipulating Data with Tuples and Pattern Matching
        6. Training and Evaluating a Classifier Function
      6. Improving Our Model
        1. Experimenting with Another Definition of Distance
        2. Factoring Out the Distance Function
      7. So, What Have We Learned?
        1. What to Look for in a Good Distance Function
        2. Models Don’t Have to Be Complicated
        3. Why F#?
      8. Going Further
    11. Chapter 2: Spam or Ham?
      1. Our Challenge: Build a Spam-Detection Engine
        1. Getting to Know Our Dataset
        2. Using Discriminated Unions to Model Labels
        3. Reading Our Dataset
      2. Deciding on a Single Word
        1. Using Words as Clues
        2. Putting a Number on How Certain We Are
        3. Bayes’ Theorem
        4. Dealing with Rare Words
      3. Combining Multiple Words
        1. Breaking Text into Tokens
        2. Naïvely Combining Scores
        3. Simplified Document Score
      4. Implementing the Classifier
        1. Extracting Code into Modules
        2. Scoring and Classifying a Document
        3. Introducing Sets and Sequences
        4. Learning from a Corpus of Documents
      5. Training Our First Classifier
        1. Implementing Our First Tokenizer
        2. Validating Our Design Interactively
        3. Establishing a Baseline with Cross-Validation
      6. Improving Our Classifier
        1. Using Every Single Word
        2. Does Capitalization Matter?
        3. Less Is More
        4. Choosing Our Words Carefully
        5. Creating New Features
        6. Dealing with Numeric Values
      7. Understanding Errors
      8. So, What Have We Learned?
    12. Chapter 3: The Joy of Type Providers
      1. Exploring StackOverflow Data
        1. The StackExchange API
        2. Using the JSON Type Provider
        3. Building a Minimal DSL to Query Questions
      2. All the Data in the World
        1. The World Bank Type Provider
        2. The R Type Provider
        3. Analyzing Data Together with R Data Frames
        4. Deedle, a .NET Data Frame
        5. Data of the World, Unite!
      3. So, What Have We Learned?
        1. Going Further
    13. Chapter 4: Of Bikes and Men
      1. Getting to Know the Data
        1. What’s in the Dataset?
        2. Inspecting the Data with FSharp.Charting
        3. Spotting Trends with Moving Averages
      2. Fitting a Model to the Data
        1. Defining a Basic Straight-Line Model
        2. Finding the Lowest-Cost Model
        3. Finding the Minimum of a Function with Gradient Descent
        4. Using Gradient Descent to Fit a Curve
        5. A More General Model Formulation
      3. Implementing Gradient Descent
        1. Stochastic Gradient Descent
        2. Analyzing Model Improvements
        3. Batch Gradient Descent
      4. Linear Algebra to the Rescue
        1. Honey, I Shrunk the Formula!
        2. Linear Algebra with Math.NET
        3. Normal Form
        4. Pedal to the Metal with MKL
      5. Evolving and Validating Models Rapidly
        1. Cross-Validation and Over-Fitting, Again
        2. Simplifying the Creation of Models
        3. Adding Continuous Features to the Model
      6. Refining Predictions with More Features
        1. Handling Categorical Features
        2. Non-linear Features
        3. Regularization
      7. So, What Have We Learned?
        1. Minimizing Cost with Gradient Descent
        2. Predicting a Number with Regression
    14. Chapter 5: You Are Not a Unique Snowflake
      1. Detecting Patterns in Data
      2. Our Challenge: Understanding Topics on StackOverflow
        1. Getting to Know Our Data
      3. Finding Clusters with K-Means Clustering
        1. Improving Clusters and Centroids
        2. Implementing K-Means Clustering
      4. Clustering StackOverflow Tags
        1. Running the Clustering Analysis
        2. Analyzing the Results
      5. Good Clusters, Bad Clusters
      6. Rescaling Our Dataset to Improve Clusters
      7. Identifying How Many Clusters to Search For
        1. What Are Good Clusters?
        2. Identifying k on the StackOverflow Dataset
        3. Our Final Clusters
      8. Detecting How Features Are Related
        1. Covariance and Correlation
        2. Correlations Between StackOverflow Tags
      9. Identifying Better Features with Principal Component Analysis
        1. Recombining Features with Algebra
        2. A Small Preview of PCA in Action
        3. Implementing PCA
        4. Applying PCA to the StackOverflow Dataset
        5. Analyzing the Extracted Features
      10. Making Recommendations
        1. A Primitive Tag Recommender
        2. Implementing the Recommender
        3. Validating the Recommendations
      11. So, What Have We Learned?
    15. Chapter 6: Trees and Forests
      1. Our Challenge: Sink or Swim on the Titanic
        1. Getting to Know the Dataset
        2. Taking a Look at Features
        3. Building a Decision Stump
        4. Training the Stump
      2. Features That Don’t Fit
        1. How About Numbers?
        2. What about Missing Data?
      3. Measuring Information in Data
        1. Measuring Uncertainty with Entropy
        2. Information Gain
        3. Implementing the Best Feature Identification
        4. Using Entropy to Discretize Numeric Features
      4. Growing a Tree from Data
        1. Modeling the Tree
        2. Constructing the Tree
        3. A Prettier Tree
      5. Improving the Tree
        1. Why Are We Over-Fitting?
        2. Limiting Over-Confidence with Filters
      6. From Trees to Forests
        1. Deeper Cross-Validation with k-folds
        2. Combining Fragile Trees into Robust Forests
        3. Implementing the Missing Blocks
        4. Growing a Forest
        5. Trying Out the Forest
      7. So, What Have We Learned?
    16. Chapter 7: A Strange Game
      1. Building a Simple Game
        1. Modeling Game Elements
        2. Modeling the Game Logic
        3. Running the Game as a Console App
        4. Rendering the Game
      2. Building a Primitive Brain
        1. Modeling the Decision-Making Process
        2. Learning a Winning Strategy from Experience
        3. Implementing the Brain
        4. Testing Our Brain
      3. Can We Learn More Effectively?
        1. Exploration vs. Exploitation
        2. Is a Red Door Different from a Blue Door?
        3. Greed vs. Planning
      4. A World of Never-Ending Tiles
      5. Implementing Brain 2.0
        1. Simplifying the World
        2. Planning Ahead
        3. Epsilon Learning
      6. So, What Have We Learned?
        1. A Simple Model That Fits Intuition
        2. An Adaptive Mechanism
    17. Chapter 8: Digits, Revisited
      1. Optimizing and Scaling Your Algorithm Code
      2. Tuning Your Code
        1. What to Search For
        2. Tuning the Distance
        3. Using Array.Parallel
      3. Different Classifiers with Accord.NET
        1. Logistic Regression
        2. Simple Logistic Regression with Accord
        3. One-vs-One, One-vs-All Classification
        4. Support Vector Machines
        5. Neural Networks
        6. Creating and Training a Neural Network with Accord
      4. Scaling with MBrace
        1. Getting Started with MBrace on Azure with Brisk
        2. Processing Large Datasets with MBrace
      5. So, What Did We Learn?
    18. Chapter 9: Conclusion
      1. Mapping Our Journey
      2. Science!
      3. F#: Being Productive in a Functional Style
      4. What’s Next?
    19. Index