Machine Learning: Hands-On for Developers and Technical Professionals

Book Description

Dig deep into the data with a hands-on guide to machine learning

Machine Learning: Hands-On for Developers and Technical Professionals provides hands-on instruction and fully coded working examples for the most common machine learning techniques used by developers and technical professionals. The book contains a breakdown of each ML variant, explaining how it works and how it is used within certain industries, allowing readers to incorporate the presented techniques into their own work as they follow along. A core tenet of machine learning is a strong focus on data preparation, and a full exploration of the various types of learning algorithms illustrates how the proper tools can help any developer extract information and insights from existing data. The book includes a full complement of Instructor's Materials to facilitate use in the classroom, making this resource useful for students and as a professional reference.

At its core, machine learning is a mathematical, algorithm-based technology that forms the basis of historical data mining and modern big data science. Scientific analysis of big data requires a working knowledge of machine learning, which forms predictions based on known properties learned from training data. Machine Learning is an accessible, comprehensive guide for the non-mathematician, providing clear guidance that allows readers to:

  • Learn the languages of machine learning including Hadoop, Mahout, and Weka

  • Understand decision trees, Bayesian networks, and artificial neural networks

  • Implement Association Rule, Real Time, and Batch learning

  • Develop a strategic plan for safe, effective, and efficient machine learning

By learning to construct a system that can learn from data, readers can increase their utility across industries. Machine learning sits at the core of deep-dive data analysis and visualization, which is increasingly in demand as companies discover the gold mine hiding in their existing data. For the tech professional involved in data science, Machine Learning: Hands-On for Developers and Technical Professionals provides the skills and techniques required to dig deeper.

    Table of Contents

    1. Chapter 1: What Is Machine Learning?
      1. History of Machine Learning
      2. Algorithm Types for Machine Learning
      3. The Human Touch
      4. Uses for Machine Learning
      5. Languages for Machine Learning
      6. Software Used in This Book
      7. Data Repositories
      8. Summary
    2. Chapter 2: Planning for Machine Learning
      1. The Machine Learning Cycle
      2. It All Starts with a Question
      3. I Don't Have Data!
      4. One Solution Fits All?
      5. Defining the Process
      6. Building a Data Team
      7. Data Processing
      8. Data Storage
      9. Data Privacy
      10. Data Quality and Cleaning
      11. Thinking about Input Data
      12. Thinking about Output Data
      13. Don't Be Afraid to Experiment
      14. Summary
    3. Chapter 3: Working with Decision Trees
      1. The Basics of Decision Trees
      2. Decision Trees in Weka
      3. Summary
    4. Chapter 4: Bayesian Networks
      1. Pilots to Paperclips
      2. A Little Graph Theory
      3. A Little Probability Theory
      4. Bayes' Theorem
      5. How Bayesian Networks Work
      6. Node Counts
      7. Using Domain Experts
      8. A Bayesian Network Walkthrough
      9. Summary
    5. Chapter 5: Artificial Neural Networks
      1. What Is a Neural Network?
      2. Artificial Neural Network Uses
      3. Breaking Down the Artificial Neural Network
      4. Data Preparation for Artificial Neural Networks
      5. Artificial Neural Networks with Weka
      6. Implementing a Neural Network in Java
      7. Summary
    6. Chapter 6: Association Rules Learning
      1. Where Is Association Rules Learning Used?
      2. How Association Rules Learning Works
      3. Algorithms
      4. Mining the Baskets—A Walkthrough
      5. Summary
    7. Chapter 7: Support Vector Machines
      1. What Is a Support Vector Machine?
      2. Where Are Support Vector Machines Used?
      3. The Basic Classification Principles
      4. How Support Vector Machines Approach Classification
      5. Using Support Vector Machines in Weka
      6. Summary
    8. Chapter 8: Clustering
      1. What Is Clustering?
      2. Where Is Clustering Used?
      3. Clustering Models
      4. K-Means Clustering with Weka
      5. Summary
    9. Chapter 9: Machine Learning in Real Time with Spring XD
      1. Capturing the Firehose of Data
      2. Using Spring XD
      3. Learning from Twitter Data
      4. Configuring Spring XD
      5. Spring XD and Twitter
      6. Introducing Processors
      7. Real-Time Sentiment Analysis
      8. Summary
    10. Chapter 10: Machine Learning as a Batch Process
      1. Is It Big Data?
      2. Considerations for Batch Processing Data
      3. Practical Examples of Batch Processes
      4. Using the Hadoop Framework
      5. How MapReduce Works
      6. Mining the Hashtags
      7. Mining Sales Data
      8. Scheduling Batch Jobs
      9. Summary
    11. Chapter 11: Apache Spark
      1. Spark: A Hadoop Replacement?
      2. Java, Scala, or Python?
      3. Scala Crash Course
      4. Downloading and Installing Spark
      5. A Quick Intro to Spark
      6. Comparing Hadoop MapReduce to Spark
      7. Writing Standalone Programs with Spark
      8. Spark SQL
      9. Spark Streaming
      10. MLlib: The Machine Learning Library
      11. Summary
    12. Chapter 12: Machine Learning with R
      1. Installing R
      2. Your First Run
      3. Installing R-Studio
      4. The R Basics
      5. Simple Statistics
      6. Simple Linear Regression
      7. Basic Sentiment Analysis
      8. Apriori Association Rules
      9. Accessing R from Java
      10. R and Hadoop
      11. Summary
    13. Appendix A: SpringXD Quick Start
      1. Installing Manually
      2. Starting SpringXD
      3. Creating a Stream
      4. Adding a Twitter Application Key
    14. Appendix B: Hadoop 1.x Quick Start
      1. Downloading and Installing Hadoop
      2. Formatting the HDFS Filesystem
      3. Starting and Stopping Hadoop
      4. Process List of a Basic Job
    15. Appendix C: Useful Unix Commands
      1. Using Sample Data
      2. Showing the Contents: cat, more, and less
      3. Filtering Content: grep
      4. Sorting Data: sort
      5. Finding Unique Occurrences: uniq
      6. Showing the Top of a File: head
      7. Counting Words: wc
      8. Locating Anything: find
      9. Combining Commands and Redirecting Output
      10. Picking a Text Editor
    16. Appendix D: Further Reading
      1. Machine Learning
      2. Statistics
      3. Big Data and Data Science
      4. Hadoop
      5. Visualization
      6. Making Decisions
      7. Datasets
      8. Blogs
      9. Useful Websites
      10. The Tools of the Trade
    17. Introduction
      1. Aims of This Book
      2. “Hands-On” Means Hands-On
      3. “What About the Math?”
      4. What Will You Have Learned by the End?
      5. Balancing Theory and Hands-On Learning
      6. Outline of the Chapters
      7. Source Code for This Book
      8. Using Git
    18. End User License Agreement