Live online training

A Practical Introduction to Machine Learning

Bypass the academic theories. Learn how to integrate two basic machine-learning algorithms in your daily work, using programming best practices.

Matt Kirk

With the increasing popularity of Alexa, Xbox Kinect, Cortana, and Siri, machine learning and AI are fast becoming required components of the software developer’s toolkit. However, machine learning isn’t a silver bullet. It requires domain knowledge and intuition to solve problems.

Join Matthew Kirk for an introduction to machine-learning concepts. Instead of spending time focusing on the academic foundation of machine learning, you’ll delve into the k-Nearest Neighbors algorithm (k-NN) and naive Bayes classifiers to learn how to apply the machine-learning thought process to any programming-centric career. You’ll leave prepared to approach supervised learning problems with programming best practices and ready to implement these two algorithms in your daily work.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • Induction versus deduction and how this applies to data
  • How to use the k-Nearest Neighbors algorithm to classify housing data
  • How to use naive Bayes classifiers for simple yes-or-no answers


And you’ll be able to:

  • Write cross-validation tests for supervised learning algorithms
  • Code a simple classifier using k-Nearest Neighbors
  • Code a simple classifier using naive Bayes

This training course is for you because...

  • You are a mid-level software developer who wants to become adept in machine learning.
  • You are a data analyst with an academic background who wants to automate some of your tasks.
  • You are a technical executive who wants to guide your organization to implementing more machine-learning projects.

Prerequisites

  • Basic knowledge of coding principles, such as for loops, if conditions, and data structures

Materials and downloads needed:

  • A machine with Python 3 installed

Recommended Preparation:

Thoughtful Machine Learning with Python (Book)

Introduction to Python

About your instructor

  • Matt Kirk is a data architect, software engineer, and entrepreneur based out of Seattle, WA.
    For years, he struggled to piece together his quantitative finance background with his passion for building software.
    Then he discovered his affinity for solving problems with data.
    Now, he helps multimillion-dollar companies with their data projects. From diamond recommendation engines to marketing automation tools, he loves educating engineering teams about methods to start their big data projects.
    To learn more about how you can get started with your big data project (beyond taking this class), check out matthewkirk.com for tips.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (5 minutes)

Inductive reasoning: a supervised learning approach to machine learning (1 hour)

  • Presentation (20 minutes)
  • What is machine learning?
  • Inductive versus deductive reasoning
  • Supervised learning and other learning classes
  • Testing supervised learning methods
  • Coding principles and how they relate to machine learning (e.g., SOLID)
  • Test-driven development

Quiz (5 minutes)

  • What is the difference between induction and deduction?
  • What is domain knowledge?
  • What is the most common way to test supervised learning?

Discussion (20 minutes)

  • The high-interest credit card debt of machine learning
  • Why machine learning isn’t a silver bullet

Break (10 minutes)

Distance-based methods: k-Nearest Neighbors (1 hour)

  • Presentation (20 minutes)
  • How to calculate a house value (classification versus regression)
  • Calculating a value based on relevancy or closeness
  • What is distance? (triangle inequality)
  • The k-Nearest Neighbors algorithm in a nutshell
  • Trade-off: Curse of dimensionality
  • Quiz (5 minutes)
  • What is an example of a distance metric?
  • What is the curse of dimensionality and how does it relate to distance?
  • Why would you use Euclidean distance versus Manhattan distance?
  • Demo (10 minutes)
  • The problem of housing data using k-NN
  • Lab (20 minutes)
  • Implementing a k-NN classifier of housing data, using regression, with a downloadable example
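
To give a feel for the distance-based topics in this section, here is a minimal sketch of k-Nearest Neighbors in plain Python. It is not the course's lab code: the function names, the toy housing data, and the choice of features are all hypothetical, picked only to illustrate Euclidean versus Manhattan distance and classification versus regression.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-feature differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(train, query, k, distance=euclidean):
    """Return the k (features, target) pairs closest to the query."""
    return sorted(train, key=lambda row: distance(row[0], query))[:k]


def knn_regress(train, query, k, distance=euclidean):
    """Regression: average the numeric targets of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k, distance)
    return sum(target for _, target in neighbors) / len(neighbors)


def knn_classify(train, query, k, distance=euclidean):
    """Classification: majority vote over the labels of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k, distance)
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Hypothetical toy data: (square feet in thousands, bedrooms) -> price in $1000s
houses = [
    ((1.0, 2), 200.0),
    ((1.2, 2), 220.0),
    ((2.0, 3), 350.0),
    ((2.2, 4), 400.0),
    ((3.0, 5), 600.0),
]

# Estimate a price from the two most similar houses.
estimate = knn_regress(houses, (1.1, 2), k=2)
print(estimate)
```

Passing `distance=manhattan` swaps the metric without touching the rest of the code. In practice the features would usually be rescaled first, since an unscaled feature with a large range can dominate either distance.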

Break (10 minutes)

Probabilistic methods: naive Bayes classifier (1 hour)

  • Presentation (20 minutes)
  • Likelihood estimate of spammy emails, based on keywords
  • How to exploit posterior distributions
  • Bayes’ theorem and inverse conditional probability
  • How to test with ROC curves
  • Confusion matrices
  • Quiz (5 minutes)
  • What is the probability of X given A?
  • What happens if posterior distributions don’t tell us much?
  • Why is the naive Bayes classifier called “naive”?
  • Demo (10 minutes)
  • Introducing some data points, guiding principles, and what to expect
  • Lab (20 minutes)
  • Implementing a naive Bayes classifier using scikit-learn
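
As a rough illustration of the ideas in this section, the sketch below applies Bayes' theorem to a spam keyword and then extends it with the "naive" assumption that keywords are independent given the class. It is a hand-rolled example rather than the scikit-learn lab, and every probability and function name in it is a made-up assumption for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A | X) from P(A), P(X | A), and P(X | not A), via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


def naive_posterior(prior, word_likelihoods):
    """P(spam | words), assuming words are independent given the class.

    word_likelihoods holds one (P(word | spam), P(word | not spam)) pair
    for each keyword present in the email.
    """
    spam_score = prior
    ham_score = 1 - prior
    for p_given_spam, p_given_ham in word_likelihoods:
        spam_score *= p_given_spam
        ham_score *= p_given_ham
    return spam_score / (spam_score + ham_score)


# Hypothetical numbers: 20% of mail is spam; a keyword appears in 50% of
# spam and in 5% of legitimate mail.
p_spam_given_word = posterior(0.2, 0.5, 0.05)

# A second hypothetical keyword appears in 40% of spam and 10% of other mail.
p_spam_given_both = naive_posterior(0.2, [(0.5, 0.05), (0.4, 0.1)])

print(round(p_spam_given_word, 3), round(p_spam_given_both, 3))
```

The independence assumption is what lets the per-keyword likelihoods be multiplied together. It rarely holds exactly for real text, but it keeps the model simple enough to estimate from keyword counts.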

Conclusion and wrap-up (10 minutes)