Unsupervised Machine Learning in Security Applications

Video Description

What is this video course about, and why is it important?

Today, arguably the most important field in the IT industry is security. With more and more commerce and business being conducted online every day, keeping data safe by detecting and repelling attacks is paramount to every organization. One area that shows great potential in the battle against hackers and their exploits is machine learning. Unleashing the increasing power and finesse of these systems toward defeating intrusions and data theft is no longer a theoretical pursuit. Indeed, machine learning is being used to defend systems and networks across an increasing range of industries and enterprises, so it’s no mystery that there’s also an increasing demand for skilled and qualified security specialists who can apply data science techniques to the task of data security.

This video course introduces you to the concept of “unsupervised” model training, or learning, in a security context. Your host, cyber security specialist and data scientist Charles Givre, explains the theory behind commonly used clustering algorithms such as K-means and DBSCAN, as well as their direct application to security problems such as anomaly detection. You’ll see how to pipeline your models into a production environment using the Python scikit-learn library. You’ll also learn how to calculate metrics to assess your models’ performance, and how to use Yellowbrick to create visualizations of those performance evaluations.

This video course is one of a set of three intended for security professionals who want to learn how to apply data science to their toughest security problems. Mr. Givre focuses on the tools and techniques that are directly applicable to the industry, and uses security problems and datasets to walk you through the entire data science process from end to end.

What you’ll learn—and how you can apply it

  • The mechanics of several commonly used clustering algorithms such as K-means and DBSCAN
  • How to reduce the dimensions of large datasets using Principal Component Analysis
  • How to evaluate the performance of unsupervised techniques when possible
  • How to use Yellowbrick to produce visualizations of performance evaluations
  • How to apply unsupervised learning techniques to security problems such as detecting anomalies
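
As a rough illustration of how these topics can fit together, the sketch below scales a small synthetic dataset, reduces it to two dimensions with Principal Component Analysis, clusters it with DBSCAN, and treats the points DBSCAN labels as noise as candidate anomalies. It is not taken from the course: the data is randomly generated, and the `eps` and `min_samples` values are assumptions chosen only to suit this toy example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-host traffic features: two tight groups of
# "normal" rows plus five rows that sit far away from both groups.
normal_a = rng.normal(loc=0.0, scale=0.3, size=(100, 6))
normal_b = rng.normal(loc=5.0, scale=0.3, size=(100, 6))
outliers = rng.uniform(low=20.0, high=30.0, size=(5, 6))
X = np.vstack([normal_a, normal_b, outliers])

# Standardize each feature, then project onto the first two principal components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = reducer.fit_transform(X)

# DBSCAN assigns the label -1 to points that belong to no dense region.
# eps and min_samples are illustrative values, not tuned recommendations.
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X_2d)

is_noise = labels == -1
n_clusters = len(set(labels[~is_noise]))
anomaly_rows = np.flatnonzero(is_noise)

# The silhouette score is one label-free way to gauge cluster separation;
# it is computed here on the clustered (non-noise) points only.
score = silhouette_score(X_2d[~is_noise], labels[~is_noise])

print("clusters found:", n_clusters)
print("rows flagged as anomalies:", anomaly_rows)
print("silhouette score:", round(float(score), 3))
```

On real security data, the choice of features and of `eps` matters far more than it does here, so flagged rows are better read as leads for an analyst than as confirmed intrusions.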

This video course is for you because…

  • You’re a security professional with some scripting skills and you want to apply data science techniques to your work to analyze data more efficiently
  • You’re a network analyst with some scripting skills and you want to use machine learning techniques to better secure your network


Prerequisites:

  • You should have beginner- to intermediate-level experience with the Python programming language
  • You should be familiar with security and networking concepts
  • You should be generally familiar with basic statistical concepts

Materials or downloads needed in advance:

  • Students are encouraged to use the Griffon Virtual Machine for Data Science, a virtual machine with all data sources and tools preconfigured, which is available at https://github.com/gtkcyber/griffon-vm
  • Students should have access to a computer with at least 8 GB of RAM and 20 to 30 GB of hard drive space