Unsupervised Machine Learning in Security Applications

by Charles Givre

Released April 2018

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781492032366

Video description

What is this video course about, and why is it important?

Today, arguably the most important field in the IT industry is security. With more and more commerce and business being conducted online every day, keeping data safe by detecting and repelling attacks is paramount to every organization. One area that shows great potential in the battle against hackers and their exploits is machine learning. Unleashing the increasing power and finesse of these systems toward defeating intrusions and data theft is no longer a theoretical pursuit. Indeed, machine learning is being used to defend systems and networks across an increasing range of industries and enterprises, so it’s no mystery that there’s also an increasing demand for skilled and qualified security specialists who can apply data science techniques to the task of data security.

This video course introduces you to the concept of “unsupervised” model training, or learning, in a security context. Your host, cyber security specialist and data scientist Charles Givre, explains the theory behind commonly used clustering algorithms such as K-means and DBSCAN, as well as their direct application to security problems such as anomaly detection. You’ll see how to pipeline your models into a production environment using the Python scikit-learn library. You’ll also learn how to calculate metrics to assess your models’ performance, and how to use Yellowbrick to create visualizations of those performance evaluations.

This video course is one of a set of three intended for security professionals who want to learn how to apply data science to their toughest security problems. Mr. Givre focuses on the tools and techniques that are directly applicable to the industry, and uses security problems and datasets to walk you through the entire data science process from end to end.

What you’ll learn—and how you can apply it

The mechanics of several commonly used clustering algorithms such as K-means and DBSCAN
How to reduce the dimensions of large datasets using Principal Component Analysis
How to evaluate the performance of unsupervised techniques when possible
How to use Yellowbrick to produce visualizations of performance evaluations
How to apply unsupervised learning techniques to security problems such as detecting anomalies
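
To give a flavor of the anomaly detection idea in the list above, here is a minimal, dependency-free sketch of density-based clustering in the style of DBSCAN, where points that fall in no dense region are treated as anomalies. It is not taken from the course, which works with scikit-learn; the function name `dbscan`, the parameter values, and the two-feature sample data are all illustrative assumptions.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical, already-scaled features for nine hosts (made-up values).
points = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),
    (9.0, 0.5),
]
labels = dbscan(points, eps=0.5, min_samples=3)
anomalies = [p for p, label in zip(points, labels) if label == NOISE]
print(labels)
print(anomalies)
```

With real data, the choice of `eps` and `min_samples` and the scaling of the features determine what counts as an anomaly, so the values here should be read as placeholders only.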

This video course is for you because…

You’re a security professional with some scripting skills and you want to apply data science techniques to your work to analyze data more efficiently
You’re a network analyst with some scripting skills and you want to use machine learning techniques to better secure your network

Prerequisites:

You should have beginner- to intermediate-level experience with the Python programming language
You should be familiar with security and networking concepts
You should be generally familiar with basic statistical concepts

Materials or downloads needed in advance:

Students are encouraged to use the Griffon Virtual Machine for Data Science, which is available at https://github.com/gtkcyber/griffon-vm. Griffon is a virtual machine with all data sources and tools preconfigured.
Students should have access to a computer with at least 8 GB of RAM and 20 to 30 GB of hard drive space