Preparing and Exploring Security Data for Machine Learning

Video Description

What is this video course about, and why is it important?

Security is arguably the most important field in the IT industry today. With more commerce and business being conducted online every day, keeping data safe and detecting and repelling attacks is paramount to every organization. One area that shows great potential in the battle against hackers and their exploits is machine learning. Applying the growing power and sophistication of these systems to defeating intrusions and data theft is no longer a theoretical pursuit. Machine learning is already being used to defend systems and networks across a widening range of industries and enterprises, so it’s no surprise that demand is growing for skilled, qualified security specialists who can apply data science techniques to the task of data security.

If you’re a security engineer, network analyst, or anyone else charged with protecting your organization’s valuable IT systems and data, this video will show you how to quickly and efficiently ingest a variety of data types typically used in security settings and prepare them for analysis in the Python data science ecosystem. Your host, cybersecurity specialist and data scientist Charles Givre, teaches the concepts behind vectorized computing as they apply specifically to security. Gathering and preparing data is one of the biggest challenges facing anyone seeking to perform advanced analysis and machine learning. This video will help you learn how to use the Pandas ecosystem to quickly and effectively gather, prepare, and explore security data for advanced analysis and machine learning.

This video is one in a set of three, intended for security professionals who want to learn how to apply data science to their toughest security problems. It focuses on tools and techniques that are directly applicable to the industry, and it uses security problems and datasets to walk you through the entire data science process from end to end.

What you’ll learn—and how you can apply it

  • How to use Pandas for security data preparation
  • How to ingest, manipulate, and summarize multidimensional data
  • How to quickly extract, transform, and load (ETL) security data from a variety of sources into the Pandas ecosystem, extract features, and prepare the data for machine learning

This video course is for you because…

  • You're a security professional with some scripting skills and you want to apply data science techniques to your work to analyze data more efficiently
  • You're a network analyst with some scripting skills and you want to use machine learning techniques to better secure your network

Prerequisites:

  • You should have beginner- to intermediate-level experience with the Python programming language
  • You should be familiar with security and networking concepts
  • You should be generally familiar with basic statistical concepts

Materials or downloads needed in advance:

  • You're encouraged to use the Griffon Virtual Machine for Data Science, which is available at https://github.com/gtkcyber/griffon-vm. Griffon is a virtual machine with all data sources and tools preconfigured.
  • You should have access to a computer with at least 8 GB of RAM and 20 to 30 GB of available hard drive space