Chapter 2

Building Your Analytics Toolbox: A Primer on Using R and Python for Security Analysis

“If you add a little to a little and do this often, soon the little will become great.”

Hesiod

Before you jump right into the various use cases in the book, it's important to ensure you have at least a basic familiarity with the two most prominent languages featured in nearly all of the scenarios: Python (www.python.org/) and R (www.r-project.org/). Although there is an abundance of tools available for data analysis, we feel these two provide virtually all the features necessary to help you go from data to discovery with the least amount of impedance.

A sub-theme throughout the book, and the distilled process at the heart of security data science, is idea, exploration, trial (and error) and iteration. It is ineffective at best to attempt to shoehorn this process into the edit/compile/run workflow found in most traditional languages and development environments. The acts of performing data analyses and creating informative visualizations are highly interactive and iterative endeavors. Despite all of their positive features, even standalone Python and R do not truly enable rich, dynamic interaction with code and data. However, when they are coupled with IPython (http://ipython.org/) and RStudio (www.rstudio.com/), respectively, they are transformed into powerful exploration tools, enabling rapid development and testing of everything from gnarly data munging to generating sophisticated visualizations. ...

