Chapter 4

Performing Exploratory Security Data Analysis

“Sometimes, bad is bad.”

Huey Lewis and the News, Sports, Chrysalis Records, 1983

What constitutes “security data” is often in the eye of the beholder. Malware analysts gravitate toward process, memory and system binary dumps. Vulnerability researchers dissect new patch releases, and network security professionals tap wired and wireless networks to see what secrets can be sifted from the packets as they make their way from node to node.

This chapter focuses on exploring IP addresses, starting with further analysis of the AlienVault IP Reputation database introduced in Chapter 3. You'll examine aspects of the ZeuS botnet (a fairly nasty bit of malware) from an IP address perspective and then perform some basic analyses on real firewall data. To fully understand the examples in this chapter, you should be familiar with the description of the AlienVault data set and have at least followed along with the preliminary analyses in the previous chapters. The other major goal of the chapter is to help you become more proficient in R by walking you through a variety of examples that bring into play many of the language's core programming idioms.

IP addresses—along with domain names and routing concepts—are the building blocks of the Internet. They are defined in RFC 791, the “Internet Protocol / DARPA Internet Program / Protocol Specification” (http://tools.ietf.org/html/rfc791), which has an elegant and succinct way of describing them:

A name indicates ...
