Network Security Through Data Analysis

Book description

In this practical guide, security researcher Michael Collins shows you several techniques and tools for collecting and analyzing network traffic datasets. You’ll understand how your network is used and what actions are necessary to protect and improve it. Divided into three sections, this book examines the process of collecting and organizing data, various tools for analysis, and several analytic scenarios and techniques.

Table of Contents

  1. Preface
    1. Audience
    2. Contents of This Book
    3. Conventions Used in This Book
    4. Using Code Examples
    5. Safari® Books Online
    6. How to Contact Us
    7. Acknowledgements
  2. I. Data
    1. 1. Sensors and Detectors: An Introduction
      1. Vantages: How Sensor Placement Affects Data Collection
      2. Domains: Determining Data That Can Be Collected
      3. Actions: What a Sensor Does with Data
      4. Conclusion
    2. 2. Network Sensors
      1. Network Layering and Its Impact on Instrumentation
        1. Network Layers and Vantage
        2. Network Layers and Addressing
      2. Packet Data
        1. Packet and Frame Formats
        2. Rolling Buffers
        3. Limiting the Data Captured from Each Packet
        4. Filtering Specific Types of Packets
        5. What If It’s Not Ethernet?
      3. NetFlow
        1. NetFlow v5 Formats and Fields
          1. “Flow and Stuff:” NetFlow v9 and IPFIX
        2. NetFlow Generation and Collection
      4. Further Reading
    3. 3. Host and Service Sensors: Logging Traffic at the Source
      1. Accessing and Manipulating Logfiles
      2. The Contents of Logfiles
        1. The Characteristics of a Good Log Message
        2. Existing Logfiles and How to Manipulate Them
      3. Representative Logfile Formats
        1. HTTP: CLF and ELF
        2. SMTP
        3. Microsoft Exchange: Message Tracking Logs
      4. Logfile Transport: Transfers, Syslog, and Message Queues
        1. Transfer and Logfile Rotation
        2. Syslog
      5. Further Reading
    4. 4. Data Storage for Analysis: Relational Databases, Big Data, and Other Options
      1. Log Data and the CRUD Paradigm
        1. Creating a Well-Organized Flat File System: Lessons from SiLK
      2. A Brief Introduction to NoSQL Systems
      3. What Storage Approach to Use
        1. Storage Hierarchy, Query Times, and Aging
  3. II. Tools
    1. 5. The SiLK Suite
      1. What Is SiLK and How Does It Work?
      2. Acquiring and Installing SiLK
        1. The Datafiles
      3. Choosing and Formatting Output Field Manipulation: rwcut
      4. Basic Field Manipulation: rwfilter
        1. Ports and Protocols
        2. Size
        3. IP Addresses
        4. Time
        5. TCP Options
        6. Helper Options
        7. Miscellaneous Filtering Options and Some Hacks
      5. rwfileinfo and Provenance
      6. Combining Information Flows: rwcount
      7. rwset and IP Sets
      8. rwuniq
      9. rwbag
      10. Advanced SiLK Facilities
        1. pmaps
      11. Collecting SiLK Data
        1. YAF
        2. rwptoflow
        3. rwtuc
      12. Further Reading
    2. 6. An Introduction to R for Security Analysts
      1. Installation and Setup
      2. Basics of the Language
        1. The R Prompt
        2. R Variables
        3. Writing Functions
        4. Conditionals and Iteration
      3. Using the R Workspace
      4. Data Frames
      5. Visualization
        1. Visualization Commands
        2. Parameters to Visualization
        3. Annotating a Visualization
        4. Exporting Visualization
      6. Analysis: Statistical Hypothesis Testing
        1. Hypothesis Testing
        2. Testing Data
      7. Further Reading
    3. 7. Classification and Event Tools: IDS, AV, and SEM
      1. How an IDS Works
        1. Basic Vocabulary
        2. Classifier Failure Rates: Understanding the Base-Rate Fallacy
        3. Applying Classification
      2. Improving IDS Performance
        1. Enhancing IDS Detection
        2. Enhancing IDS Response
        3. Prefetching Data
      3. Further Reading
    4. 8. Reference and Lookup: Tools for Figuring Out Who Someone Is
      1. MAC and Hardware Addresses
      2. IP Addressing
        1. IPv4 Addresses, Their Structure, and Significant Addresses
        2. IPv6 Addresses, Their Structure and Significant Addresses
        3. Checking Connectivity: Using ping to Connect to an Address
        4. Tracerouting
        5. IP Intelligence: Geolocation and Demographics
      3. DNS
        1. DNS Name Structure
        2. Forward DNS Querying Using dig
        3. The DNS Reverse Lookup
        4. Using whois to Find Ownership
      4. Additional Reference Tools
        1. DNSBLs
    5. 9. More Tools
      1. Visualization
        1. Graphviz
      2. Communications and Probing
        1. netcat
        2. nmap
        3. Scapy
      3. Packet Inspection and Reference
        1. Wireshark
        2. GeoIP
        3. The NVD, Malware Sites, and the C*Es
        4. Search Engines, Mailing Lists, and People
      4. Further Reading
  4. III. Analytics
    1. 10. Exploratory Data Analysis and Visualization
      1. The Goal of EDA: Applying Analysis
      2. EDA Workflow
      3. Variables and Visualization
      4. Univariate Visualization: Histograms, QQ Plots, Boxplots, and Rank Plots
        1. Histograms
        2. Bar Plots (Not Pie Charts)
        3. The Quantile-Quantile (QQ) Plot
        4. The Five-Number Summary and the Boxplot
        5. Generating a Boxplot
      5. Bivariate Description
        1. Scatterplots
        2. Contingency Tables
      6. Multivariate Visualization
        1. Operationalizing Security Visualization
          1. Rule one: bound and partition your visualization to manage disruptions
          2. Rule two: label anomalies
          3. Rule three: use trendlines, distinguish artifacts from observations
          4. Rule four: be consistent across plots
          5. Rule five: annotate with contextual information
          6. Rule six: avoid flash in favor of expressiveness
          7. Rule seven: when performing long jobs, give the user some status feedback
      7. Further Reading
    2. 11. On Fumbling
      1. Attack Models
      2. Fumbling: Misconfiguration, Automation, and Scanning
        1. Lookup Failures
        2. Automation
        3. Scanning
      3. Identifying Fumbling
        1. TCP Fumbling: The State Machine
          1. Network maps
          2. Unidirectional flow filtering
        2. ICMP Messages and Fumbling
        3. Identifying UDP Fumbling
      4. Fumbling at the Service Level
        1. HTTP Fumbling
        2. SMTP Fumbling
      5. Analyzing Fumbling
        1. Building Fumbling Alarms
        2. Forensic Analysis of Fumbling
        3. Engineering a Network to Take Advantage of Fumbling
      6. Further Reading
    3. 12. Volume and Time Analysis
      1. The Workday and Its Impact on Network Traffic Volume
      2. Beaconing
      3. File Transfers/Raiding
      4. Locality
        1. DDoS, Flash Crowds, and Resource Exhaustion
        2. DDoS and Routing Infrastructure
      5. Applying Volume and Locality Analysis
        1. Data Selection
        2. Using Volume as an Alarm
        3. Using Beaconing as an Alarm
        4. Using Locality as an Alarm
        5. Engineering Solutions
      6. Further Reading
    4. 13. Graph Analysis
      1. Graph Attributes: What Is a Graph?
      2. Labeling, Weight, and Paths
      3. Components and Connectivity
      4. Clustering Coefficient
      5. Analyzing Graphs
        1. Using Component Analysis as an Alarm
        2. Using Centrality Analysis for Forensics
        3. Using Breadth-First Searches Forensically
        4. Using Centrality Analysis for Engineering
      6. Further Reading
    5. 14. Application Identification
      1. Mechanisms for Application Identification
        1. Port Number
        2. Application Identification by Banner Grabbing
        3. Application Identification by Behavior
        4. Application Identification by Subsidiary Site
      2. Application Banners: Identifying and Classifying
        1. Non-Web Banners
        2. Web Client Banners: The User-Agent String
      3. Further Reading
    6. 15. Network Mapping
      1. Creating an Initial Network Inventory and Map
        1. Creating an Inventory: Data, Coverage, and Files
        2. Phase I: The First Three Questions
          1. The Default Network
        3. Phase II: Examining the IP Space
          1. Identifying Asymmetric Traffic
          2. Identifying Dark Space
          3. Finding Network Appliances
        4. Phase III: Identifying Blind and Confusing Traffic
          1. Identifying NATs
          2. Identifying Proxies
          3. Identifying VPN Traffic
        5. Phase IV: Identifying Clients and Servers
          1. Identifying Servers
        6. Identifying Sensing and Blocking Infrastructure
      2. Updating the Inventory: Toward Continuous Audit
      3. Further Reading
  5. Index
  6. Colophon
  7. Copyright