Practical Monitoring

Book Description

Do you have a nagging feeling that your monitoring could be improved, but you just aren’t sure how? This is the book for you. Practical Monitoring explains what makes your monitoring less than stellar, and provides a practical approach to designing and implementing a monitoring strategy, from the application down to the hardware in the datacenter and everything in between.

Table of Contents

  1. Preface
    1. Who Should Read This Book
    2. Why I Wrote This Book
    3. A Word on Monitoring Today
    4. Navigating This Book
    5. Online Resources
    6. Conventions Used in This Book
    7. Using Code Examples
    8. O’Reilly Safari
    9. How to Contact Us
    10. Acknowledgments
  2. I. Monitoring Principles
  3. 1. Monitoring Anti-Patterns
    1. Anti-Pattern #1: Tool Obsession
      1. Monitoring Is Multiple Complex Problems Under One Name
      2. Avoid Cargo-Culting Tools
      3. Sometimes, You Really Do Have to Build It
      4. The Single Pane of Glass Is a Myth
    2. Anti-Pattern #2: Monitoring-as-a-Job
    3. Anti-Pattern #3: Checkbox Monitoring
      1. What Does “Working” Actually Mean? Monitor That.
      2. OS Metrics Aren’t Very Useful—for Alerting
      3. Collect Your Metrics More Often
    4. Anti-Pattern #4: Using Monitoring as a Crutch
    5. Anti-Pattern #5: Manual Configuration
    6. Wrap-Up
  4. 2. Monitoring Design Patterns
    1. Pattern #1: Composable Monitoring
      1. The Components of a Monitoring Service
    2. Pattern #2: Monitor from the User Perspective
    3. Pattern #3: Buy, Not Build
      1. It’s Cheaper
      2. You’re (Probably) Not an Expert at Architecting These Tools
      3. SaaS Allows You to Focus on the Company’s Product
      4. No, Really, SaaS Is Actually Better
    4. Pattern #4: Continual Improvement
    5. Wrap-Up
  5. 3. Alerts, On-Call, and Incident Management
    1. What Makes a Good Alert?
      1. Stop Using Email for Alerts
      2. Write Runbooks
      3. Arbitrary Static Thresholds Aren’t the Only Way
      4. Delete and Tune Alerts
      5. Use Maintenance Periods
      6. Attempt Automated Self-Healing First
    2. On-Call
      1. Fixing False Alarms
      2. Cutting Down on Needless Firefighting
      3. Building a Better On-Call Rotation
    3. Incident Management
    4. Postmortems
    5. Wrap-Up
  6. 4. Statistics Primer
    1. Before Statistics in Systems Operations
    2. Math to the Rescue!
    3. Statistics Isn’t Magic
    4. Mean and Average
    5. Median
    6. Seasonality
    7. Quantiles
    8. Standard Deviation
    9. Wrap-Up
  7. II. Monitoring Tactics
  8. 5. Monitoring the Business
    1. Business KPIs
    2. Two Real-World Examples
      1. Yelp
      2. Reddit
    3. Tying Business KPIs to Technical Metrics
    4. My App Doesn’t Have Those Metrics!
    5. Finding Your Company’s Business KPIs
    6. Wrap-Up
  9. 6. Frontend Monitoring
    1. The Cost of a Slow App
    2. Two Approaches to Frontend Monitoring
    3. Document Object Model (DOM)
      1. Frontend Performance Metrics
      2. OK, That’s Great, but How Do I Use This?
    4. Logging
    5. Synthetic Monitoring
    6. Wrap-Up
  10. 7. Application Monitoring
    1. Instrumenting Your Apps with Metrics
      1. How It Works Under the Hood
    2. Monitoring Build and Release Pipelines
    3. Health Endpoint Pattern
    4. Application Logging
      1. Wait a Minute…Should I Have a Metric or a Log Entry?
      2. What Should I Be Logging?
      3. Write to Disk or Write to Network?
    5. Serverless / Function-as-a-Service
    6. Monitoring Microservice Architectures
    7. Wrap-Up
  11. 8. Server Monitoring
    1. Standard OS Metrics
      1. CPU
      2. Memory
      3. Network
      4. Disk
      5. Load
    2. SSL Certificates
    3. SNMP
    4. Web Servers
    5. Database Servers
    6. Load Balancers
    7. Message Queues
    8. Caching
    9. DNS
    10. NTP
    11. Miscellaneous Corporate Infrastructure
      1. DHCP
      2. SMTP
    12. Monitoring Scheduled Jobs
    13. Logging
      1. Collection
      2. Storage
      3. Analysis
    14. Wrap-Up
  12. 9. Network Monitoring
    1. The Pains of SNMP
      1. What Is SNMP?
      2. How Does It Work?
      3. A Word on Security
      4. How Do I Use SNMP?
      5. Interface Metrics
      6. Interface and Logging
      7. Recap
    2. Configuration Tracking
    3. Voice and Video
    4. Routing
    5. Spanning Tree Protocol (STP)
    6. Chassis
      1. CPU and Memory
      2. Hardware
    7. Flow Monitoring
    8. Capacity Planning
      1. Working Backward
      2. Forecasting
    9. Wrap-Up
  13. 10. Security Monitoring
    1. Monitoring and Compliance
    2. User, Command, and Filesystem Auditing
      1. Setting Up auditd
      2. auditd and Remote Logs
    3. Host Intrusion Detection System (HIDS)
    4. rkhunter
    5. Network Intrusion Detection System (NIDS)
    6. Wrap-Up
  14. 11. Conducting a Monitoring Assessment
    1. Business KPIs
    2. Frontend Monitoring
    3. Application and Server Monitoring
    4. Security Monitoring
    5. Alerting
    6. Wrap-Up
  15. A. An Example Runbook: Demo App
    1. Demo App
    2. Metadata
    3. Escalation Procedure
    4. External Dependencies
    5. Internal Dependencies
    6. Tech Stack
    7. Metrics and Logs
    8. Alerts
  16. B. Availability Chart
  17. Index