Monitoring with Prometheus

Book Description

Learn how to implement metrics-centric monitoring with Prometheus. This introductory book teaches you how to use Prometheus to monitor hosts, applications, and services. We cover installation, basic monitoring, service discovery, alerting, log monitoring, scaling, and visualization. The book also introduces monitoring basics, methodologies, and approaches. Learn how to monitor in a metrics-centric world, including building dynamic thresholds, basic anomaly detection, and monitoring aggregation and federation. We'll look at how to apply modern patterns like Google's Four Golden Signals, the USE method, and the RED method. We cover monitoring Kubernetes, Docker containers, and databases, and we look at instrumenting applications and integrating logging. We focus on the particular challenges of monitoring highly dynamic, transitory environments and new architectures like microservices, as well as monitoring in the Cloud, including service discovery and monitoring for Cloud platforms.

Table of Contents

  1. Monitoring With Prometheus
    1. 0.1 Who is this book for?
    2. 0.2 Credits and Acknowledgments
    3. 0.3 Technical Reviewers
      1. 0.3.1 Jamie Wilkinson
      2. 0.3.2 Paul Gier
    4. 0.4 Editor
    5. 0.5 Author
    6. 0.6 Conventions in the book
    7. 0.7 Code and Examples
    8. 0.8 Colophon
    9. 0.9 Errata
    10. 0.10 Disclaimer
    11. 0.11 Copyright
    12. 0.12 Version
  2. 1 Introduction
    1. 1.1 What is monitoring?
      1. 1.1.1 Technology as a customer
      2. 1.1.2 The business as a customer
    2. 1.2 Monitoring fundamentals
      1. 1.2.1 Monitoring as afterthought
      2. 1.2.2 Monitoring by rote
      3. 1.2.3 Not monitoring for correctness
      4. 1.2.4 Monitoring statically
      5. 1.2.5 Not monitoring frequently enough
      6. 1.2.6 No automation or self-service
      7. 1.2.7 Good monitoring summary
    3. 1.3 Monitoring mechanics
      1. 1.3.1 Probing and introspection
      2. 1.3.2 Pull versus push
      3. 1.3.3 Types of monitoring data
    4. 1.4 Metrics
      1. 1.4.1 So what’s a metric?
      2. 1.4.2 Types of metrics
      3. 1.4.3 Metric summaries
      4. 1.4.4 Metric aggregation
    5. 1.5 Monitoring methodologies
      1. 1.5.1 The USE Method
      2. 1.5.2 The Google Four Golden Signals
    6. 1.6 Contextual, useful alerts and notifications
    7. 1.7 Visualization
    8. 1.8 But didn’t you write that other book?
    9. 1.9 What’s in the book?
    10. 1.10 Summary
  3. 2 Introduction to Prometheus
    1. 2.1 The Prometheus backstory
    2. 2.2 Prometheus architecture
      1. 2.2.1 Metric collection
      2. 2.2.2 Service discovery
      3. 2.2.3 Aggregation and alerting
      4. 2.2.4 Querying data
      5. 2.2.5 Autonomy
      6. 2.2.6 Redundancy and high availability
      7. 2.2.7 Visualization
    3. 2.3 The Prometheus data model
      1. 2.3.1 Metric names
      2. 2.3.2 Labels
      3. 2.3.3 Samples
      4. 2.3.4 Notation
      5. 2.3.5 Metrics retention
    4. 2.4 Security model
    5. 2.5 Prometheus ecosystem
    6. 2.6 Useful Prometheus links
    7. 2.7 Summary
  4. 3 Installation and Getting Started
    1. 3.1 Installing Prometheus
      1. 3.1.1 Installing Prometheus on Linux
      2. 3.1.2 Installing Prometheus on Microsoft Windows
      3. 3.1.3 Alternative Microsoft Windows installation
      4. 3.1.4 Alternative Mac OS X installation
      5. 3.1.5 Stacks
      6. 3.1.6 Installing via configuration management
      7. 3.1.7 Deploying via Kubernetes
    2. 3.2 Configuring Prometheus
      1. 3.2.1 Global
      2. 3.2.2 Alerting
      3. 3.2.3 Rule files
      4. 3.2.4 Scrape configuration
    3. 3.3 Starting the server
      1. 3.3.1 Running Prometheus via Docker
    4. 3.4 First metrics
    5. 3.5 Prometheus expression browser
    6. 3.6 Time series aggregation
    7. 3.7 Capacity planning
      1. 3.7.1 Memory
      2. 3.7.2 Disk
    8. 3.8 Summary
  5. 4 Monitoring Nodes and Containers
    1. 4.1 Monitoring nodes
      1. 4.1.1 Installing the Node Exporter
      2. 4.1.2 Configuring the Node Exporter
      3. 4.1.3 Configuring the Textfile collector
      4. 4.1.4 Enabling the systemd collector
      5. 4.1.5 Running the Node Exporter
      6. 4.1.6 Scraping the Node Exporter
      7. 4.1.7 Filtering collectors on the server
    2. 4.2 Monitoring Docker
      1. 4.2.1 Running cAdvisor
      2. 4.2.2 Scraping cAdvisor
    3. 4.3 Scrape lifecycle
    4. 4.4 Labels
      1. 4.4.1 Label taxonomies
      2. 4.4.2 Relabelling
    5. 4.5 The Node Exporter and cAdvisor metrics
      1. 4.5.1 The trinity and the USE method
      2. 4.5.2 Service status
      3. 4.5.3 Availability and the up metric
      4. 4.5.4 The metadata metric
    6. 4.6 Query permanence
      1. 4.6.1 Recording rules
      2. 4.6.2 Configuring recording rules
      3. 4.6.3 Adding recording rules
    7. 4.7 Visualization
      1. 4.7.1 Installing Grafana
      2. 4.7.2 Starting and configuring Grafana
      3. 4.7.3 Configuring the Grafana web interface
      4. 4.7.4 First dashboard
    8. 4.8 Summary
  6. 5 Service Discovery
    1. 5.1 Scrape lifecycle and static configuration redux
    2. 5.2 File-based discovery
      1. 5.2.1 Writing files for file discovery
    3. 5.3 Inbuilt service discovery plugins
      1. 5.3.1 Amazon EC2 service discovery plugin
    4. 5.4 DNS service discovery
    5. 5.5 Summary
  7. 6 Alerting and Alertmanager
    1. 6.1 Alerting
    2. 6.2 How the Alertmanager works
    3. 6.3 Installing Alertmanager
      1. 6.3.1 Installing Alertmanager on Linux
      2. 6.3.2 Installing Alertmanager on Microsoft Windows
      3. 6.3.3 Stacks
      4. 6.3.4 Installing via configuration management
    4. 6.4 Configuring the Alertmanager
    5. 6.5 Running Alertmanager
    6. 6.6 Configuring Prometheus for Alertmanager
      1. 6.6.1 Alertmanager service discovery
      2. 6.6.2 Monitoring Alertmanager
    7. 6.7 Adding alerting rules
      1. 6.7.1 Adding our first alerting rule
      2. 6.7.2 What happens when an alert fires?
      3. 6.7.3 The alert at the Alertmanager
      4. 6.7.4 Adding new alerts and templates
    8. 6.8 Routing
      1. 6.8.1 Routes
    9. 6.9 Receivers and notification templates
    10. 6.10 Silences and maintenance
      1. 6.10.1 Controlling silences via the Alertmanager
      2. 6.10.2 Controlling silences via amtool
    11. 6.11 Summary
  8. 7 Scaling and Reliability
    1. 7.1 Reliability and fault tolerance
      1. 7.1.1 Duplicate Prometheus servers
      2. 7.1.2 Setting up Alertmanager clustering
      3. 7.1.3 Configuring Prometheus for an Alertmanager cluster
    2. 7.2 Scaling
      1. 7.2.1 Functional scaling
      2. 7.2.2 Horizontal shards
    3. 7.3 Remote storage
    4. 7.4 Third-party tools
    5. 7.5 Summary
  9. 8 Instrumenting Applications
    1. 8.1 An application monitoring primer
      1. 8.1.1 Where should I instrument?
      2. 8.1.2 Instrument taxonomies
    2. 8.2 Metrics
      1. 8.2.1 Application metrics
      2. 8.2.2 Business metrics
      3. 8.2.3 Where to put your metrics
      4. 8.2.4 The utility pattern
      5. 8.2.5 The external pattern
      6. 8.2.6 Building metrics into a sample application
    3. 8.3 Summary
  10. 9 Logging as Instrumentation
    1. 9.1 Processing logs for metrics
    2. 9.2 Introducing mtail
      1. 9.2.1 Installing mtail
      2. 9.2.2 Using mtail
      3. 9.2.3 Running mtail
    3. 9.3 Processing web server access logs
    4. 9.4 Parsing Rails logs into a histogram
    5. 9.5 Deploying mtail
    6. 9.6 Scraping our mtail endpoint
    7. 9.7 Summary
  11. 10 Probing
    1. 10.1 Probing architecture
    2. 10.2 The blackbox exporter
    3. 10.3 Installing the exporter
      1. 10.3.1 Installing the exporter on Linux
      2. 10.3.2 Installing the exporter on Microsoft Windows
      3. 10.3.3 Installing via configuration management
    4. 10.4 Configuring the exporter
      1. 10.4.1 HTTP check
      2. 10.4.2 ICMP check
      3. 10.4.3 DNS check
    5. 10.5 Starting the exporter
    6. 10.6 Creating the Prometheus job
    7. 10.7 Summary
  12. 11 Pushing Metrics and the Pushgateway
    1. 11.1 The Pushgateway
      1. 11.1.1 When not to use the Pushgateway
      2. 11.1.2 Installing the Pushgateway
      3. 11.1.3 Installing the Pushgateway on Linux
      4. 11.1.4 Installing the Pushgateway on Microsoft Windows
      5. 11.1.5 Installing via configuration management
      6. 11.1.6 Configuring and running the Pushgateway
      7. 11.1.7 Sending metrics to the Pushgateway
      8. 11.1.8 Viewing metrics on the Pushgateway
      9. 11.1.9 Deleting metrics in the Pushgateway
      10. 11.1.10 Sending metrics from a client
    2. 11.2 Summary
  13. 12 Monitoring a Stack - Kubernetes
    1. 12.1 Our Kubernetes cluster
    2. 12.2 Running Prometheus on Kubernetes
    3. 12.3 Monitoring Kubernetes
    4. 12.4 Monitoring our Kubernetes nodes
      1. 12.4.1 Node Exporter DaemonSet
      2. 12.4.2 Node Exporter service
      3. 12.4.3 Deploying the Node Exporter
      4. 12.4.4 The Node Exporter job
      5. 12.4.5 Node Exporter rules
    5. 12.5 Kubernetes
      1. 12.5.1 Kube-state-metrics
      2. 12.5.2 Kube API
      3. 12.5.3 cAdvisor and Nodes
    6. 12.6 Summary
  14. 13 Monitoring a Stack - Tornado
    1. 13.1 Sidecar pattern
    2. 13.2 MySQL
      1. 13.2.1 MySQL Prometheus configuration
    3. 13.3 Redis
      1. 13.3.1 Redis Prometheus configuration
    4. 13.4 Tornado
      1. 13.4.1 Adding the Clojure wrapper
      2. 13.4.2 Adding a registry
      3. 13.4.3 Adding metrics
      4. 13.4.4 Exporting the metrics
      5. 13.4.5 Tornado Prometheus configuration
    5. 13.5 Summary