Prometheus: Up & Running

Book Description

Get up to speed with Prometheus, the metrics-based monitoring system used by tens of thousands of organizations in production. This practical guide provides application developers, sysadmins, and DevOps practitioners with a hands-on introduction to the most important aspects of Prometheus, including infrastructure and application monitoring, dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.

This open source system has gained popularity over the past few years for good reason. With its simple yet powerful data model and query language, Prometheus does one thing, and it does it well. Author and Prometheus developer Brian Brazil guides you through Prometheus setup, the Node Exporter, and the Alertmanager, then demonstrates how to use them for application and infrastructure monitoring.

  • Know where and how much instrumentation to apply to your application code
  • Identify metrics with labels using unique key-value pairs
  • Get an introduction to Grafana, a popular tool for building dashboards
  • Learn how to use the Node Exporter to monitor your infrastructure
  • Use service discovery to provide different views of your machines and services
  • Use Prometheus with Kubernetes and examine exporters you can use with containers
  • Convert data from other monitoring systems into the Prometheus format

Table of Contents

  1. Preface
    1. Expanding the Known
    2. Conventions Used in This Book
    3. Using Code Examples
    4. O’Reilly Safari
    5. How to Contact Us
    6. Acknowledgments
  2. I. Introduction
  3. 1. What Is Prometheus?
    1. What Is Monitoring?
      1. A Brief and Incomplete History of Monitoring
      2. Categories of Monitoring
    2. Prometheus Architecture
      1. Client Libraries
      2. Exporters
      3. Service Discovery
      4. Scraping
      5. Storage
      6. Dashboards
      7. Recording Rules and Alerts
      8. Alert Management
      9. Long-Term Storage
    3. What Prometheus Is Not
  4. 2. Getting Started with Prometheus
    1. Running Prometheus
    2. Using the Expression Browser
    3. Running the Node Exporter
    4. Alerting
  5. II. Application Monitoring
  6. 3. Instrumentation
    1. A Simple Program
    2. The Counter
      1. Counting Exceptions
      2. Counting Size
    3. The Gauge
      1. Using Gauges
      2. Callbacks
    4. The Summary
    5. The Histogram
      1. Buckets
    6. Unit Testing Instrumentation
    7. Approaching Instrumentation
      1. What Should I Instrument?
      2. How Much Should I Instrument?
      3. What Should I Name My Metrics?
  7. 4. Exposition
    1. Python
      1. WSGI
      2. Twisted
      3. Multiprocess with Gunicorn
    2. Go
    3. Java
      1. HTTPServer
      2. Servlet
    4. Pushgateway
    5. Bridges
    6. Parsers
    7. Exposition Format
      1. Metric Types
      2. Labels
      3. Escaping
      4. Timestamps
      5. check metrics
  8. 5. Labels
    1. What Are Labels?
    2. Instrumentation and Target Labels
    3. Instrumentation
      1. Metric
      2. Multiple Labels
      3. Child
    4. Aggregating
    5. Label Patterns
      1. Enum
      2. Info
    6. When to Use Labels
      1. Cardinality
  9. 6. Dashboarding with Grafana
    1. Installation
    2. Data Source
    3. Dashboards and Panels
      1. Avoiding the Wall of Graphs
    4. Graph Panel
      1. Time Controls
    5. Singlestat Panel
    6. Table Panel
    7. Template Variables
  10. III. Infrastructure Monitoring
  11. 7. Node Exporter
    1. CPU Collector
    2. Filesystem Collector
    3. Diskstats Collector
    4. Netdev Collector
    5. Meminfo Collector
    6. Hwmon Collector
    7. Stat Collector
    8. Uname Collector
    9. Loadavg Collector
    10. Textfile Collector
      1. Using the Textfile Collector
      2. Timestamps
  12. 8. Service Discovery
    1. Service Discovery Mechanisms
      1. Static
      2. File
      3. Consul
      4. EC2
    2. Relabelling
      1. Choosing What to Scrape
      2. Target Labels
    3. How to Scrape
      1. metric_relabel_configs
      2. Label Clashes and honor_labels
  13. 9. Containers and Kubernetes
    1. cAdvisor
      1. CPU
      2. Memory
      3. Labels
    2. Kubernetes
      1. Running in Kubernetes
      2. Service Discovery
      3. kube-state-metrics
  14. 10. Common Exporters
    1. Consul
    2. HAProxy
    3. Grok Exporter
    4. Blackbox
      1. ICMP
      2. TCP
      3. HTTP
      4. DNS
      5. Prometheus Configuration
  15. 11. Working with Other Monitoring Systems
    1. Other Monitoring Systems
    2. InfluxDB
    3. StatsD
  16. 12. Writing Exporters
    1. Consul Telemetry
    2. Custom Collectors
      1. Labels
    3. Guidelines
  17. IV. PromQL
  18. 13. Introduction to PromQL
    1. Aggregation Basics
      1. Gauge
      2. Counter
      3. Summary
      4. Histogram
    2. Selectors
      1. Matchers
      2. Instant Vector
      3. Range Vector
      4. Offset
    3. HTTP API
      1. query
      2. query_range
  19. 14. Aggregation Operators
    1. Grouping
      1. without
      2. by
    2. Operators
      1. sum
      2. count
      3. avg
      4. stddev and stdvar
      5. min and max
      6. topk and bottomk
      7. quantile
      8. count_values
  20. 15. Binary Operators
    1. Working with Scalars
      1. Arithmetic Operators
      2. Comparison Operators
    2. Vector Matching
      1. One-to-One
      2. Many-to-One and group_left
      3. Many-to-Many and Logical Operators
    3. Operator Precedence
  21. 16. Functions
    1. Changing Type
      1. vector
      2. scalar
    2. Math
      1. abs
      2. ln, log2, and log10
      3. exp
      4. sqrt
      5. ceil and floor
      6. round
      7. clamp_max and clamp_min
    3. Time and Date
      1. time
      2. minute, hour, day_of_week, day_of_month, days_in_month, month, and year
      3. timestamp
    4. Labels
      1. label_replace
      2. label_join
    5. Missing Series and absent
    6. Sorting with sort and sort_desc
    7. Histograms with histogram_quantile
    8. Counters
      1. rate
      2. increase
      3. irate
      4. resets
    9. Changing Gauges
      1. changes
      2. deriv
      3. predict_linear
      4. delta
      5. idelta
      6. holt_winters
    10. Aggregation Over Time
  22. 17. Recording Rules
    1. Using Recording Rules
    2. When to Use Recording Rules
      1. Reducing Cardinality
      2. Composing Range Vector Functions
      3. Rules for APIs
      4. How Not to Use Rules
    3. Naming of Recording Rules
  23. V. Alerting
  24. 18. Alerting
    1. Alerting Rules
      1. for
      2. Alert Labels
      3. Annotations and Templates
      4. What Are Good Alerts?
    2. Configuring Alertmanagers
      1. External Labels
  25. 19. Alertmanager
    1. Notification Pipeline
    2. Configuration File
      1. Routing Tree
      2. Receivers
      3. Inhibitions
    3. Alertmanager Web Interface
  26. VI. Deployment
  27. 20. Putting It All Together
    1. Planning a Rollout
      1. Growing Prometheus
    2. Going Global with Federation
    3. Long-Term Storage
    4. Running Prometheus
      1. Hardware
      2. Configuration Management
      3. Networks and Authentication
    5. Planning for Failure
      1. Alertmanager Clustering
      2. Meta- and Cross-Monitoring
    6. Managing Performance
      1. Detecting a Problem
      2. Finding Expensive Metrics and Targets
      3. Reducing Load
      4. Horizontal Sharding
    7. Managing Change
    8. Getting Help
  28. Index