Service signals in Prometheus

Three key metrics give you a quick read on the health and status of your service. It has become common practice to instrument services to expose these metrics, and to build dashboards on them as a baseline for understanding how a service is running.

These key metrics for a web-based service are:

  • Error rate
  • Response time
  • Throughput
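To show what each signal measures, the sketch below computes all three from a hypothetical in-memory list of request records observed over a fixed window. The `RequestRecord` type, the `summarize` function and the use of mean latency are assumptions made for illustration. In practice Prometheus derives these signals from counters and histograms, not from raw request lists.

```typescript
// Hypothetical record of one handled HTTP request (not part of any library).
interface RequestRecord {
  statusCode: number;      // HTTP status returned to the client
  durationSeconds: number; // time taken to produce the response
}

interface ServiceSignals {
  errorRate: number;        // fraction of requests that returned a 5xx status
  meanResponseTime: number; // average response duration in seconds
  throughput: number;       // requests per second over the window
}

// Summarize the three key signals for requests seen in a window of
// `windowSeconds` seconds.
function summarize(records: RequestRecord[], windowSeconds: number): ServiceSignals {
  if (windowSeconds <= 0) {
    throw new Error("windowSeconds must be positive");
  }
  const total = records.length;
  if (total === 0) {
    // No traffic: report zeros instead of dividing by zero.
    return { errorRate: 0, meanResponseTime: 0, throughput: 0 };
  }
  const errors = records.filter((r) => r.statusCode >= 500 && r.statusCode <= 599).length;
  const totalDuration = records.reduce((sum, r) => sum + r.durationSeconds, 0);
  return {
    errorRate: errors / total,
    meanResponseTime: totalDuration / total,
    throughput: total / windowSeconds,
  };
}

// Four requests observed over a hypothetical 2-second window.
const sample: RequestRecord[] = [
  { statusCode: 200, durationSeconds: 0.25 },
  { statusCode: 200, durationSeconds: 0.5 },
  { statusCode: 503, durationSeconds: 0.25 },
  { statusCode: 200, durationSeconds: 1.0 },
];

console.log(summarize(sample, 2));
// { errorRate: 0.25, meanResponseTime: 0.5, throughput: 2 }
```

Mean latency keeps the sketch short. Dashboards often show percentiles instead, because a mean can hide a slow tail of requests.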

Error rate can be gathered from the labels on the http_request_duration_seconds_count metric, which express-prom-bundle provides. In Prometheus, we can match on the format of the response code in the status_code label and compare the increase in the number of 5xx (server error) responses with the increase in all responses.
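The label-matching arithmetic can be sketched in TypeScript. Assume that for each labeled series of http_request_duration_seconds_count we already know how much the counter grew over the window, which is what increase() returns per series. Summing the increases whose status_code matches `^5..$` and dividing by the sum over all series gives the error ratio. The `SeriesIncrease` type, the `errorRatio` function and the `path` label values are hypothetical. The sketch models the calculation, not Prometheus itself.

```typescript
// One labeled series and how much its counter increased over the window
// (hypothetical shape; Prometheus computes this per series with increase()).
interface SeriesIncrease {
  labels: Record<string, string>; // e.g. { status_code: "200", path: "/" }
  increase: number;
}

// Same pattern as the PromQL matcher status_code=~"^5..$".
// Prometheus anchors regex matchers anyway; JavaScript needs ^ and $ explicitly.
const SERVER_ERROR = /^5..$/;

// Ratio of 5xx responses to all responses across every series.
function errorRatio(series: SeriesIncrease[]): number {
  let errors = 0;
  let total = 0;
  for (const s of series) {
    total += s.increase;
    const code = s.labels["status_code"] ?? "";
    if (SERVER_ERROR.test(code)) {
      errors += s.increase;
    }
  }
  // Report 0 rather than NaN when there was no traffic in the window.
  return total === 0 ? 0 : errors / total;
}

const lastFiveMinutes: SeriesIncrease[] = [
  { labels: { status_code: "200", path: "/" }, increase: 90 },
  { labels: { status_code: "404", path: "/missing" }, increase: 5 },
  { labels: { status_code: "500", path: "/" }, increase: 3 },
  { labels: { status_code: "503", path: "/api" }, increase: 2 },
];

console.log(errorRatio(lastFiveMinutes)); // 0.05 (5 server errors out of 100 requests)
```

The 404 series counts toward the total but not toward the errors, because only 5xx codes indicate a fault in the service itself.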

The Prometheus query could be:

sum(increase(http_request_duration_seconds_count{status_code=~"^5..$"}[5m])) ...
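The increase(...[5m]) part operates on counters. A counter only goes up until the process restarts and resets it to zero. The following simplified sketch computes a per-series increase from the raw samples in a window and treats any drop in value as a reset. Unlike Prometheus's increase(), it does not extrapolate to the edges of the range, so real query results can differ slightly. The `Sample` type and the `counterIncrease` function are hypothetical.

```typescript
// A single scrape of a counter: timestamp in seconds and the counter value.
interface Sample {
  timestamp: number;
  value: number;
}

// Increase of a counter across samples ordered by timestamp.
// A decrease means the counter was reset (for example, the pod restarted),
// so the new value is counted as growth from zero.
function counterIncrease(samples: Sample[]): number {
  let increase = 0;
  let prev: number | undefined;
  for (const { value } of samples) {
    if (prev !== undefined) {
      increase += value >= prev ? value - prev : value;
    }
    prev = value;
  }
  return increase;
}

// Counter scraped every 60s; it resets between t=180 and t=240.
const scrapes: Sample[] = [
  { timestamp: 0, value: 100 },
  { timestamp: 60, value: 120 },
  { timestamp: 120, value: 150 },
  { timestamp: 180, value: 160 },
  { timestamp: 240, value: 10 },
  { timestamp: 300, value: 25 },
];

console.log(counterIncrease(scrapes)); // 85 (20 + 30 + 10 + 10 + 15)
```

Summing these per-series increases, as the outer sum() does, produces the aggregate counts that the error-rate ratio is built from.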

Get Kubernetes for Developers now with the O’Reilly learning platform.
