Analyzing past postmortems

Once everything is said and done, it is good to go back and review past postmortems. Once a quarter or once a year, collect all of the postmortems and try to pull together some metrics. These metrics can give you insight into how your team responds to incidents:

  • Time to recovery
  • Time between failures
  • Number of alerts fired versus postmortems generated
  • Number of alerts fired per on-call rotation
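
As a rough illustration, the metrics listed above could be computed from a small set of postmortem records. This is only a sketch: the Incident record, its field names, and the rotation labels are hypothetical, and it assumes that time between failures is measured from the resolution of one incident to the start of the next, which is one common convention among several.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical postmortem record; the field names are assumptions."""
    started: datetime
    resolved: datetime


def mean_time_to_recovery(incidents):
    """Average of (resolved - started) over all incidents, or None if empty."""
    if not incidents:
        return None
    durations = [(i.resolved - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(durations))


def mean_time_between_failures(incidents):
    """Average gap between one incident's resolution and the next one's start.

    Returns None when there are fewer than two incidents, since no gap exists.
    """
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [
        (later.started - earlier.resolved).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return timedelta(seconds=mean(gaps)) if gaps else None


def alerts_per_postmortem(alert_count, postmortem_count):
    """Ratio of alerts fired to postmortems written, or None if no postmortems."""
    return alert_count / postmortem_count if postmortem_count else None


def alerts_per_rotation(alert_rotations):
    """Count alerts per on-call rotation, given one rotation label per alert."""
    return Counter(alert_rotations)


# Made-up example data, for illustration only.
incidents = [
    Incident(datetime(2023, 1, 10, 9, 0), datetime(2023, 1, 10, 10, 0)),
    Incident(datetime(2023, 2, 10, 9, 0), datetime(2023, 2, 10, 9, 30)),
    Incident(datetime(2023, 3, 12, 9, 0), datetime(2023, 3, 12, 10, 30)),
]
mttr = mean_time_to_recovery(incidents)
mtbf = mean_time_between_failures(incidents)
```

Running the same calculation each quarter and comparing the results is what shows a trend; a single value says little on its own.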

MTTR and MTBF

Outside of incidents, two metrics that are often talked about are mean time to recovery (MTTR) and mean time between failures (MTBF). Looking at these numbers across a year can show how your ability to respond to incidents is improving or changing. Note how the goal is to minimize the time until ...
