Chapter 6. After Action Review (AAR)

Simply put, after the incident is resolved, there are two key areas to evaluate: 1) what broke? and 2) how did the people respond to what broke? To that end, instituting a comprehensive incident review process is a critical step in maximizing future uptime. Identifying the causes and contributing factors of a technology failure is important, because the failure provides an opportunity to learn about the operating environment, make improvements, and fine-tune the IRT response mechanism to minimize future IT downtime. More importantly, establishing a positive culture around an honest, in-depth evaluation of the human part of the incident response is critical to improving how people engage and perform on future incidents.

Note

Don’t let a good crisis go to waste! Learn from it to be better the next time. It’s all about getting better—not finding blame.

The Name Is Important

There are some in the IT industry who refer to the incident review process as a post mortem. This term was associated with incident reviews because just as a post mortem searches for the cause of a person’s death, the incident review searches for the cause of the technology’s failure. In our opinion, post mortem is not the best term to use for evaluating an incident response or trying to determine the cause of an IT problem. For starters, when evaluating the performance of people, let’s avoid using words typically associated with death! ...

