Root-Cause Analysis

Note

Whole Team

We prevent mistakes by fixing our process.

When I hear about a serious mistake on my project, my natural reaction is to get angry or frustrated. I want to blame someone for screwing up.

Unfortunately, this response ignores the reality of Murphy’s Law. If something can go wrong, it will. People are, well, people. Everybody makes mistakes. I certainly do. Aggressively laying blame might cause people to hide their mistakes, or to try to pin them on others, but this dysfunctional behavior won’t actually prevent mistakes.

Instead of getting angry, I try to remember Norm Kerth’s Prime Directive: everybody is doing the best job they can given their abilities and knowledge (see Retrospectives” later in this chapter for the full text of the Prime Directive). Rather than blaming people, I blame the process. What is it about the way we work that allowed this mistake to happen? How can we change the way we work so that it’s harder for something to go wrong?

This is root-cause analysis.

How to Find the Root Cause

A classic approach to root-cause analysis is to ask “why” five times. Here’s a real-world example.

Problem: When we start working on a new task, we spend a lot of time getting the code into a working state.

Why? Because the build is often broken in source control.

Why? Because people check in code without running their tests.

It’s easy to stop here and say, “Aha! We found the problem. People need to run their tests before checking in.” That is a correct answer, ...

Get The Art of Agile Development now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.