Avoiding Blame

In a lot of teams, nobody wants to be the sucker who broke everything. When something breaks, people point the finger at the person next to them:

It can't be the transcoding system because that hasn't changed in the past three weeks. It must be a problem in our streaming servers.

Well, ops did an upgrade on the OS on those boxes yesterday. I don't see why I should debug my code until they can prove it's not their problem.

That OS has been in production in our Virginia datacenter for six months. The problem must be in the application configuration.

And so on (see Figure 10-5).

Figure 10-5. An imperfect scenario for when things break

Everyone has a plausible reason for passing the blame on to someone else, but nobody is stepping up and actually fixing things. Good teams know that until a fix has been pushed, it doesn't matter who broke things, and every minute spent being defensive is another minute that users are stuck with something broken. They focus on working through every possibility until they've found what broke.

It's really easy to prove your code or system has a bug. Proving that a simple system or piece of code is bug-free is a known hard problem in computer science, even if you assume the expected behavior is clearly defined. It's impossible to prove any slightly complex system has no problems. In comparison, it's really easy to prove your code or system has ...
