Chapter 2. Dealing with the Root Cause

Mike was in his office, finishing up the list of steps the networking team would take to reduce the likelihood of another outage. Despite close to two decades of working in financial technology, he had never gotten used to the outages, or to the adrenaline highs and lows that came with them. No two were alike, and they were always surprising. As the amount of processing power and bandwidth required to run financial institutions increased, so did the complexity of the systems and networks. Mike often argued with other engineers about whether humans had reached the limits of their ability to understand how these systems functioned and how they broke down.

The phone rang, and Mike picked it up. It was Bill.

“Hey, Mike, can you please come down to the second floor?”

Mike knew, right away, what was happening. Large financial services firms are known for the boom-and-bust mentality—binge hiring when times are good, and laying off in a series of rounds when times are bad. In his six years at the firm, Mike had seen the way that R.I.F.s—Reductions in Force—were conducted: people were called to the second floor by their manager, and wouldn’t return. At the end of the day, the manager would gather the remaining team to brief them. Since “the departures” were not allowed to come back, their managers had to pack whatever remained in their offices and desks into cardboard boxes, to be shipped home.

Mike checked his firm-issued BlackBerry. It was still working. This ...
