Entire books have been written on troubleshooting techniques. I’ve seen people who are natural troubleshooters and people who aren’t. Some people seem able to smell the source of a complex problem, while others can’t figure out what’s wrong even when they are the cause of the problem.
The most interesting problems are usually the ones that cause the most damage. These are the problems that can make or break your career. I’ve been in the middle of website failures where downtime means zero income for the company for the duration of the outage. I’ve worked through failures in banking networks, where each minute of outage costs millions of dollars in lost trades. The best resolutions were the ones that happened quickly and weren’t necessitated by my mistakes. When an outage was my fault, I made sure it was identified as such as quickly as possible. People make mistakes. When people try to hide their mistakes so they won’t be identified as the causes of outages, the outages often last longer than they would have if the people troubleshooting had been properly informed.
Regardless of the problem or the situation, there are some things to remember when you’re troubleshooting an outage. Here’s my short list.
I once worked with a former Marine sergeant. He had been in combat and had lived through months of rehabilitation after a gunshot wound to his shoulder. He was now working for me as a senior network engineer supporting a global network that included more than 10,000 nodes.