11.3. Summary

In this chapter you saw that incident investigation is a key part of the project lifecycle and is often very difficult. Issues and incidents need to be resolved quickly and efficiently. By stepping back and looking at some of the common incidents, you can design your systems to capture the information needed to piece together a timeline during an investigation.
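As one illustration of what designing for a timeline can mean in practice, the following is a minimal sketch, not the book's implementation. It assumes each component logs events that carry a timestamp and a shared correlation identifier; the `Event` type, its field names, and the `build_timeline` and `format_timeline` helpers are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One diagnostic record (hypothetical shape, assumed for this sketch)."""
    timestamp: datetime
    source: str          # component that logged the event, e.g. "web" or "db"
    correlation_id: str  # identifier shared by all events of one request
    message: str


def build_timeline(events, correlation_id):
    """Return the events for one correlation ID in chronological order."""
    matching = [e for e in events if e.correlation_id == correlation_id]
    return sorted(matching, key=lambda e: e.timestamp)


def format_timeline(timeline):
    """Render each event as a single readable line."""
    return [
        f"{e.timestamp.isoformat()} [{e.source}] {e.message}"
        for e in timeline
    ]


if __name__ == "__main__":
    def at(second):
        # Helper for building sample timestamps in UTC.
        return datetime(2024, 1, 1, 12, 0, second, tzinfo=timezone.utc)

    # Sample events from two components, deliberately out of order.
    sample = [
        Event(at(5), "db", "req-1", "query timed out"),
        Event(at(1), "web", "req-1", "request received"),
        Event(at(3), "web", "req-2", "request received"),
    ]
    for line in format_timeline(build_timeline(sample, "req-1")):
        print(line)
```

The sketch relies on every component writing timestamps in a common time zone and propagating the same correlation identifier; without those two conventions, events from different sources cannot be ordered or grouped reliably.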

The following are the key points to take away from this chapter:

  • Incidents will happen. Accept that incidents will occur and that they need to be resolved. Not all incidents will require changes to the application code, configuration, and so forth. Some incidents can be resolved through process changes.

  • Understand the incident. Gathering as much information as possible will greatly help you identify the root cause of the problem. Building the right amount of diagnostics into the application will help you understand the incident without degrading performance or functionality.

  • Re-create the incident. Incidents should be re-created in an isolated environment so that you can take a structured and analytical approach to resolving the issue and try out various options.

  • Verify the incident and the proposed solution. It is vital that issues be formally verified. You don't want to waste time trying to resolve things that actually aren't issues. It is equally important to verify the proposed resolution to ensure that it will in fact resolve the issue.

  • Implement the solution. Once the issue has been ...
