7.2. The Operations Manual

You need to prepare for many of these activities early in the development lifecycle. Make sure the system can support the level of tuning and calibration that post go-live support and live service require, and that every team involved understands the system well.

The service delivery team will initially install and configure the system for production use, which means understanding all the installation and configuration settings and procedures. Operations will monitor the system 24/7 to ensure that it's up and running. When it fails, especially in the early days, it's the developers who get the call at 3 A.M. to look into the issue. The quicker you can resolve the issue, the quicker you can get back to sleep. If the logs or the alerts don't carry the right level of information, it will take a while to locate and resolve the issue. The better the diagnostics, the easier your life will be.

Similarly, the more you can configure and tune your alerts and logging, the easier it will be to put effective monitoring in place. You can also make your life (and that of the service delivery and operations teams) much easier by adding as much contextual information as possible to errors, events, and logs. This information can be included in the operational procedures, along with the steps to ensure effective operations. Done well, this means you shouldn't need to get a call at all. Chapter 11 covers ...
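As a rough illustration of configurable logging with contextual information, the following Python sketch uses the standard `logging` module. It is only a sketch under stated assumptions, not a prescribed design: the `APP_LOG_LEVEL` environment variable, the `build_logger` and `with_context` helpers, and the `order_id`, `step`, and `host` fields are all hypothetical names chosen for the example.

```python
import logging
import os
import sys


def build_logger(name, stream=sys.stderr, level_name=None):
    """Create a logger whose level comes from configuration rather than code."""
    # APP_LOG_LEVEL is a hypothetical setting; operations could change it
    # to tune verbosity without a code change.
    level_name = level_name or os.environ.get("APP_LOG_LEVEL", "INFO")
    logger = logging.getLogger(name)
    # Unknown level names fall back to INFO.
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # Every line carries the context needed to locate the failure.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s "
        "[order_id=%(order_id)s step=%(step)s host=%(host)s]"
    ))
    logger.addHandler(handler)
    return logger


def with_context(logger, order_id, step, host):
    """Attach contextual fields to every record written through the adapter."""
    context = {"order_id": order_id, "step": step, "host": host}
    return logging.LoggerAdapter(logger, context)


# Illustrative usage with made-up values.
log = with_context(build_logger("orders", level_name="WARNING"),
                   order_id="A-1001", step="payment", host="app01")
log.info("payment started")                             # suppressed at WARNING
log.error("payment gateway timed out after %d s", 30)   # written with context
```

With the level set to `WARNING`, the informational message is dropped and only the error line is written, tagged with the order, step, and host. The person taking the call can then see from one line where the failure happened, and the same fields can be referenced in the operational procedures.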
