The rate at which organizations learn may soon become the only sustainable source of competitive advantage.
Part I of this book discussed how to build distributed systems. Now we turn to how to run them.
The work done to keep a system running is called operations. More specifically, operations is the work of keeping a system running in a way that meets or exceeds the operating parameters specified by a service level agreement (SLA). Operations covers every aspect of a service’s life cycle, from initial launch to final decommissioning and everything in between.
Operational work tends to focus on availability, speed and performance, security, capacity planning, and software/hardware ...