Programming Amazon EC2

Cover of Programming Amazon EC2 by Flavia Paganelli... Published by O'Reilly Media, Inc.

Chapter 5. Managing the Inevitable Downtime

With smaller infrastructures, downtime is inevitable. Small infrastructures differ from large ones in that you do not have the means to get rid of all your single points of failure. With physical systems, downtime due to hardware failure is a big problem, and waiting for replacement parts is a nerve-wracking experience. And if you have the funds to stock replacements, you might just as well put them in production and remove your single points of failure. With a cloud infrastructure, you don’t have this problem; you can replace most of your assets whenever you want. This characteristic is central to our approach to managing small infrastructures. You might say we plan to fail.

In cloud infrastructures, as in hardware infrastructures, failing hardware is one cause of trouble. Insufficient capacity is another. In this chapter, we will look at how to measure your system. Is the app up or down? Are the disks over capacity? Is the load breaching expected thresholds? What is the CPU utilization of the RDS instance? We will show you how to monitor your systems from the inside and the outside. We’ll take a close look at CloudWatch. We will describe the tools you can use to understand what your system is doing. With this understanding, you can manage your infrastructure if it goes down, or just help it cope with increasing demands. Having limited resources is an opportunity to optimize your system and get the most out of your hardware. ...
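
Two of the questions above, whether the disks are over capacity and whether the load is breaching a threshold, can be roughed out as a check that runs on the instance itself. This is only a minimal sketch using the Python standard library, not the approach the chapter goes on to describe. The threshold values and the function names (`disk_over_capacity`, `load_breaching`, `check_host`) are assumptions made up for illustration, and `os.getloadavg` is only available on Unix-like systems.

```python
import os
import shutil

# Hypothetical thresholds, chosen only for illustration.
DISK_USED_LIMIT = 0.90     # fraction of the disk in use
LOAD_PER_CPU_LIMIT = 1.0   # 1-minute load average per CPU core


def disk_over_capacity(used_bytes, total_bytes, limit=DISK_USED_LIMIT):
    """Return True if the used fraction of the disk exceeds the limit."""
    return total_bytes > 0 and used_bytes / total_bytes > limit


def load_breaching(load_1min, cpu_count, limit=LOAD_PER_CPU_LIMIT):
    """Return True if the 1-minute load per core exceeds the limit."""
    return load_1min / max(cpu_count, 1) > limit


def check_host(path="/"):
    """Measure the local machine and apply both checks."""
    usage = shutil.disk_usage(path)       # total, used, free in bytes
    load_1min, _, _ = os.getloadavg()     # Unix-like systems only
    return {
        "disk_over_capacity": disk_over_capacity(usage.used, usage.total),
        "load_breaching": load_breaching(load_1min, os.cpu_count() or 1),
    }


if __name__ == "__main__":
    print(check_host())
```

Keeping the threshold logic in small pure functions, separate from the measurement in `check_host`, makes it possible to tune and test the limits without touching a live system.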