Chapter 6. Improving Your Uptime

We now have an application that uses the techniques outlined in Chapter 3 for scaling gracefully. This application has become very resilient—the single points of failure have been eliminated—and as a consequence, we sleep much better at night. The game has changed: we have our downtime under control. It’s time to start working on improving the quality of our uptime.

The essence of improving our infrastructure comes down to the following:

  • Handling the acceleration of traffic (increase/decrease)

  • Optimizing utilization

Now that we have implemented our infrastructure to handle scale, the peaks are no longer our immediate problem. Instead, we have to worry about the acceleration in traffic: how quickly we climb up and down those slopes.
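To make the idea of a slope concrete, here is a minimal sketch in Python that computes how fast the request rate changes between consecutive measurements. The function name `traffic_slopes`, the sample format, and the numbers are assumptions made for illustration only; they do not come from any AWS API.

```python
from typing import List, Tuple


def traffic_slopes(samples: List[Tuple[float, float]]) -> List[float]:
    """Return the change in request rate per second between consecutive samples.

    Each sample is a hypothetical (timestamp_in_seconds, requests_per_second) pair.
    A positive slope means traffic is climbing, a negative one means it is falling.
    """
    slopes = []
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        slopes.append((r1 - r0) / (t1 - t0))
    return slopes


# Illustrative one-minute samples: a gentle climb, a steep climb, then a decline.
samples = [(0, 100.0), (60, 130.0), (120, 250.0), (180, 220.0)]
slopes = traffic_slopes(samples)
print(slopes)
```

The peak in this made-up series is 250 requests per second, but the number that matters for scaling is the steepest slope: the second interval adds two requests per second every second, which is the pace at which new capacity would have to come online.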

Our infrastructure is probably also growing. We are already more efficient because we scale automatically. If we can increase the amount of traffic each individual instance can handle, we will need fewer instances for the same load and so optimize the utilization of our assets. And that is exactly the underlying principle of cloud computing: minimize waste by optimizing utilization.
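The arithmetic behind this can be sketched in a few lines of Python. Everything here is an assumption for illustration: the helper names `fleet_utilization` and `instances_needed`, the 75% target, and the load figures are hypothetical, and real capacity per instance has to be measured rather than assumed.

```python
import math
from typing import Sequence


def fleet_utilization(load_per_instance: Sequence[float],
                      capacity_per_instance: float) -> float:
    """Fraction of the provisioned capacity that is actually being used."""
    total_capacity = capacity_per_instance * len(load_per_instance)
    if total_capacity <= 0:
        raise ValueError("fleet must have positive capacity")
    return sum(load_per_instance) / total_capacity


def instances_needed(total_load: float,
                     capacity_per_instance: float,
                     target_utilization: float = 0.75) -> int:
    """Smallest number of instances that keeps each one at or below the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(total_load / (capacity_per_instance * target_utilization))


# Three instances, each assumed to handle 100 requests/s, serving 180 requests/s in total.
utilization = fleet_utilization([60.0, 90.0, 30.0], 100.0)

# The same 900 requests/s load, before and after tuning an instance from 100 to 150 requests/s.
before = instances_needed(900.0, 100.0)
after = instances_needed(900.0, 150.0)
print(utilization, before, after)
```

Under these assumed numbers, raising what one instance can handle from 100 to 150 requests per second cuts the fleet from 12 instances to 8 for the same load, which is the kind of waste reduction the paragraph above describes.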

In the upcoming sections, we will look at some ways to measure and monitor the use of an AWS infrastructure. We will then analyze this information and show how you can use it to tune your components to optimize use of resources and become more resilient to changes in traffic.

Measure

As we have shown in the previous chapter, Amazon CloudWatch is a very versatile monitoring tool. You can measure all sorts of things. But ...
