Every application is different—each has its own particular signature. By monitoring your application regularly with the tools we have described, you will get to know that signature. You will know when to expect your CPU to spike because of a reporting cron job, or what a normal traffic graph looks like. And anything out of the ordinary is worth a closer look.
There is a wealth of information available through CloudWatch, and with some effort, you can correlate different graphs. What you can't read from the graphs is why traffic spiked, and it's not always the application that is misbehaving. It's important to talk to everyone involved so you can figure out what happened, but more importantly, so you start to learn what to expect. After all, before you can conclude something is fishy, you need expectations to compare against.
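To make "out of the ordinary" concrete, here is a minimal, illustrative sketch (not from the book) of comparing fresh datapoints against a learned baseline. The sample values and the three-standard-deviations threshold are assumptions for the example; in practice the datapoints would come from CloudWatch.

```python
# Illustrative sketch: flag datapoints that deviate from the baseline
# you have learned through regular monitoring. Values and the 3-sigma
# threshold are assumptions, not CloudWatch defaults.
from statistics import mean, stdev

def unusual_points(baseline, recent, sigmas=3.0):
    """Return datapoints in `recent` that lie more than `sigmas`
    standard deviations away from the baseline average."""
    avg = mean(baseline)
    sd = stdev(baseline)
    return [p for p in recent if abs(p - avg) > sigmas * sd]

# Typical CPU utilization samples (percent), plus today's readings.
baseline = [12, 15, 11, 14, 13, 16, 12, 14]
today = [13, 15, 78, 14]  # the 78 is the reporting cron job kicking in

print(unusual_points(baseline, today))  # → [78]
```

A threshold like this only flags a spike as unusual; whether it is the expected cron job or something fishy is exactly the judgment the surrounding text says you build up over time.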
The most dramatic failure is downtime. If your application runs on one instance, that instance is the most vulnerable part of your infrastructure. Luckily, we built our instance in such a way that we can easily launch another one. But after the fact, it is important to know what happened.
If your instance is still around, you can investigate what went wrong. If your instance is not there anymore, or if the Console reports it as running but you can't access it, that is more difficult. In these cases, you can politely ask in the AWS forums whether there was a hardware or network failure. Most of the time you get an answer, eventually (Figure 5-6 ...