The remainder of this chapter uses examples to demonstrate some of the important monitoring techniques you need to know and apply.
As mentioned earlier, server statistics paint only part of the capacity picture. You should also measure and record higher-level metrics specific to your application: not to any one server, but to the system as a whole. CPU and disk usage on a web server don't tell the whole tale of what's happening to each web request, and a stream of web requests can involve multiple pieces of hardware.
At Flickr, we have a dashboard that collects these application-level metrics. They are collected on both a daily and a cumulative basis. Some of the metrics can be drawn from a database, such as the number of photos uploaded. Others can come from aggregating server statistics, such as total disk space consumed across disparate machines. Data collection techniques can be as simple as running a script from a cron job and putting the results into a dedicated database for future mining.
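To make the cron-job approach concrete, here is a minimal sketch in Python of what such a collection script might look like. It is not Flickr's actual tooling: the `app_metrics` table, the metric name `photos_uploaded`, and the helper functions are all hypothetical, and SQLite stands in for whatever database you would really use. The script stores one daily value per metric and derives the cumulative figure by summing the daily rows.

```python
import sqlite3
from datetime import date


def init_store(conn):
    """Create the (hypothetical) metrics table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_metrics ("
        " day TEXT NOT NULL,"          # ISO date, e.g. 2008-01-31
        " metric TEXT NOT NULL,"       # e.g. photos_uploaded
        " daily_value REAL NOT NULL,"
        " PRIMARY KEY (day, metric))"
    )
    conn.commit()


def record_daily(conn, metric, value, day=None):
    """Store one day's value; rerunning for the same day overwrites it."""
    day_text = (day or date.today()).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO app_metrics (day, metric, daily_value)"
        " VALUES (?, ?, ?)",
        (day_text, metric, value),
    )
    conn.commit()


def cumulative(conn, metric, through_day):
    """Sum the daily values up to and including through_day."""
    row = conn.execute(
        "SELECT COALESCE(SUM(daily_value), 0) FROM app_metrics"
        " WHERE metric = ? AND day <= ?",
        (metric, through_day.isoformat()),
    ).fetchone()
    return row[0]
```

A nightly cron entry would call `record_daily` with a count pulled from the application database or aggregated from server statistics. Because the primary key is the pair (day, metric), a rerun after a failed job replaces that day's row instead of double counting it.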
Some of the metrics currently tracked at Flickr are:
Photos uploaded (daily, cumulative)
Photos uploaded per hour
Average photo size (daily, cumulative)
Processing time to segregate photos based on their different sizes (hourly)
User registrations (daily, cumulative)
Pro account signups (daily, cumulative)
Number of photos tagged (daily, cumulative)
API traffic (API keys in use, requests made per second, per key)
Number of unique ...