Most software systems evolve over time. New features are added and old ones pruned. Fluctuating user demand means an efficient system must be able to quickly scale resources up and down. Demands for near-zero downtime require automatic failover to preprovisioned backup systems, normally in a separate datacenter or region.
On top of this, organizations often have multiple such systems to run, or need to run occasional tasks such as data mining that are separate from the main system, but require significant resources or talk to the existing system.
When running multiple resources, it is important to ensure that they are used efficiently (i.e., that they’re not sitting idle) yet can still cope with spikes in demand. Balancing cost effectiveness against the ability to quickly scale is a difficult task that can be approached in a variety of ways.
All of this means that running a nontrivial system is full of administrative tasks and challenges, the complexity of which should not be underestimated. It quickly becomes impossible to look after machines on an individual level; rather than being patched and updated one by one, machines must all be treated identically. When a machine develops a problem, it should be destroyed and replaced, rather than nursed back to health.1
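The replace-rather-than-repair approach can be illustrated with a small sketch. This is a toy in-memory model, not the interface of any real tool: the names `Machine`, `provision`, and `reconcile` are hypothetical, and a real system would call a cloud or cluster API where this code simply creates objects.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical source of unique machine IDs


@dataclass
class Machine:
    image: str            # the image every machine in the fleet should run
    healthy: bool = True
    id: int = field(default_factory=lambda: next(_ids))


def provision(image):
    """Stand-in for asking a provider to start a fresh machine (assumed helper)."""
    return Machine(image=image)


def reconcile(fleet, image, desired):
    """Return a fleet of `desired` machines that all run `image`.

    Unhealthy or out-of-date machines are dropped, never repaired in place,
    and the fleet is topped up with freshly provisioned replacements.
    """
    kept = [m for m in fleet if m.healthy and m.image == image]
    kept = kept[:desired]              # scale down if there are too many
    while len(kept) < desired:         # scale up or replace what was dropped
        kept.append(provision(image))
    return kept
```

Calling `reconcile` again with a different `image` or `desired` count covers upgrades and scaling with the same logic, which is the practical benefit of treating every machine identically.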
Various software tools and solutions exist to help with these challenges and cover each of the following areas to a greater or lesser degree:
Grouping “hosts”—either ...