Reliability, recoverability, timely error detection, and continuous operations are primary characteristics of any high-availability solution. Cloud Foundry, if used correctly, can become the central platform for your app development and deployment activities. As such, the resiliency of the platform is critical to your business continuity. Preparing for disaster and disruption to Cloud Foundry and the underlying stack is vital.
You can mitigate and handle failures in three key ways:
Design for added resiliency and high availability
Employ backup and restoration mechanisms
Repeatedly run platform verification tests
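As a rough illustration of the third item, the sketch below runs a set of named checks over several rounds and reports any check that failed at least once. It is not a Cloud Foundry tool: the names `run_verification`, `failing_checks`, and the demo checks are hypothetical, and real checks would exercise the platform itself (for example, pushing a test app).

```python
import time
from typing import Callable, Dict, List


def run_verification(
    checks: Dict[str, Callable[[], bool]],
    rounds: int = 3,
    interval_seconds: float = 0.0,
) -> Dict[str, List[bool]]:
    """Run every check once per round and record whether it passed."""
    results: Dict[str, List[bool]] = {name: [] for name in checks}
    for round_index in range(rounds):
        for name, check in checks.items():
            try:
                passed = bool(check())
            except Exception:
                # A check that raises counts as a failure; the runner keeps going.
                passed = False
            results[name].append(passed)
        # Wait between rounds, but not after the final one.
        if round_index < rounds - 1:
            time.sleep(interval_seconds)
    return results


def failing_checks(results: Dict[str, List[bool]]) -> List[str]:
    """Return the names of checks that failed in at least one round."""
    return sorted(name for name, outcomes in results.items() if not all(outcomes))


if __name__ == "__main__":
    # Hypothetical stand-ins for real platform checks (assumption, not a real API).
    demo_checks = {
        "api_reachable": lambda: True,
        "app_push_succeeds": lambda: True,
    }
    print(failing_checks(run_verification(demo_checks, rounds=2)))
```

Recording every round, rather than stopping at the first failure, makes intermittent problems visible alongside hard outages.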
This chapter gives an overview of Cloud Foundry’s built-in resiliency capabilities and then discusses how you can extend them. It also surveys some of the available techniques for backing up and restoring Cloud Foundry.
Resiliency and high availability (HA) go hand in hand. End users do not care how resilient your system is; they are concerned only with the availability of their app, service, or functionality. For example, when I am watching Netflix, the last thing on my mind is that Chaos Monkey can take out a vital component. I care only about watching my movie. Operators build resiliency into technical stacks to promote HA, but HA itself is measured from the end user’s perspective. End users experience ...