15.3. Crash resilience or failure transparency

If a program runs on a single machine and a failure occurs, we assume that all the memory is lost; that is, all the data structures of the program are lost. If the program has caused no externally visible effects, it is as though it had never run, and it can simply be restarted. In practice, every program will eventually perform output, change some permanent state, or communicate an intermediate result. If a crash occurs during a program run after such an externally visible action has taken place, we must consider how the program should be restarted.
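The distinction can be made concrete with a toy simulation. This is our own illustration, not code from the text: the list `external_log` stands in for the externally visible world (it survives a crash), local variables stand in for the program's memory (lost on a crash), and `Crash`, `run` and `run_and_restart` are hypothetical names. Restarting is harmless when the crash precedes any output, but a naive restart after output repeats the externally visible effects.

```python
class Crash(Exception):
    """Simulated machine failure: the program's in-memory state is discarded."""


def run(external_log, crash_before=None):
    """Compute three results and 'output' each one to external_log.

    If crash_before is the index i, the simulated machine fails just before
    the i-th output, after the earlier outputs have already become visible.
    """
    for i in range(3):
        result = i * i                 # in-memory state, lost on a crash
        if i == crash_before:
            raise Crash()
        external_log.append(result)    # externally visible, survives a crash


def run_and_restart(crash_before):
    """Run once; on a crash, naively restart the program from the beginning."""
    external_log = []                  # the 'outside world', persists across the crash
    try:
        run(external_log, crash_before)
    except Crash:
        run(external_log)              # restart with no memory of what was done
    return external_log


print(run_and_restart(None))  # no crash
print(run_and_restart(0))     # crash before any output: restart is invisible
print(run_and_restart(2))     # crash after two outputs: they appear twice
```

With a crash before the first output the result is the same as an uninterrupted run, whereas a crash after two outputs leaves `[0, 1, 0, 1, 4]`: the naive restart duplicates effects that the outside world has already seen.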

One aspect of system design is the extent to which applications are offered support for recovering from system crashes. Another name for crash resilience is ...
