O'Reilly logo

Apache Mesos Essentials by Dharmesh Kakadia

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Fault tolerance

Fault tolerance is an important requirement for a data center OS. The ability to keep functioning in the event of a failure becomes indispensable, when working with large-scale systems. Mesos has no single point of failure in the architecture and can continue to operate in case of faults of various entities. There are three modes of fault tolerance that we have to deal with: machine failures, bugs in Mesos processes, and upgrades. Note that all of the points mentioned earlier can happen with any entity in Mesos. There are mainly three components of Mesos that need to be resilient to these faults: master, slave, and framework.

In case of machine running the Mesos slave fails, the master will notice, and inform the frameworks about ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required