Handling failures in YARN

A YARN application runs successfully only when all of the YARN components, including containers, the ApplicationMaster, the NodeManagers, and the ResourceManager, coordinate correctly. A fault in that coordination, or a lack of sufficient cluster resources, can cause the application to fail. YARN is designed to handle failures at different stages of an application's execution path. How an application is recovered depends on the stage of execution it has reached and on the component in which the problem occurs. The following section explains the recovery mechanism that YARN applies at the component level.
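The idea that recovery depends on which component fails can be sketched as a small model. This is a simplified illustration, not YARN's implementation: `Component`, `RECOVERY`, `MAX_ATTEMPTS`, and `run_with_retries` are hypothetical names, and the `attempt_succeeds` callback stands in for a real task or ApplicationMaster attempt. The attempt limits are the stock defaults of the real Hadoop properties `mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts` (4) and `yarn.resourcemanager.am.max-attempts` (2), assuming a MapReduce application on YARN.

```python
from enum import Enum


class Component(Enum):
    """YARN components whose failure triggers a different recovery path."""
    CONTAINER = "container"
    APPLICATION_MASTER = "ApplicationMaster"
    NODE_MANAGER = "NodeManager"
    RESOURCE_MANAGER = "ResourceManager"


# Simplified summary of which recovery action follows each kind of failure.
RECOVERY = {
    Component.CONTAINER: "ApplicationMaster requests a new container and retries the task attempt",
    Component.APPLICATION_MASTER: "ResourceManager launches a new ApplicationMaster attempt",
    Component.NODE_MANAGER: "ResourceManager marks the node as lost; its tasks are rescheduled on other nodes",
    Component.RESOURCE_MANAGER: "recovery relies on ResourceManager restart or HA being enabled",
}

# Hypothetical lookup of attempt limits; values are the Hadoop defaults.
MAX_ATTEMPTS = {
    Component.CONTAINER: 4,           # mapreduce.map.maxattempts / mapreduce.reduce.maxattempts
    Component.APPLICATION_MASTER: 2,  # yarn.resourcemanager.am.max-attempts
}


def run_with_retries(component, attempt_succeeds):
    """Retry a failing unit of work up to the attempt limit for its component.

    attempt_succeeds(n) is a hypothetical callback reporting whether
    attempt number n (1-based) finished successfully.
    Returns a tuple (succeeded, attempts_used).
    """
    limit = MAX_ATTEMPTS[component]
    for attempt in range(1, limit + 1):
        if attempt_succeeds(attempt):
            return True, attempt
    # Every attempt failed: the task (or the whole application) is marked failed.
    return False, limit


if __name__ == "__main__":
    # A task whose first two attempts fail succeeds on its third attempt.
    print(run_with_retries(Component.CONTAINER, lambda n: n >= 3))
    # An ApplicationMaster that keeps failing exhausts its two attempts.
    print(run_with_retries(Component.APPLICATION_MASTER, lambda n: False))
```

In this model a task can survive a few container failures, while repeated ApplicationMaster failures exhaust a much smaller budget and fail the whole application.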

The container failure

Containers are instantiated for executing the map or reduce tasks. ...
