Chapter 8. Failures in YARN

Dealing with failures is more challenging and time consuming in distributed systems than on a single machine. Hadoop and YARN also run on commodity hardware, and cluster sizes today range from several nodes to several thousand nodes, so handling failure scenarios and dealing with ever-growing scaling issues is very important. In this chapter, we will focus on failures in the YARN framework: the causes of failures and how to overcome them.

In this chapter, we will cover the following topics:

  • ResourceManager failures
  • ApplicationMaster failures
  • NodeManager failures
  • Container failures
  • Hardware failures

For each of these, we will examine the root causes of the failures and their solutions.

ResourceManager failures ...
