Monitoring NodeManager's health

NodeManager is a per-node daemon that runs on every slave node of the cluster. These nodes are the workers that execute applications. To schedule efficiently, the ResourceManager must monitor the health of these nodes, which can cover memory, CPU, network usage, and so on. The ResourceManager daemon does not schedule new application execution requests on an unhealthy NodeManager.

The health checker script

YARN defines a mechanism to monitor the health of a node using a script. An administrator defines a shell script that checks the node. If the script prints ERROR as the first word of any of its output lines, the ResourceManager marks the node as UNHEALTHY ...
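As an illustration of the rule above, a health checker script might look like the following minimal sketch. It is not taken from the book: the disk-usage check, the 90 percent threshold, and the function name report_disk_health are all assumptions chosen for the example. The only part that follows the text is the output convention, where a line whose first word is ERROR signals an unhealthy node.

```shell
#!/bin/sh
# Hypothetical health checker sketch. The metric (root filesystem usage)
# and the threshold are illustrative assumptions, not YARN defaults.

MAX_DISK_USAGE_PERCENT=90

# Print one health line for a given usage percentage and limit.
# A line whose first word is ERROR marks the node as unhealthy.
report_disk_health() {
    usage="$1"
    limit="$2"
    if [ "$usage" -gt "$limit" ]; then
        echo "ERROR disk usage ${usage}% exceeds ${limit}%"
    else
        echo "OK disk usage ${usage}% is within ${limit}%"
    fi
}

# Read the used-space percentage of the root filesystem from POSIX df output.
current_usage=$(df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }')

if [ -z "$current_usage" ]; then
    # Treat an unreadable metric as a failure rather than guessing a value.
    echo "ERROR could not read disk usage"
else
    report_disk_health "$current_usage" "$MAX_DISK_USAGE_PERCENT"
fi
```

The script always exits normally and reports its result only through its output, since the output lines are what the mechanism described above inspects. The NodeManager locates such a script through its configuration (the yarn.nodemanager.health-checker.script.path property in yarn-site.xml), and the script file must be executable.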
