Time for action – killing a DataNode process

First, we'll kill a DataNode. Recall that the DataNode process runs on each host in the HDFS cluster and is responsible for managing blocks within the HDFS filesystem. Because Hadoop, by default, uses a replication factor of 3 for blocks, a single DataNode failure should have no direct impact on availability; rather, it will cause some blocks to temporarily fall below the replication threshold. Execute the following steps to kill a DataNode process:

  1. First, check the initial status of the cluster and confirm that everything is healthy. We'll use the dfsadmin command for this:
    $ hadoop dfsadmin -report
    Configured Capacity: 81376493568 (75.79 GB)
    Present Capacity: 61117323920 ...
