Time for action – intentionally causing missing blocks

The next step should be obvious; let's kill three DataNodes in quick succession.

Tip

This is the first of the activities we mentioned that you really should not do on a production cluster. Although there will be no data loss if the steps are followed properly, there is a period when the existing data is unavailable.

The following are the steps to kill three DataNodes in quick succession:

  1. Restart all the nodes by using the following command:
    $ start-all.sh
    
  2. Wait until hadoop dfsadmin -report shows four live nodes.
  3. Put a new copy of the test file onto HDFS:
    $ hadoop fs -put file1.data file1.new
    
  4. Log onto three of the cluster hosts and kill the DataNode process on each (see the sketch after this list).
  5. Wait for the usual 10 minutes ...
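The steps do not give an explicit command for stopping a DataNode, so the following is a minimal sketch of one way to do it on each chosen host. It assumes the JDK's jps tool is on the path; the process ID shown is purely illustrative and will differ on your machines:

    $ jps | grep DataNode
    3012 DataNode
    $ kill 3012

A plain kill (SIGTERM) is normally enough to make the DataNode process exit; once the NameNode has gone the usual 10 minutes without heartbeats from the three hosts, it will mark them as dead and report the blocks that lived only on those nodes as missing.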
