Time for action – intentionally causing missing blocks

The next step should be obvious; let's kill three DataNodes in quick succession.

Tip

This is the first of the activities we mentioned that you really should not do on a production cluster. Although there will be no data loss if the steps are followed properly, there will be a period during which the existing data is unavailable.

The following are the steps to kill three DataNodes in quick succession:

  1. Restart all the nodes by using the following command:
    $ start-all.sh
    
  2. Wait until hadoop dfsadmin -report shows four live nodes.
  3. Put a new copy of the test file onto HDFS:
    $ hadoop fs -put file1.data file1.new
    
  4. Log onto three of the cluster hosts and kill the DataNode process on each.
  5. Wait for the usual 10 minutes ...
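For step 4, one way to find and kill the DataNode process on each host is to parse the output of jps, which lists Java processes as "PID ClassName" pairs. The helper function below is a hypothetical sketch, not part of the Hadoop distribution:

```shell
#!/bin/sh
# Hypothetical helper: given jps-style output ("PID ClassName" per line),
# print the PID of the DataNode process, if any.
get_datanode_pid() {
  echo "$1" | awk '$2 == "DataNode" {print $1}'
}

# On a real cluster host you would run something like:
#   kill "$(get_datanode_pid "$(jps)")"
```

Run this on each of the three chosen hosts; a plain kill (SIGTERM) is enough to take the DataNode down.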
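While waiting, you can watch the NameNode notice the dead nodes by re-running hadoop dfsadmin -report. Assuming the Hadoop 1.x report format, which includes a line such as "Datanodes available: 4 (4 total, 0 dead)", a small hypothetical helper can pull out the live-node count:

```shell
#!/bin/sh
# Hypothetical helper: given the text of "hadoop dfsadmin -report",
# print the number of live DataNodes. Assumes the 1.x report line
# "Datanodes available: N (N total, M dead)".
count_live_nodes() {
  echo "$1" | awk '/^Datanodes available/ {print $3}'
}

# On a live cluster:
#   count_live_nodes "$(hadoop dfsadmin -report)"
```

Once the count drops to one, the NameNode has registered all three failures and will start reporting the now under-replicated and missing blocks.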
