Using multiple disks/volumes and limiting HDFS disk usage

Hadoop lets you specify multiple directories for the DataNode data directory, so a DataNode can spread its data blocks across several disks or volumes. Hadoop tries to store equal amounts of data in each directory. It also supports limiting the amount of disk space used by HDFS.

How to do it...

The following steps will show you how to add multiple disk volumes:

  1. Create an HDFS data storage directory on each volume.
  2. Locate the hdfs-site.xml configuration file. Set the dfs.datanode.data.dir property to a comma-separated list of the data storage directories, one per volume, as follows:
    <property> <name>dfs.datanode.data.dir</name> <value>/u1/hadoop/data, ...
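For reference, a complete version of this configuration might look like the following minimal sketch. The two directory paths are hypothetical placeholders for one directory per volume, not the values from the original listing, and the 10 GB reservation is an arbitrary illustrative figure. The second property, dfs.datanode.du.reserved, is the standard HDFS setting for the space (in bytes, per volume) that the DataNode leaves free for non-HDFS use, which is one way to limit HDFS disk usage.

```xml
<configuration>
  <!-- Comma-separated list of DataNode storage directories, one per volume.
       Both paths are hypothetical examples; substitute your own mount points. -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/u1/hadoop/data,/u2/hadoop/data</value>
  </property>

  <!-- Space in bytes to keep free on each volume for non-HDFS use.
       10737418240 bytes = 10 GB; the figure is an assumption for illustration. -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>
</configuration>
```

Changes to hdfs-site.xml normally take effect only after the affected DataNodes are restarted.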

