HDFS block placement

Replication is a core feature of HDFS: it protects data against loss and keeps it available in the face of node failures. The default replication factor is 3, and it can be tuned through the dfs.replication configuration property. HDFS does not replicate a file as a whole; rather, it splits the file into fixed-size blocks and stores those blocks, with their replicas, across the cluster.
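To make this concrete, here is a minimal Java sketch that overrides dfs.replication on the client side. The factor of 2 is illustrative, and the override affects only files this client subsequently creates, not blocks already on disk:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ReplicationDefault {
        public static void main(String[] args) throws IOException {
            // Client-side override of dfs.replication; applies to files
            // this client creates from now on, not to existing blocks.
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "2");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Default replication: "
                    + fs.getDefaultReplication());
        }
    }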

The replication factor can be specified at file creation time and changed later as desired. A salient feature of HDFS, and one that distinguishes it from many other distributed filesystems, is its smart placement of blocks. The placement policy is rack-aware, that is, it is cognizant of the physical location (the rack) of the nodes holding each block's replicas. Under the default policy, the first replica is written to the writer's own node (or a random node), the second to a node on a different rack, and the third to a different node on that same remote rack, balancing write bandwidth against protection from rack-level failures.
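The following is a minimal sketch against the public FileSystem API, with an illustrative path /data/events.log: it passes a per-file replication factor to create() and later changes it with setReplication(), after which the NameNode re-replicates or trims excess replicas in the background.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PerFileReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/data/events.log");  // illustrative path

            // Specify the replication factor (here 2) at creation time.
            try (FSDataOutputStream out = fs.create(p, true, 4096,
                    (short) 2, fs.getDefaultBlockSize(p))) {
                out.writeUTF("hello");
            }

            // Change it later; the NameNode schedules the extra copy
            // (or deletion of surplus replicas) asynchronously.
            fs.setReplication(p, (short) 3);
        }
    }

The same change can be made from the shell with hdfs dfs -setrep, for example hdfs dfs -setrep 3 /data/events.log.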
