Setting the file replication factor

HDFS stores files across the cluster by splitting them into fixed-size blocks. These data blocks are replicated to different DataNodes, mainly for fault tolerance. Block replication can also improve the data locality of MapReduce computations and increase the total data access bandwidth. Reducing the replication factor helps save storage space in HDFS.
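As a quick sanity check, the standard HDFS shell can report the replication factor of a file; in the following sketch, /user/foo/data.txt is just a placeholder path:

    # Print only the replication factor of a file
    hdfs dfs -stat %r /user/foo/data.txt

    # List the file; the second column of the output shows its replication factor
    hdfs dfs -ls /user/foo/data.txt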

The HDFS replication factor is a file-level property that can be set on a per-file basis. This recipe shows you how to change the default replication factor of an HDFS deployment, which affects any new files created afterwards, and how to specify a custom replication factor at ...
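Although the recipe text is truncated above, the standard Hadoop v2 mechanisms for these tasks look like the following sketches; the paths and replication values shown are placeholders. To change the deployment-wide default, set the dfs.replication property in hdfs-site.xml (a client-side setting that is read when files are created; typically found under $HADOOP_HOME/etc/hadoop):

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>

To specify a custom replication factor for an individual file at creation time, the property can be passed on the command line instead:

    # Create a file with a replication factor of 1, overriding the default
    hdfs dfs -D dfs.replication=1 -copyFromLocal data.txt /user/foo/data.txt

Changing the replication factor of files that already exist in HDFS is done with the setrep command, where -w waits until the target replication is reached and -R applies the change recursively to a directory:

    hdfs dfs -setrep -w 2 /user/foo/data.txt
    hdfs dfs -setrep -R 2 /user/foo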
