Setting the HDFS block size

HDFS stores files across the cluster by breaking them into coarse-grained, fixed-size blocks. The default HDFS block size is 64 MB. The block size of a data product can affect the performance of filesystem operations, with larger block sizes being more effective when you are storing and processing very large files. The block size can also affect the performance of MapReduce computations, as Hadoop's default behavior is to create one Map task for each data block of an input file; for example, a 1 GB input file stored with 64 MB blocks occupies 16 blocks and therefore spawns 16 Map tasks.
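
You can inspect how HDFS has split a particular file into blocks using the hdfs fsck utility. A minimal sketch, where /data/input.log is a hypothetical file path used purely for illustration:

    hdfs fsck /data/input.log -files -blocks

This lists the file's size, the number of blocks it occupies, and the length of each block; under the default behavior described above, each of those blocks corresponds to one Map task.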

How to do it...

The following steps show you how to use the NameNode configuration file to set the HDFS block size:

  1. Add or modify the following code in the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file. The block size ...
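
     A minimal sketch of such an entry, assuming the dfs.blocksize property name used by Hadoop 2 and a 128 MB value chosen purely for illustration:

        <property>
          <name>dfs.blocksize</name>
          <!-- 134217728 bytes = 128 MB; illustrative value only.
               Hadoop 2 also accepts suffixed forms such as 128m. -->
          <value>134217728</value>
        </property>

     This setting applies to files created after the change takes effect; existing files keep the block size they were written with. You can also override the block size for an individual file at write time, for example with hdfs dfs -Ddfs.blocksize=134217728 -put local.file /path.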
