O'Reilly logo

Mastering Hadoop by Sandeep Karanth

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

The Map task

The efficiency of the Map phase is decided by the specifications of the job inputs. We saw that having too many small files leads to proliferation of Map tasks because of a large number of splits. Another important statistic to note is the average runtime of a Map task. Too many or too few Map tasks are both detrimental for job performance. Striking a balance between the two is important, much of which depends on the nature of the application and data.

Tip

A rule of thumb is to have the runtime of a single Map task to be around a minute to three minutes, based on empirical evidence.

The dfs.blocksize attribute

The default block size of files in a cluster is overridden in the cluster configuration file, hdfs-site.xml, generally present ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required