  • Sagar Mainkar thinks this is interesting:

Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks. The tasks are scheduled using YARN and run on nodes in the cluster. If a task fails, it will be automatically rescheduled to run on a different node.
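
The rescheduling behavior described above can be illustrated with a toy sketch. This is not YARN's scheduler or any Hadoop API: `run_with_rescheduling`, the node names, and the `max_attempts` limit are all hypothetical, chosen only to show a failed task being retried on a different node.

```python
from typing import Callable, List, Tuple


def run_with_rescheduling(
    task: Callable[[str], str],
    nodes: List[str],
    max_attempts: int = 4,  # assumed retry limit, for illustration only
) -> Tuple[str, str]:
    """Run `task` on one node at a time, moving to a different node after a failure.

    Returns (node, result) for the first attempt that succeeds.
    """
    attempts = 0
    for node in nodes:
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            return node, task(node)
        except Exception:
            # The attempt failed; fall through and try the next node.
            continue
    raise RuntimeError(f"task failed after {attempts} attempt(s)")


def flaky_task(node: str) -> str:
    """Hypothetical task that always fails on node-1."""
    if node == "node-1":
        raise RuntimeError("simulated failure")
    return f"done on {node}"


if __name__ == "__main__":
    print(run_with_rescheduling(flaky_task, ["node-1", "node-2", "node-3"]))
```

In this sketch each node is tried at most once, so a retry always lands on a node that has not yet failed the task.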

Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop cre...
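
To make the idea of fixed-size splits concrete, here is a minimal sketch that cuts a byte range into (offset, length) pieces. It is a simplification under stated assumptions, not Hadoop's actual split computation: `fixed_size_splits` is a hypothetical helper, and the sizes in the usage example are arbitrary illustrative values.

```python
from typing import List, Tuple


def fixed_size_splits(total_bytes: int, split_bytes: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover `total_bytes` in fixed-size pieces.

    Every piece has length `split_bytes` except possibly the last, which
    holds whatever remains.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")

    splits = []
    offset = 0
    while offset < total_bytes:
        length = min(split_bytes, total_bytes - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    # Illustrative sizes only: a 300-byte input cut into 128-byte splits.
    for offset, length in fixed_size_splits(300, 128):
        print(f"split at offset {offset}, length {length}")
```

With these illustrative numbers the input yields three splits, the last one shorter than the others.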


Source: Hadoop: The Definitive Guide, 4th Edition


Input Splits