Hadoop 2.x

Until Hadoop 2.x, all the distributions focused on addressing the limitations of Hadoop 1.x without deviating from its core architecture. Hadoop 2.x changed the underlying architectural assumptions and turned out to be a real breakthrough, most importantly through the introduction of YARN. YARN is a framework for managing the resources of a Hadoop cluster, and it introduced the ability to handle real-time processing needs in addition to batch processing. Some of the important issues that were addressed are as follows:

  • Issues caused by relying on a single NameNode
  • Support for a dramatic increase in the number of nodes in the cluster
  • A broader range of tasks that can be successfully addressed with Hadoop

The following figure depicts the difference between the Hadoop 1.x and 2.x architectures ...
