Operations in the Hadoop 2 world

As mentioned in Chapter 2, Storage, some of the most significant changes made to HDFS in Hadoop 2 involve its fault tolerance and its better integration with external systems. This is not merely a curiosity: the NameNode High Availability features in particular have made a massive difference to the management of clusters since Hadoop 1. In the bad old days of 2012 or so, a significant part of the operational preparedness of a Hadoop cluster was built around mitigating, and recovering from, failure of the NameNode. If the NameNode died in Hadoop 1 and you didn't have a backup of the HDFS fsimage metadata file, you basically lost access to all your data. If the metadata was permanently lost, ...