HDFS high availability

The NameNode is the heart of an HDFS namespace. It holds the filesystem metadata that every client operation depends on, so the availability of any cluster using HDFS is directly tied to the availability of its NameNode.

Secondary NameNode, Checkpoint Node, and Backup Node

In Hadoop 1.X, HDFS relies on a Secondary NameNode as a safeguard against disasters. The Secondary NameNode periodically checkpoints the NameNode's metadata, so that after a NameNode failure its copy can be used to recover the NameNode. The name Secondary NameNode is a misnomer: it is a cold standby and cannot service client requests on its own. When recovering from a failure, however, the NameNode can read the checkpointed metadata from the Secondary NameNode.

The NameNode writes every HDFS update to the edits log, which is stored on its native (local) filesystem. The log is written in an append-only fashion. The ...
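To make the idea of an append-only edits log concrete, here is a minimal sketch, assuming a toy namespace that is just a set of paths and a log that records only "create" and "delete" operations. The names `EditsLog`, `apply_edits`, and `checkpoint` are hypothetical and do not reflect the real HDFS on-disk formats for the namespace image (fsimage) or the edits log. The sketch shows why the log can be append-only: the current namespace is always recoverable by replaying the log on top of the last snapshot, and a checkpoint folds those entries into a new snapshot so the log can start over empty.

```python
from dataclasses import dataclass, field


@dataclass
class EditsLog:
    """Hypothetical in-memory stand-in for the NameNode edits log."""
    entries: list = field(default_factory=list)

    def append(self, op, path):
        # Append-only: records are added at the end and never rewritten.
        self.entries.append((op, path))


def apply_edits(image, edits):
    """Replay logged operations on top of a namespace snapshot (toy model)."""
    namespace = set(image)
    for op, path in edits:
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return frozenset(namespace)


def checkpoint(image, log):
    """Merge the log into a new snapshot, then start a fresh, empty log."""
    new_image = apply_edits(image, log.entries)
    return new_image, EditsLog()


# Usage: three updates are logged, then merged by a checkpoint.
image = frozenset({"/data"})
log = EditsLog()
log.append("create", "/data/a.txt")
log.append("create", "/data/b.txt")
log.append("delete", "/data/a.txt")
image, log = checkpoint(image, log)
print(sorted(image))     # ['/data', '/data/b.txt']
print(len(log.entries))  # 0
```

Replaying the whole log on every restart gets slower as the log grows, which is why periodically merging it into a fresh snapshot matters in this model.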
