Summary

This chapter has given a whistle-stop tour through storage on a Hadoop cluster. In particular, we covered:

The high-level architecture of HDFS, the main filesystem used in Hadoop
How HDFS works under the covers and, in particular, its approach to reliability
How Hadoop 2 has significantly extended HDFS, particularly with NameNode high availability (HA) and filesystem snapshots
What ZooKeeper is and how Hadoop uses it to enable features such as automatic NameNode failover
An overview of the command-line tools used to access HDFS
The API for filesystems in Hadoop and how, at the code level, HDFS is just one implementation of a more flexible filesystem abstraction
How data can be serialized onto a Hadoop filesystem and some of the support provided in the ...

Learning Hadoop 2 by Garry Turkington, Gabriele Modena