This chapter has given a whistle-stop tour of storage on a Hadoop cluster. In particular, we covered:
- The high-level architecture of HDFS, the main filesystem used in Hadoop
- How HDFS works under the covers, especially its approach to reliability
- How Hadoop 2 has significantly extended HDFS, particularly with NameNode HA and filesystem snapshots
- What ZooKeeper is and how it is used by Hadoop to enable features such as NameNode automatic failover
- An overview of the command-line tools used to access HDFS
- The API for filesystems in Hadoop and how, at the code level, HDFS is just one implementation of a more flexible filesystem abstraction
- How data can be serialized onto a Hadoop filesystem and some of the support provided in the ...