Summary

This chapter has given a whistle-stop tour of storage on a Hadoop cluster. In particular, we covered:

  • The high-level architecture of HDFS, the main filesystem used in Hadoop
  • How HDFS works under the covers and, in particular, its approach to reliability
  • How Hadoop 2 has added significantly to HDFS, particularly in the form of NameNode HA and filesystem snapshots
  • What ZooKeeper is and how it is used by Hadoop to enable features such as NameNode automatic failover
  • An overview of the command-line tools used to access HDFS
  • The API for filesystems in Hadoop and how, at the code level, HDFS is just one implementation of a more flexible filesystem abstraction
  • How data can be serialized onto a Hadoop filesystem and some of the support provided in the ...
