Learning Hadoop 2 by Garry Turkington, Gabriele Modena

Summary

This chapter has given a whistle-stop tour through storage on a Hadoop cluster. In particular, we covered:

  • The high-level architecture of HDFS, the main filesystem used in Hadoop
  • How HDFS works under the covers and, in particular, its approach to reliability
  • How Hadoop 2 has added significantly to HDFS, particularly in the form of NameNode HA and filesystem snapshots
  • What ZooKeeper is and how it is used by Hadoop to enable features such as NameNode automatic failover
  • An overview of the command-line tools used to access HDFS
  • The API for filesystems in Hadoop and how, at the code level, HDFS is just one implementation of a more flexible filesystem abstraction
  • How data can be serialized onto a Hadoop filesystem and some of the support provided in the ...
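
As a rough illustration of the reliability point above, the following sketch estimates how many blocks and how much raw disk a single file consumes once HDFS replicates it. The function name hdfs_footprint is hypothetical, and the 128 MiB block size and replication factor of 3 are assumed here as the usual Hadoop 2 defaults (dfs.blocksize and dfs.replication); a given cluster may be configured differently.

```python
MIB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate (block_count, replica_count, raw_bytes) for one file.

    Assumed defaults: 128 MiB blocks and 3 replicas per block.
    This is plain arithmetic; it does not talk to a cluster.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")

    # A file is split into fixed-size blocks; the last block may be partial.
    block_count = (file_size_bytes + block_size_bytes - 1) // block_size_bytes

    # Every block is stored `replication` times on different DataNodes.
    replica_count = block_count * replication

    # A partial final block only occupies its actual length on disk,
    # so raw usage is the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication

    return block_count, replica_count, raw_bytes


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(300 * MIB)
    print(blocks, replicas, raw // MIB)
```

Under these assumptions a 300 MiB file is split into three blocks (two full and one partial), which is why losing a single DataNode leaves at least two copies of every block available.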
