Chapter 9. HDFS Replacements

The parallelism and scalability of the MapReduce computing paradigm are greatly influenced by the underlying filesystem. HDFS is the default filesystem shipped with most Hadoop distributions. It automatically splits files into blocks and stores the blocks, replicated, across the cluster. Information about this block placement is exposed to the MapReduce engine, which can then schedule tasks close to their data so that movement of data over the network is minimized.
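This placement information is available to any client through Hadoop's public FileSystem API. The following is a minimal sketch (the NameNode address and the path /data/input.txt are hypothetical examples) that prints which hosts hold each block of a file:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; substitute your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/data/input.txt"); // hypothetical example path
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode where each block (and its replicas) lives.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

The MapReduce scheduler consumes exactly this kind of host information when it tries to run a map task on, or near, a node that already stores the task's input block.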

However, there are many use cases where HDFS may not be ideal. In this chapter, we will look at the following topics:

  • The strengths and drawbacks of HDFS compared to POSIX-compliant filesystems.
  • Hadoop's support for other filesystems (see the sketch after this list). One of them ...
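Hadoop's filesystem support is pluggable: every backend implements the abstract org.apache.hadoop.fs.FileSystem class, and the scheme of a path's URI selects the implementation. As a minimal sketch (the NameNode address and the bucket name my-bucket are hypothetical examples), the same API call can target HDFS, the node-local filesystem, or Amazon S3 through the s3a connector:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FileSystemSchemes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The URI scheme picks the FileSystem implementation:
        //   hdfs:// -> DistributedFileSystem (HDFS)
        //   file:// -> LocalFileSystem (the node-local disk)
        //   s3a://  -> S3AFileSystem (requires hadoop-aws on the classpath)
        FileSystem hdfs  = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        FileSystem local = FileSystem.get(URI.create("file:///"), conf);
        FileSystem s3    = FileSystem.get(URI.create("s3a://my-bucket/"), conf);

        System.out.println(hdfs.getUri());
        System.out.println(local.getUri());
        System.out.println(s3.getUri());
    }
}
```

Because MapReduce jobs talk only to this abstraction, swapping the storage backend does not require changes to job code, which is what makes HDFS replacements practical in the first place.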
