Managing HDFS

As we saw when killing and restarting nodes in Chapter 6, When Things Break, Hadoop automatically manages many of the availability concerns that would consume a lot of effort on a more traditional filesystem. There are some things, however, that we still need to be aware of.

Where to write data

Just as the NameNode can store fsimage in multiple locations specified via the dfs.name.dir property, a similar-looking property called dfs.data.dir, which we encountered earlier, allows HDFS to use multiple data locations on a host. We will look at it now.

This is a useful mechanism that works very differently from the NameNode property. If multiple directories are specified in dfs.data.dir, Hadoop will view these as ...
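
As a rough illustration of what such a setting might look like, the fragment below is a minimal sketch of an hdfs-site.xml entry that lists two directories in dfs.data.dir as a comma-separated value. The paths /disk1/hdfs/data and /disk2/hdfs/data are hypothetical and assume two separately mounted disks on the DataNode host; substitute whatever locations exist on your own machines.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (sketch): only the property discussed here is shown -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- Comma-separated list of local directories on the DataNode host.
         Both paths are hypothetical examples, not values from the text. -->
    <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
  </property>
</configuration>
```

A change like this would normally need to be made on each DataNode host and would only take effect after the DataNode process is restarted.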
