Managing HDFS
As we saw when killing and restarting nodes in Chapter 6, When Things Break, Hadoop automatically manages many of the availability concerns that would consume a lot of effort on a more traditional filesystem. There are some things, however, that we still need to be aware of.
Where to write data
Just as the NameNode can store fsimage in multiple locations specified via the dfs.name.dir property, we saw earlier that a similar-looking property, dfs.data.dir, allows HDFS to use multiple data locations on a host. We will look at it now.
This is a useful mechanism that works very differently from the NameNode property. If multiple directories are specified in dfs.data.dir, Hadoop will view these as ...
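To make the configuration side concrete, here is a minimal sketch that generates and reads back an hdfs-site.xml fragment listing several directories. It assumes the Hadoop 1.x convention that both dfs.name.dir and dfs.data.dir take a comma-separated list of local paths. All directory paths and the helper names build_hdfs_site and read_dirs are hypothetical, chosen only for illustration. The sketch uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical local paths (assumption): substitute your own disks or mount points.
NAME_DIRS = ["/var/hadoop/name", "/mnt/nfs/hadoop/name"]
DATA_DIRS = ["/data/1/hdfs", "/data/2/hdfs", "/data/3/hdfs"]


def build_hdfs_site(name_dirs, data_dirs):
    """Return hdfs-site.xml text with both properties as comma-separated lists."""
    root = ET.Element("configuration")
    for key, dirs in (("dfs.name.dir", name_dirs), ("dfs.data.dir", data_dirs)):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        # Assumed Hadoop 1.x syntax: multiple directories joined with commas.
        ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(root, encoding="unicode")


def read_dirs(xml_text, key):
    """Parse hdfs-site.xml text and split one property's value into directories."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            value = prop.findtext("value", "")
            # Drop empty entries and stray whitespace around each path.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []  # property not present


if __name__ == "__main__":
    xml_text = build_hdfs_site(NAME_DIRS, DATA_DIRS)
    print(xml_text)
    print(read_dirs(xml_text, "dfs.data.dir"))
```

Both properties are written with the same comma-separated syntax. The difference between them lies in how Hadoop uses the listed directories, not in how the directories are written down.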