Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is one of the most popular implementations of a distributed file system. It sits at the core of Hadoop, a distributed computing platform. HDFS was designed, and has continued to evolve, with the following goals in mind, which complement the storage requirements for the protection of the CI:

  • Hardware failure: HDFS replicates each file block on three nodes by default. The core idea of distributed computing is to leverage commodity hardware, so the cluster consists of a large number of relatively small nodes. With a large number of nodes, the probability that some node fails at any given time increases. Detecting and recovering from these hardware failures without any data loss is therefore a central design goal of HDFS; a sketch of how the replication factor is controlled follows this list.
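
Replication is the mechanism behind this goal: the factor can be set cluster-wide (the dfs.replication property, 3 by default) or per file. The following is a minimal sketch using the HDFS Java FileSystem API; the path /data/example.txt, buffer size, and block size used here are illustrative values, not part of the original text.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // dfs.replication defaults to 3; a client can override it
            conf.setInt("dfs.replication", 3);

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/example.txt");   // illustrative path

            // create(path, overwrite, bufferSize, replication, blockSize)
            short replication = 3;                       // copies of each block
            long blockSize = 128L * 1024 * 1024;         // 128 MB, the HDFS default
            try (FSDataOutputStream out =
                     fs.create(file, true, 4096, replication, blockSize)) {
                out.writeUTF("Each block of this file is stored on three DataNodes.");
            }

            // The replication factor can also be changed for an existing file;
            // the NameNode then schedules re-replication or removal of excess copies.
            fs.setReplication(file, (short) 2);

            fs.close();
        }
    }

The replication factor trades storage overhead for fault tolerance: with three copies, a block survives the loss of two of its DataNodes, at the cost of three times the raw storage.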
