The Hadoop Distributed File System (HDFS) is one of the most popular implementations of a distributed file system. It lies at the core of Hadoop, a distributed computing platform. HDFS was designed, and has evolved, with the following goals in mind, which complement the storage requirements for the protection of the CI:
- Hardware failure: HDFS replicates each file block across three nodes by default. The core idea of distributed computing is to leverage commodity hardware, so a cluster consists of a large number of relatively small nodes. As the number of nodes grows, so does the probability that some node fails at any given time. Detecting these hardware failures and recovering from them without any data loss is therefore a central design goal of HDFS.