CephFS as a drop-in replacement for HDFS

Hadoop is a programming framework that supports the processing and storage of large data sets in a distributed computing environment. The Hadoop core includes the MapReduce analytics engine and the Hadoop Distributed File System (HDFS). HDFS has several weaknesses:

  • Its NameNode was a single point of failure until high availability was added in recent versions of HDFS
  • It isn't POSIX compliant
  • It stores three copies of data by default
  • It relies on a centralized name server, which creates scalability challenges
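
To put the replication point in numbers, here is a minimal sketch of the raw disk capacity consumed under n-way replication. The function names are hypothetical and chosen only for illustration; the default of 3 mirrors the HDFS default replication factor, and the sketch ignores metadata and filesystem overhead.

```python
def raw_capacity_needed(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity (TB) needed to hold usable_tb of data
    when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return usable_tb * replication_factor


def storage_efficiency(replication_factor: int = 3) -> float:
    """Fraction of raw capacity that holds unique data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return 1 / replication_factor


if __name__ == "__main__":
    # Assumed example workload: 100 TB of unique data.
    print(raw_capacity_needed(100))           # raw TB required at 3x replication
    print(round(storage_efficiency(3), 3))    # usable fraction of raw capacity
```

With three copies, only about a third of the raw capacity holds unique data, which is why the replication factor counts as a cost for large clusters.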

The Apache Hadoop project and other software vendors are working independently to fix these gaps in HDFS.

The Ceph community has done some development in this space, and it has a file system plugin ...
