An introduction to the distributed file system

From the user's point of view, a distributed file system behaves much like any other file system: it supports the same basic operations, such as storing, reading, and deleting files and assigning security levels. The main difference is that it can use many servers at the same time while hiding the complexity of keeping them synchronized. As a result, we can store large files across different server nodes without having to manage redundancy or parallel operations ourselves.
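To make the idea of spreading a large file across nodes more concrete, here is a minimal, in-memory sketch in Python. It is not how HDFS or any other framework is implemented. The `ToyCluster` class, its `put`/`get` methods, the tiny 4-byte block size and the round-robin placement are all hypothetical choices made for illustration. The sketch shows the two mechanisms that let users ignore redundancy: a file is split into fixed-size blocks, and each block is copied to several distinct nodes, so a read can still succeed when some nodes are down.

```python
from __future__ import annotations

# Hypothetical toy parameters chosen for illustration; real systems use far
# larger blocks (HDFS defaults to 128 MB) and their own placement policies.
BLOCK_SIZE = 4      # bytes per block
REPLICATION = 3     # number of copies kept of each block


class ToyCluster:
    """In-memory stand-in for a set of storage nodes (not a real DFS client)."""

    def __init__(self, node_names: list[str]) -> None:
        # Each node stores blocks keyed by block id.
        self.nodes: dict[str, dict[str, bytes]] = {n: {} for n in node_names}

    def put(self, name: str, data: bytes) -> dict[str, list[str]]:
        """Split data into blocks and copy each block to REPLICATION distinct nodes.

        Returns the block map (block id -> nodes holding it), the kind of
        metadata a master node keeps. Placement here is plain round-robin.
        """
        names = sorted(self.nodes)
        if REPLICATION > len(names):
            raise ValueError("need at least REPLICATION nodes")
        block_map: dict[str, list[str]] = {}
        for index, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            block_id = f"{name}#{index}"
            chunk = data[start:start + BLOCK_SIZE]
            # Consecutive nodes (mod cluster size) give REPLICATION distinct targets.
            targets = [names[(index + r) % len(names)] for r in range(REPLICATION)]
            for node in targets:
                self.nodes[node][block_id] = chunk
            block_map[block_id] = targets
        return block_map

    def get(self, block_map: dict[str, list[str]], down: set[str] | None = None) -> bytes:
        """Reassemble a file, reading each block from the first replica on a live node."""
        down = down or set()
        parts = []
        for block_id, targets in block_map.items():
            live = [n for n in targets if n not in down]
            if not live:
                raise IOError(f"every replica of {block_id} is unavailable")
            parts.append(self.nodes[live[0]][block_id])
        return b"".join(parts)


if __name__ == "__main__":
    cluster = ToyCluster(["node-a", "node-b", "node-c", "node-d"])
    blocks = cluster.put("report.txt", b"distributed file systems")
    print(blocks["report.txt#0"])  # ['node-a', 'node-b', 'node-c']
    # Losing two nodes does not lose data: every block still has a live copy.
    print(cluster.get(blocks, down={"node-a", "node-b"}))  # b'distributed file systems'
```

With four nodes and three replicas per block, any two nodes can fail and the file can still be read. A third failure can make some block unreachable, which is why real systems also re-replicate blocks when a node disappears.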

There are many frameworks for distributed file systems, such as Red Hat Cluster FS, the Ceph File System, the Hadoop Distributed File System (HDFS), and the Tachyon File System.

In this chapter, we will use HDFS, which is an open source implementation of Google File System, built ...
