Erasure Coding

Erasure coding (EC) is a key change in Hadoop 3.x that promises a significant improvement in HDFS storage efficiency. In earlier versions, a replication factor of 3, for instance, consumed three times the raw capacity of the cluster file system for all kinds of data, no matter how important that data was to the tasks at hand.
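
To put rough numbers on the saving, the sketch below compares the raw storage needed under 3x replication with a Reed-Solomon layout of 6 data units and 3 parity units, which is the layout behind Hadoop's built-in RS-6-3-1024k policy. The function names are illustrative, and the figures ignore metadata and partially filled stripes.

```shell
#!/usr/bin/env bash
# Raw storage consumed per 100 units of logical data (integer arithmetic).

# N-way replication stores N full copies of every block.
replication_raw_per_100() {
  local copies=$1
  echo $(( copies * 100 ))
}

# A Reed-Solomon (data, parity) layout stores the data units plus the parity units.
ec_raw_per_100() {
  local data_units=$1 parity_units=$2
  echo $(( (data_units + parity_units) * 100 / data_units ))
}

echo "3x replication: $(replication_raw_per_100 3) units stored per 100 units of data"
echo "RS(6,3):        $(ec_raw_per_100 6 3) units stored per 100 units of data"
```

Under these assumptions, replication stores 300 units for every 100 units of data, while the RS(6,3) layout stores 150, so the same cluster holds roughly twice as much data.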

EC is set up by defining policies and assigning them to directories in HDFS. For this, HDFS provides an ec subcommand for EC-related administrative tasks:

hdfs ec [generic options]
    [-setPolicy -path <path> [-policy <policyName>] [-replicate]]
    [-getPolicy -path <path>]
    [-unsetPolicy -path <path>]
    [-listPolicies]
    [-addPolicies -policyFile <file>]
    [-listCodecs]
    [-enablePolicy -policy <policyName>] ...
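
As a minimal sketch of how these options might be combined, the script below assigns a policy to a single directory. The directory /data/cold, the variables EC_DIR, EC_POLICY, and DRY_RUN, and the helper functions are assumptions made for illustration; RS-6-3-1024k is one of Hadoop's built-in policies. By default the script only prints the commands, so they can be reviewed before anything runs against a cluster.

```shell
#!/usr/bin/env bash
# Sketch: assign an erasure coding policy to one HDFS directory.
# EC_DIR and EC_POLICY are illustrative values, not taken from the text.
EC_DIR="${EC_DIR:-/data/cold}"
EC_POLICY="${EC_POLICY:-RS-6-3-1024k}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

apply_ec_policy() {
  # Show the policies the cluster knows about and their state.
  run hdfs ec -listPolicies
  # Enable the policy in case it is not enabled already.
  run hdfs ec -enablePolicy -policy "$EC_POLICY"
  # Make sure the target directory exists.
  run hdfs dfs -mkdir -p "$EC_DIR"
  # Files created under EC_DIR from now on use the policy.
  run hdfs ec -setPolicy -path "$EC_DIR" -policy "$EC_POLICY"
  # Confirm which policy is in effect on the directory.
  run hdfs ec -getPolicy -path "$EC_DIR"
}

apply_ec_policy
```

Setting a policy affects only files written after the change; files already in the directory keep their existing layout until they are rewritten.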
