Understanding HDFS backups

Data volumes in Hadoop clusters range from terabytes to petabytes, so deciding what data to back up from such clusters is an important decision. A disaster recovery plan for Hadoop clusters needs to be formulated right at the cluster planning stage. The organization needs to identify the datasets it wants to back up and plan backup storage requirements accordingly.

Backup schedules also need to be considered when designing a backup solution. The larger the dataset that needs to be backed up, the more time-consuming the activity. It is more efficient to perform backups during a window when there is the least amount of activity on the cluster. This not only helps the backup commands to run efficiently, ...
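As a sketch of such a scheduled backup window, the snippet below uses `hadoop distcp`, the standard HDFS tool for copying data between clusters, driven by a cron entry that fires during an assumed low-activity window. The cluster hostnames, paths, and the 2 a.m. window are hypothetical placeholders, not values from this book.

```shell
# Hypothetical crontab entry: run the backup at 02:00 daily,
# an assumed low-activity window for this example cluster.
# 0 2 * * * /opt/scripts/hdfs-backup.sh >> /var/log/hdfs-backup.log 2>&1

# /opt/scripts/hdfs-backup.sh (illustrative paths and hostnames)
# -update copies only files that differ from the target,
# keeping repeated runs incremental rather than full copies.
hadoop distcp -update \
  hdfs://prod-namenode:8020/data/warehouse \
  hdfs://backup-namenode:8020/backups/warehouse
```

The `-update` flag is what makes a nightly schedule practical: after the first full copy, subsequent runs transfer only changed files, shortening the time the job occupies the cluster.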
