Chapter 4. Backing Up and Restoring HBase Data

In this chapter, we will cover:

  • Full shutdown backup using distcp
  • Using CopyTable to copy data from one table to another
  • Exporting an HBase table to dump files on HDFS
  • Restoring HBase data by importing dump files from HDFS
  • Backing up NameNode metadata
  • Backing up region starting keys
  • Cluster replication

Introduction

If you plan to run HBase in production, you need to understand its backup options and practices. The main challenge is that the dataset to back up may be huge, so the backup solution must be efficient: it should scale to hundreds of terabytes of storage and restore the data within a reasonable time frame.

There are two strategies ...
