Importing data from another Hadoop cluster

Sometimes we need to copy data from one HDFS cluster to another, whether for development, testing, or production migration. In this recipe, we will learn how to copy data from one HDFS cluster to another.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster, along with network access to the second cluster involved in the copy.

How to do it...

Hadoop provides a utility called DistCp, which helps us copy data from one cluster to another. Using this utility is as simple as copying from one folder to another:

hadoop distcp hdfs://hadoopCluster1:9000/source hdfs://hadoopCluster2:9000/target

This command runs a MapReduce job to copy the data from one cluster to the other. You can also specify multiple source files to be copied to the target. There are a couple ...
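As a rough sketch of the multiple-source form mentioned above, the script below assembles a DistCp command that lists two sources before the target. The host names are reused from the recipe; the paths /source/a and /source/b and the helper build_distcp_cmd are hypothetical, chosen only for illustration. The script prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# NameNode addresses reused from the recipe; adjust to your own clusters.
SRC_NN="hdfs://hadoopCluster1:9000"
DST_NN="hdfs://hadoopCluster2:9000"

# Hypothetical helper: builds a DistCp command with two sources and one target.
# DistCp treats every path before the last one as a source.
# /source/a and /source/b are illustrative paths, not taken from the recipe.
build_distcp_cmd() {
    echo "hadoop distcp $SRC_NN/source/a $SRC_NN/source/b $DST_NN/target"
}

# Print the command for review instead of executing it.
build_distcp_cmd
```

Once the printed command looks right, it can be pasted into a shell on a machine that has the hadoop client configured.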
