Running the WordCount program in a distributed cluster environment

This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.

Getting ready

Start the Hadoop cluster by following the Setting up HDFS recipe or the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe.
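Before submitting the job, it can be useful to confirm that HDFS and YARN are up. A minimal check, assuming the Hadoop binaries are on your PATH:

    $ hdfs dfsadmin -report    # lists live DataNodes and HDFS capacity
    $ yarn node -list          # lists NodeManagers registered with the ResourceManager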

How to do it...

Now let's run the WordCount sample in the distributed Hadoop v2 setup:

  1. Upload the wc-input directory from the source repository to the HDFS filesystem. Alternatively, you can upload any other set of text documents.
    $ hdfs dfs -copyFromLocal wc-input .
    
  2. Execute the WordCount example from the HADOOP_HOME directory (a way to inspect the job output is shown after these steps):
    $ hadoop jar hcb-c1-samples.jar \
    chapter1.WordCount \
    wc-input wc-output
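
Once the job completes, the word counts are written to the wc-output directory in HDFS. A minimal way to inspect the results; the exact part file names depend on the number of reducers, and part-r-00000 is typical for a single reducer:

    $ hdfs dfs -ls wc-output                     # list the job output files
    $ hdfs dfs -cat wc-output/part-r-00000 | head  # print the first few word counts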
