Running the WordCount program in a distributed cluster environment
This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.
Getting ready
Start the Hadoop cluster by following the Setting up HDFS recipe or the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe.
How to do it...
Now let's run the WordCount sample in the distributed Hadoop v2 setup:
- Upload the wc-input directory from the source repository to the HDFS filesystem. Alternatively, you can upload any other set of text documents:

  $ hdfs dfs -copyFromLocal wc-input .
- Execute the WordCount example from the HADOOP_HOME directory:

  $ hadoop jar hcb-c1-samples.jar \
      chapter1.WordCount \
      wc-input wc-output
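To make clear what the job computes, here is a minimal sketch of the word-count logic in plain Java, without Hadoop dependencies. The class and method names (WordCountSketch, countWords) are hypothetical illustrations, not the book's chapter1.WordCount code; the real job performs the same tokenize, group, and sum steps split across a Mapper and a Reducer.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the logic the WordCount job performs.
public class WordCountSketch {
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Map-phase analogue: split the input into tokens on whitespace,
        // conceptually emitting a (word, 1) pair per token.
        // Reduce-phase analogue: sum the 1s for each distinct word.
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tab-separated word/count pairs, similar to the job's text output.
        for (Map.Entry<String, Integer> e : countWords("to be or not to be").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In the distributed run, this per-word summation happens in parallel across the cluster, with the shuffle phase routing all pairs for the same word to the same reducer.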