Chapter 8. Integration with Hadoop

In this chapter, we will cover the following recipes:

  • Executing our first sample MapReduce job using the mongo-hadoop connector
  • Writing our first Hadoop MapReduce job
  • Running MapReduce jobs on Hadoop using streaming
  • Running a MapReduce job on Amazon EMR

Introduction

Hadoop is well-known open source software for processing large datasets. It also provides an API for the widely used MapReduce programming model. Nearly all big data solutions offer some way to integrate with Hadoop in order to use its MapReduce framework. MongoDB has a connector as well; it integrates with Hadoop and lets us write MapReduce jobs using the Hadoop MapReduce API, process the data residing in MongoDB or in MongoDB dumps, ...
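To make the MapReduce programming model mentioned above more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or the mongo-hadoop connector; it only imitates the three conceptual phases (map, shuffle, reduce) in a single process. The function names map_phase, shuffle, reduce_phase, and run_job, as well as the sample input strings, are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this across machines; here it is an in-memory dict.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for documents read from a data store.
    docs = ["MongoDB stores documents", "Hadoop processes documents"]
    print(run_job(docs))
```

In a real Hadoop job, the framework distributes the map and reduce calls across a cluster and handles the shuffle itself; the job author typically supplies only the map and reduce logic.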
