Chapter 7. Large-Scale MapReduce

In this chapter, we look at how to write MapReduce jobs, how to design large-scale MapReduce processing on top of HBase, how its internals work, and how to tune HBase for this kind of workload. Along the way, we will discuss the following:

  • MapReduce frameworks
  • When to use MapReduce and when not to
  • Case study with example code and explanations

Introduction

HBase offers several ways to take advantage of MapReduce, depending on the stack and the architecture you are going to use.

Before we start, let's quickly revisit the components that make up a MapReduce job:

  • Record reader
  • Mapper
  • Combiner
  • Partitioner
  • Shuffle and sort
  • Reducer
  • Output format

  • Record reader: The core responsibility of a record reader is to analyze the ...
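
To make the list of components above more concrete, here is a minimal sketch of the same stages as a single-process word count in plain Java. It does not use the Hadoop or HBase APIs; the class name MiniMapReduce and all of its methods are hypothetical and exist only for illustration. Each input line is treated as one record and one input split, and the partitioner uses a simple hash-modulo scheme, which is an assumption made for this example rather than something taken from the chapter.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the MapReduce stages.
// Not the Hadoop or HBase API.
class MiniMapReduce {

    // Record reader + mapper: treat one line as a record and emit (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combiner: pre-aggregate the mapper output of one split before it is shuffled.
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            local.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return local;
    }

    // Partitioner: decide which reducer receives a key (hash-modulo, assumed here).
    public static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Reducer: fold all values that arrived for one key into a single count.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the stages together and returns the final (word, count) output.
    public static Map<String, Integer> wordCount(List<String> lines, int numPartitions) {
        // Shuffle and sort: one key-sorted map of key -> values per partition.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            Map<String, Integer> combined = combine(map(line));
            for (Map.Entry<String, Integer> e : combined.entrySet()) {
                partitions.get(partition(e.getKey(), numPartitions))
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        // Reduce each partition, then collect the results (stand-in for an output format).
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hbase stores rows", "mapreduce scans rows", "hbase scales");
        // Output format: one tab-separated "word<TAB>count" line per key.
        for (Map.Entry<String, Integer> e : wordCount(lines, 2).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real cluster these stages run on different machines and the shuffle moves data over the network; the sketch keeps everything in memory so that the role of each component is easy to follow.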
