Chapter 28. MapReduce and Monoids

Introduction

This chapter is based on Jimmy Lin’s paper[15] titled “Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms.” Lin clearly introduces monoids as a design principle for efficient MapReduce algorithms. But what is a monoid, what properties define it, and how does it aid the MapReduce paradigm? Lin shows that when your MapReduce operations are not monoids, it is very hard to use combiners efficiently.

Also, David Saile[13] states that:

A monoid is an algebraic structure with a single associative binary operation and an identity element. For example, the natural numbers1 N form a monoid under addition with identity element zero. In classic MapReduce, the mapper is not constrained, but the reducer is required to be (the iterated application of) an associative operation. Recent research argued that reduction is in fact monoidal in known applications of MapReduce. That is, reduction is indeed the iterated application of an associative operation “•” with a unit u. In the case of the word-occurrence count example, reduction iterates addition “+” with “0” as unit. The parallel execution schedule may be more flexible if commutativity is required in addition to associativity.

A detailed analysis of common MapReduce computations on the basis of monoids can be found in [5]. Next, we will briefly review MapReduce’s combiners and abstract algebra’s monoids and see how they are related to each other. Converting nonassociative ...

Get Data Algorithms now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.