Day 2: The Batch Layer

Yesterday we saw how we could use Hadoop to parallelize across a cluster of machines. MapReduce can be used to solve a huge range of problems, but today we’re going to concentrate on how it fits into the Lambda Architecture.

Before we look at that, however, let’s consider the problem that the Lambda Architecture exists to solve—what’s wrong with traditional data systems?

Problems with Traditional Data Systems

Data systems are nothing new—we’ve been using databases to answer questions about the data stored within them for almost as long as computers have existed. Traditional databases work well up to a point, but the volume of data we’re trying to handle these days is pushing them beyond the point where ...

Get Seven Concurrency Models in Seven Weeks now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.