Executing map side joins in Hive

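A map side join is an optimization in which the smaller table is loaded into memory on each mapper, so the join completes in the map phase without a reduce step. Hive can apply it automatically based on table sizes. In this recipe, we are going to explore map side joins in further detail.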

Getting ready

To perform this recipe, you should have a running Hadoop cluster with a recent version of Hive installed on it. Here, I am using Hive 1.2.1.

How to do it...

To perform map joins, we need two datasets that share a common column to join on. One dataset should be large, and the other should be small enough in comparison to fit in memory. Consider a situation where we have two tables, one for employees and one for departments: the employee table has the columns ID, name, salary, and department ID, and the department table has an ID and a name. ...
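To make the idea concrete, here is a minimal conceptual sketch in Python of what a map side join does with the two tables described above. It is not Hive's implementation and it is not the recipe's own code; the sample rows and the function names (build_lookup, map_side_join) are made up for illustration. The small department table is loaded into an in-memory hash map, and the large employee table is streamed past it row by row.

```python
# Conceptual sketch of a map side (broadcast hash) join.
# Assumption: this only illustrates the idea; it is not how Hive is implemented.
# All sample rows below are invented for illustration.

# Small table: department (id, name). Small enough to hold in memory.
departments = [
    (1, "Engineering"),
    (2, "Sales"),
]

# Large table: employee (id, name, salary, department_id).
employees = [
    (101, "Asha", 5000, 1),
    (102, "Ben", 4200, 2),
    (103, "Chen", 6100, 1),
    (104, "Dara", 3900, 3),  # department 3 does not exist in the small table
]


def build_lookup(small_rows):
    """Load the small table into a hash map keyed on the join column."""
    return {dept_id: dept_name for dept_id, dept_name in small_rows}


def map_side_join(big_rows, lookup):
    """Stream the large table and join each row against the in-memory map.

    Each row is resolved where it is read, so no shuffle or reduce step
    is needed. Rows without a match are dropped (inner join semantics).
    """
    for emp_id, emp_name, salary, dept_id in big_rows:
        dept_name = lookup.get(dept_id)
        if dept_name is not None:
            yield (emp_id, emp_name, salary, dept_name)


joined = list(map_side_join(employees, build_lookup(departments)))
for row in joined:
    print(row)
```

In Hive itself, this conversion is governed by the hive.auto.convert.join property together with a small-table size threshold (hive.mapjoin.smalltable.filesize), so the sketch only mirrors the mechanism, not the configuration.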
