Performing table joins in Hive

In the previous chapter, we talked about how to perform joins in Pig. In this recipe, we will look at how to perform joins in Hive. Hive supports several join types, including inner joins and left, right, and full outer joins.

Getting ready

To perform this recipe, you should have a running Hadoop cluster with a recent version of Hive installed on it. Here, I am using Hive 1.2.1.

How to do it...

To perform joins, we need two datasets that share a common column to join on. Consider a situation where we have two tables, employee and departments: every employee record has an ID, name, salary, and department ID, and every department record has an ID and a name. We will quickly create tables and load ...
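To make the join behavior concrete before running anything on a cluster, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. It is not the recipe's own code. The column names (id, name, salary, dept_id) and the sample rows are made up for illustration, following the table structure described above. The two SELECT statements use join syntax that Hive also accepts, but creating and loading the tables in Hive would look different from the SQLite statements shown here.

```python
import sqlite3

# SQLite stands in for Hive here so the example runs without a cluster.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schemas following the structure described in the text.
cur.execute(
    "CREATE TABLE employee (id INTEGER, name TEXT, salary REAL, dept_id INTEGER)"
)
cur.execute("CREATE TABLE departments (id INTEGER, name TEXT)")

# Made-up sample rows; Carol's department (30) has no matching row.
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 5000.0, 10),
        (2, "Bob", 4000.0, 20),
        (3, "Carol", 4500.0, 30),
    ],
)
cur.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(10, "Engineering"), (20, "Sales")],
)

# Inner join: only employees whose dept_id matches a department.
INNER_JOIN = """
SELECT e.name, d.name
FROM employee e
JOIN departments d ON (e.dept_id = d.id)
"""

# Left outer join: every employee, with NULL where no department matches.
LEFT_OUTER_JOIN = """
SELECT e.name, d.name
FROM employee e
LEFT OUTER JOIN departments d ON (e.dept_id = d.id)
"""

# Sort in Python by employee name so the output order is deterministic.
inner_rows = sorted(cur.execute(INNER_JOIN).fetchall(), key=lambda r: r[0])
left_rows = sorted(cur.execute(LEFT_OUTER_JOIN).fetchall(), key=lambda r: r[0])

print(inner_rows)
print(left_rows)
```

The inner join drops the employee whose department ID has no match, while the left outer join keeps that employee and returns None (NULL) for the department name.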
