Performing a join with Hive

This recipe shows how to use Hive to perform a join across two datasets. The first dataset is the book details dataset of the Book-Crossing database, and the second contains the reviewer ratings for those books. The recipe uses Hive to find the authors with the largest number of ratings above 3 stars.
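
As a rough illustration of the kind of query this recipe builds toward, the following HiveQL sketch joins the two datasets on the book identifier and counts the ratings above 3 for each author. The table names (books, ratings) and column names (isbn, author, rating) are assumptions made for this sketch; the actual names are defined by the create-book-crossing.hql file and may differ.

```sql
-- Hypothetical schema, assumed for illustration only:
--   books(isbn, author, ...)       one row per book
--   ratings(user_id, isbn, rating) one row per reviewer rating
SELECT b.author AS author,
       COUNT(*) AS high_ratings          -- number of ratings above 3 for this author
FROM books b
JOIN ratings r ON (b.isbn = r.isbn)      -- inner join on the shared book identifier
WHERE r.rating > 3                       -- keep only ratings of more than 3 stars
GROUP BY b.author
ORDER BY high_ratings DESC               -- authors with the most such ratings first
LIMIT 10;
```

An inner join is used here because books without any ratings cannot contribute to the count; the LIMIT value is arbitrary.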

Getting ready

Complete the previous recipe, Hive batch mode – using a query file, before starting this one.

How to do it...

This section demonstrates how to perform a join using Hive. Proceed with the following steps:

  1. Start the Hive CLI and use the Book-Crossing database:
    $ hive
    hive> USE bookcrossing;
    
  2. Create the books and book ratings tables by executing the create-book-crossing.hql Hive query file after referring to the previous ...
