Time for action – performing a join

Joins are a very frequently used tool in SQL, though sometimes appear a little intimidating to those new to the language. Essentially a join allows rows in multiple tables to be logically combined together based on a conditional statement. Hive has rich support for joins which we will now examine.

  1. Create the following as join.hql:
    SELECT t1.sighted, t2.full_name
    FROM ufodata t1 JOIN states t2
    ON (LOWER(t2.abbreviation) = LOWER(SUBSTR( t1.sighting_location, (LENGTH(t1.sighting_location)-1)))) 
    LIMIT 5 ;
  2. Execute the query:
    $ hive -f join.hql
    

    You will receive the following response:

    OK
    20060930  Alaska
    20051018  Alaska
    20050707  Alaska
    20100112  Alaska
    20100625  Alaska
    Time taken: 33.255 seconds
    

What just happened?

The ...

Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.