Time for action – validating the table

The easiest way to perform some initial validation is to run a few summary queries against the imported data. This is similar to the kinds of tasks for which we used Hadoop Streaming in Chapter 4, Developing MapReduce Programs.

  1. Instead of using the Hive shell, pass the following HiveQL to the hive command-line tool to count the number of entries in the table:
    $ hive -e "select count(*) from ufodata;"
    

    You will receive the following response:

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2012-03-03 16:15:15,510 Stage-1 map = 0%,  reduce = 0%
    2012-03-03 16:15:21,552 Stage-1 map = 100%,  reduce = 0%
    2012-03-03 16:15:30,622 Stage-1 ...
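If you want to repeat this check without reading through the job log, you can script the count. The following is a minimal Python sketch and is not part of the original walkthrough. It runs the same query through the hive CLI with the `-S` (silent) flag, which suppresses the progress messages so that only the result is printed. It then compares that result with the number of lines in the local file that was loaded into the table. The function names are hypothetical. The sketch also assumes that every line of the source file became exactly one row in the table, so the file has no header line, and that the table name is trusted input.

```python
import subprocess


def parse_count(hive_stdout: str) -> int:
    """Return the integer on the last non-empty line of Hive's output.

    Assumes the query printed a single count value, as
    `hive -S -e "select count(*) from <table>;"` does.
    """
    lines = [line.strip() for line in hive_stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output from Hive")
    return int(lines[-1])


def hive_row_count(table: str, run=subprocess.run) -> int:
    """Run a count(*) query through the hive CLI and return the result.

    `run` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a Hive installation.
    """
    result = run(
        ["hive", "-S", "-e", f"select count(*) from {table};"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if hive exits non-zero
    )
    return parse_count(result.stdout)


def count_lines(path: str) -> int:
    """Count the lines (one record per line) in the local source file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def import_looks_complete(table: str, source_path: str, run=subprocess.run) -> bool:
    """True if the Hive table has exactly one row per line of the source file."""
    return hive_row_count(table, run=run) == count_lines(source_path)
```

For example, `import_looks_complete("ufodata", "source.tsv")` returns `True` when the two counts agree. Here `source.tsv` is a placeholder for whichever file you loaded. A mismatch tells you to look more closely at the import, but it does not show where the problem lies.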
