Chapter 6. Hadoop SQL Engines

Data is the new oil. No: Data is the new soil.

—David McCandless

One of the biggest decisions in designing a Hadoop ecosystem is selecting the SQL engines for your use cases. For each type of application and project, you have to ask: should we use Hive on Tez, Impala, Spark SQL, Phoenix for HBase, or something else? The decision gets harder as each new release adds functionality that overlaps with other SQL engines. In this chapter we discuss Hadoop SQL engines and two of the primary tools that use these engines, Hive and Pig.

Where SQL Was Born

In the early days of computing, everything was file based and only geeks could parse and process such data. With RDBMSs, SQL became the universal language of data ...
