Impala

Hive is not the only product providing SQL-on-Hadoop capability. The second most widely used is likely Impala, announced in late 2012 and released in spring 2013. Although Impala was originally developed internally at Cloudera, its source code is periodically pushed to an open source Git repository (https://github.com/cloudera/impala).

Impala was created out of the same perception of Hive's weaknesses that led to the Stinger initiative.

Impala also took some inspiration from Google Dremel (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf), which was first openly described in a paper published in 2010. Dremel was built at Google to address the gap between the need for very fast queries on very large datasets and ...
