Conclusion

In this article, you have learned how Impala fits into the Hadoop software stack:

  • Querying data files stored in HDFS.
  • Enabling interactive queries over data originally managed by Hive, using Hive where convenient for ETL tasks and then querying the same data in Impala.
  • Providing a SQL frontend for data managed by HBase.
  • Using data files produced by MapReduce, Pig, and other Hadoop components.
  • Utilizing data formats ranging from simple (text), to compact and efficient (Avro, RCFile, SequenceFile), to formats optimized for data warehouse queries (Parquet), as sketched in the example after this list.
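
As a minimal sketch of the first and last points, the statements below expose a directory of tab-delimited HDFS files to Impala as an external table and then copy the data into a Parquet table for warehouse-style queries. The table names, columns, and HDFS path here are assumptions for illustration only, not drawn from the earlier chapters.

    -- Hypothetical schema and HDFS path, for illustration only.
    CREATE EXTERNAL TABLE logs_text (
      event_time STRING,
      user_id    BIGINT,
      url        STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/impala/sample_data';

    -- Copy the same data into Parquet format for faster analytic queries.
    CREATE TABLE logs_parquet STORED AS PARQUET
    AS SELECT * FROM logs_text;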

You have seen the benefits Impala brings to users coming from different backgrounds:

  • For Hadoop users, how Impala brings the familiarity and flexibility of fast, interactive SQL to the Hadoop world.
  • For database users, how the combination of Hadoop and Impala makes it simple to set up a distributed database for data warehouse-style queries.

You have gotten a taste of what is involved in setting up Impala, loading data, and running queries.
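
For instance, a short impala-shell session might look like the following sketch, which assumes the hypothetical logs_text table from the earlier example: REFRESH picks up newly added data files, and the query runs interactively against them.

    -- Pick up data files newly added to the table's HDFS directory.
    REFRESH logs_text;

    -- Run an interactive aggregation query.
    SELECT url, COUNT(*) AS hits
    FROM logs_text
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10;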

The rest is in your hands!
