Getting Started with Impala

Depending on your background and existing Apache Hadoop infrastructure, you can approach the Cloudera Impala product from different angles:

  • If you come from a database background and are new to Hadoop, the Cloudera QuickStart VM lets you try out the basic Impala features straight out of the box. This single-node VM configuration is suitable for getting your feet wet with Impala. (For performance or scalability testing, you would use real hardware in a cluster configuration.) You run the VM in VMware, KVM, or VirtualBox, start the Impala service through the Cloudera Manager web interface, and then interact with Impala through the impala-shell interpreter or the ODBC and JDBC interfaces.
  • For more serious testing or large-scale deployment, you can download and install the Cloudera Impala software in a real cluster environment. You can freely install the software either through standalone packages or by using the Cloudera Manager “parcel” feature, which enables easier upgrades. You install the Impala server on each DataNode and designate one node (typically the same node as the Hadoop NameNode) to also run the Impala StateStore daemon. The simplest way to get up and running is through the Cloudera Manager application, where you can bootstrap the whole process of setting up a Hadoop cluster with Impala just by specifying a list of hostnames for the cluster.
  • If you want to understand how Impala works at a deep level, you can get the Impala source code from GitHub and ...
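
As a rough illustration of the impala-shell route mentioned in the first item, the following Python sketch assembles a one-shot impala-shell command. It is a minimal sketch under stated assumptions, not a prescribed workflow: the hostname quickstart.example is hypothetical, the helper names are invented for this example, and port 21000 is assumed to be the port impala-shell connects to on your cluster.

```python
import shlex
import shutil
import subprocess

# Assumption: 21000 is the port impala-shell connects to by default;
# adjust it if your cluster is configured differently.
DEFAULT_IMPALAD_PORT = 21000


def build_impala_shell_command(host, query, port=DEFAULT_IMPALAD_PORT):
    """Return the argument list for a one-shot impala-shell query.

    -i names the impalad host and port to connect to; -q passes a single
    query, after which the shell exits instead of staying interactive.
    """
    return ["impala-shell", "-i", f"{host}:{port}", "-q", query]


def run_query(host, query, port=DEFAULT_IMPALAD_PORT):
    """Run the query through impala-shell and return its standard output."""
    if shutil.which("impala-shell") is None:
        raise RuntimeError("impala-shell was not found on the PATH")
    command = build_impala_shell_command(host, query, port)
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    # "quickstart.example" is a hypothetical hostname for the VM.
    command = build_impala_shell_command("quickstart.example", "SHOW DATABASES")
    # Print the command rather than running it, so the sketch is safe to try
    # on a machine that does not have Impala installed.
    print(shlex.join(command))
```

On a machine where impala-shell is installed and an Impala daemon is reachable, calling run_query with the same arguments would return the query output as text. Keeping command construction separate from execution makes the sketch easy to check without a running cluster.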
