Appendix D. Distributions

The Apache releases are not the only way to install HBase. This appendix lists the available alternatives.

Cloudera’s Distribution Including Apache Hadoop

Cloudera’s Distribution including Apache Hadoop (hereafter CDH) is based on the most recent stable version of Apache Hadoop, with numerous patches, backports, and updates. Cloudera makes the distribution available in a number of different formats: source and binary tar files, RPMs, Debian packages, VMware images, and scripts for running CDH in the cloud. CDH is free, released under the Apache 2.0 license, and available at http://www.cloudera.com/hadoop/.

To simplify deployment, Cloudera hosts packages on public yum and apt repositories. CDH enables you to install and configure Hadoop and HBase on each machine using a single command. Kickstart users can commission entire Hadoop clusters without manual intervention.

CDH manages cross-component versions and provides a stable platform with a compatible set of packages that work together. As of CDH3, the following packages are included, many of which are covered elsewhere in this book:

HDFS

Self-healing distributed filesystem

MapReduce

Powerful, parallel data processing framework

Hadoop Common

A set of utilities that support the Hadoop subprojects

HBase

Hadoop database for random read/write access

Hive

SQL-like queries and tables on large data sets

Pig

Dataflow language and compiler

Oozie

Workflow for interdependent Hadoop jobs

Sqoop

Integrates databases and data warehouses ...
