Appendix B. Cloudera’s Distribution for Hadoop
Cloudera’s Distribution for Hadoop (hereafter CDH) is based on the most recent stable version of Apache Hadoop with numerous patches, backports, and updates. Cloudera makes the distribution available in a number of different formats: source and binary tar files, RPMs, Debian packages, VMware images, and scripts for running CDH in the cloud. CDH is free, released under the Apache 2.0 license, and available at http://www.cloudera.com/hadoop/.
To simplify deployment, Cloudera hosts packages on public yum and apt repositories. CDH enables you to install and configure Hadoop on each machine using a single command. Kickstart users can commission entire Hadoop clusters without manual intervention.
CDH manages cross-component versions and provides a stable platform with a compatible set of packages that work together. As of CDH3, the following packages are included, many of which are covered elsewhere in this book:
HDFS – Self-healing distributed file system
MapReduce – Powerful, parallel data processing framework
Hadoop Common – A set of utilities that support the Hadoop subprojects
HBase – Hadoop database for random read/write access
Hive – SQL-like queries and tables on large datasets
Pig – Dataflow language and compiler
Oozie – Workflow for interdependent Hadoop jobs
Sqoop – Integrate databases and data warehouses with Hadoop
Flume – Highly reliable, configurable streaming data collection
ZooKeeper – Coordination service for distributed applications ...