Appendix B. Installing Hadoop Ecosystem Products

In addition to the core functionality provided by Hadoop, this book covers several other Hadoop ecosystem projects that are built on top of it. These products are typically installed either on the same cluster that hosts Hadoop and YARN, or configured to connect to that cluster. In this book, we assume that you have set up and configured Apache Hadoop in single-node, pseudo-distributed mode. However, there are several other ways to get up and running with a single-node Hadoop cluster along with the Hadoop ecosystem products discussed in this book.
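As a quick sanity check of that assumption, the following minimal Python sketch reports whether the standard Hadoop command-line tools (hadoop, hdfs, and yarn) can be found on the PATH and whether the HADOOP_HOME environment variable is set. The helper names find_hadoop_tools and hadoop_home are hypothetical and chosen only for this illustration. The check confirms that the tools are visible to your shell; it does not confirm that the daemons are running or that the configuration is correct.

```python
import os
import shutil


def find_hadoop_tools(tools=("hadoop", "hdfs", "yarn"), path=None):
    """Map each command name to its resolved location, or None if not found.

    `path` overrides the PATH search string; None means use the current PATH.
    (Hypothetical helper, for illustration only.)
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


def hadoop_home(env=None):
    """Return the value of HADOOP_HOME, or None if it is not set.

    (Hypothetical helper, for illustration only.)
    """
    env = os.environ if env is None else env
    return env.get("HADOOP_HOME")


if __name__ == "__main__":
    print("HADOOP_HOME:", hadoop_home() or "(not set)")
    for tool, location in find_hadoop_tools().items():
        print(f"{tool}: {location or '(not found on PATH)'}")
```

If any of the tools are reported as not found, the installation's bin directory is probably missing from the PATH of the user running the script.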

Packaged Hadoop Distributions

The easiest way to get up and running with a single-machine configuration of Hadoop is to install one of the virtualized Hadoop distributions provided by the major Hadoop vendors. These include Cloudera’s QuickStart VM, the Hortonworks Sandbox, and MapR’s sandbox for Hadoop. Each of these virtual machines contains a single-node Hadoop cluster, the popular Apache Hadoop ecosystem projects, and proprietary applications and tools, all included in a simple turnkey bundle. You can use your preferred virtualization software, such as VMware Player or VirtualBox, to run these VMs.

Self-Installation of Apache Hadoop Ecosystem Products

If you are not using a packaged distribution of Hadoop, but are instead installing Apache Hadoop manually, then you will also need to manually install and configure the various ...
