In this appendix, we cover the details of installing the tools for the stack used in this book.
You can download the latest version of Hadoop from the Apache Hadoop downloads page. At the time of writing, the latest Hadoop release was 2.7.3, but a newer version will probably be available by the time you read this.
A recipe for a headless install of Hadoop is available in manual_install.sh. In addition to downloading and unpacking Hadoop, we also need to set up our Hadoop environment variables (such as HADOOP_HOME, HADOOP_CLASSPATH, and HADOOP_CONF_DIR), and we need to put Hadoop’s executables in our PATH. First, set up a PROJECT_HOME variable to help find the right paths. You will need to set this yourself in your shell profile (the script below writes its settings to ~/.bash_profile).
Now we can set up our environment directly. Here is the relevant section of manual_install.sh:
# May need to update this link... see http://hadoop.apache.org/releases.html
curl -Lko /tmp/hadoop-2.7.3.tar.gz \
  http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
mkdir hadoop
tar -xvf /tmp/hadoop-2.7.3.tar.gz -C hadoop --strip-components=1
echo '# Hadoop environment setup' >> ~/.bash_profile
export HADOOP_HOME=$PROJECT_HOME/hadoop
echo 'export HADOOP_HOME=$PROJECT_HOME/hadoop' >> ~/.bash_profile
export PATH=$PATH:$HADOOP_HOME/bin
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bash_profile
export HADOOP_CLASSPATH=$(hadoop classpath)
echo 'export ...
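Once the script has run and a new shell has picked up ~/.bash_profile, it can be worth confirming that the settings took effect. The following is a minimal sketch and is not part of manual_install.sh: check_hadoop_env is a hypothetical helper name, and it assumes that the variables named in this appendix (PROJECT_HOME, HADOOP_HOME, HADOOP_CLASSPATH, and HADOOP_CONF_DIR) are the ones worth checking.

```shell
# Report which of the Hadoop-related settings from this appendix are missing.
# check_hadoop_env is a hypothetical helper, not something Hadoop provides.
check_hadoop_env() {
  status=0
  for name in PROJECT_HOME HADOOP_HOME HADOOP_CLASSPATH HADOOP_CONF_DIR; do
    # Look up the variable whose name is stored in $name (empty if unset)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  # The hadoop launcher should be reachable through PATH
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "missing: hadoop executable on PATH"
    status=1
  fi
  return $status
}

# Print a confirmation only when nothing is missing
if check_hadoop_env; then
  echo "Hadoop environment looks complete"
fi
```

Any line beginning with "missing:" points at a step of the install script that did not take effect in the current shell; opening a fresh terminal or sourcing ~/.bash_profile is usually the first thing to try.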