Hadoop Operations and Cluster Management Cookbook

Book Description

Over 60 recipes showing you how to design, configure, manage, monitor, and tune a Hadoop cluster

  • Hands-on recipes to configure a Hadoop cluster from bare metal hardware nodes

  • Practical, in-depth explanations of cluster management commands

  • Easy-to-understand recipes for securing and monitoring a Hadoop cluster, along with design considerations

  • Recipes showing you how to tune the performance of a Hadoop cluster

  • Learn how to build a Hadoop cluster in the cloud

In Detail

We are facing an avalanche of data. The unstructured data we gather can contain insights that hold the key to business success or failure. The ability to analyze and process this data with Hadoop is one of the most sought-after skills in today's job market. By combining the computing and storage power of a large number of commodity machines, Hadoop handles this volume of data elegantly.

Hadoop Operations and Cluster Management Cookbook is a practical and hands-on guide for designing and managing a Hadoop cluster. It will help you understand how Hadoop works and guide you through cluster management tasks.

This book explains real-world big data problems and the features of Hadoop that enable it to handle them. It demystifies the Hadoop cluster and guides you through a number of clear, practical recipes for managing one.

We start by installing and configuring a Hadoop cluster, explaining hardware selection and networking considerations along the way. We also cover securing a Hadoop cluster with Kerberos, configuring cluster high availability, and monitoring a cluster. And if you want to know how to build a Hadoop cluster on the Amazon EC2 cloud, this is the book for you.

Table of Contents

  1. Hadoop Operations and Cluster Management Cookbook
    1. Table of Contents
    2. Hadoop Operations and Cluster Management Cookbook
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Errata
        2. Piracy
        3. Questions
    8. 1. Big Data and Hadoop
      1. Introduction
      2. Defining a Big Data problem
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. See also
      3. Building a Hadoop-based Big Data platform
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more...
          1. Hadoop common
          2. Apache HBase
          3. Apache Mahout
          4. Apache Pig
          5. Apache Hive
          6. Apache ZooKeeper
          7. Apache Oozie
          8. Apache Sqoop
          9. Apache Flume
          10. Apache Avro
      4. Choosing from Hadoop alternatives
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more...
          1. MPI
          2. HPCC
    9. 2. Preparing for Hadoop Installation
      1. Introduction
      2. Choosing hardware for cluster nodes
        1. How to do it...
        2. How it works...
        3. See also
      3. Designing the cluster network
        1. How to do it...
        2. How it works...
      4. Configuring the cluster administrator machine
        1. Getting ready
        2. How to do it...
        3. See also
      5. Creating the kickstart file and boot media
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      6. Installing the Linux operating system
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Configuring DHCP for network booting
          2. Configuring TFTP for network booting
      7. Installing Java and other tools
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      8. Configuring SSH
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Erroneous SSH settings
          2. Erroneous iptables configuration
          3. Erroneous SELinux configuration
        5. See also
    10. 3. Configuring a Hadoop Cluster
      1. Introduction
      2. Choosing a Hadoop version
        1. Getting ready
        2. How to do it...
        3. See also
      3. Configuring Hadoop in pseudo-distributed mode
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      4. Configuring Hadoop in fully-distributed mode
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      5. Validating Hadoop installation
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Can't start HDFS daemons
          2. Cluster is missing slave nodes
          3. MapReduce daemons can't be started
        5. See also
      6. Configuring ZooKeeper
        1. Getting ready
        2. How to do it...
        3. See also
      7. Installing HBase
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      8. Installing Hive
        1. Getting ready
        2. How to do it...
        3. See also
      9. Installing Pig
        1. Getting ready
        2. How to do it...
        3. See also
      10. Installing Mahout
        1. Getting ready
        2. How to do it...
        3. See also
    11. 4. Managing a Hadoop Cluster
      1. Introduction
      2. Managing the HDFS cluster
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
        5. See also
      3. Configuring SecondaryNameNode
        1. Getting ready
        2. How to do it...
        3. There's more...
        4. See also
      4. Managing the MapReduce cluster
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. Managing TaskTracker
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      6. Decommissioning DataNode
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      7. Replacing a slave node
        1. Getting ready
        2. How to do it...
        3. See also
      8. Managing MapReduce jobs
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. More job management commands
          2. Managing tasks
          3. Managing jobs through the web UI
        5. See also
      9. Checking job history from the web UI
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      10. Importing data to HDFS
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      11. Manipulating files on HDFS
        1. Getting ready
        2. How to do it...
        3. How it works…
      12. Configuring the HDFS quota
        1. Getting ready
        2. How to do it...
        3. How it works…
      13. Configuring CapacityScheduler
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      14. Configuring Fair Scheduler
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      15. Configuring Hadoop daemon logging
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Configuring Hadoop logging with hadoop-env.sh
          2. Configuring Hadoop security logging
          3. Hadoop logging file naming conventions
        5. See also
      16. Configuring Hadoop audit logging
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      17. Upgrading Hadoop
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
    12. 5. Hardening a Hadoop Cluster
      1. Introduction
      2. Configuring service-level authentication
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      3. Configuring job authorization with ACL
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      4. Securing a Hadoop cluster with Kerberos
        1. Getting ready
        2. How to do it...
        3. See also
      5. Configuring web UI authentication
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      6. Recovering from NameNode failure
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. NameNode resilience with multiple hard drives
          2. Recovering NameNode from the checkpoint of a SecondaryNameNode
        5. See also
      7. Configuring NameNode high availability
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      8. Configuring HDFS federation
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Decommissioning a NameNode from the cluster
          2. Running balancer
          3. Adding a new NameNode
        5. See also
    13. 6. Monitoring a Hadoop Cluster
      1. Introduction
      2. Monitoring a Hadoop cluster with JMX
        1. Getting ready
        2. How to do it...
        3. See also
      3. Monitoring a Hadoop cluster with Ganglia
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      4. Monitoring a Hadoop cluster with Nagios
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. Monitoring a Hadoop cluster with Ambari
        1. Getting ready
        2. How to do it...
        3. See also
      6. Monitoring a Hadoop cluster with Chukwa
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
    14. 7. Tuning a Hadoop Cluster for Best Performance
      1. Introduction
      2. Benchmarking and profiling a Hadoop cluster
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There's more...
        5. See also
      3. Analyzing job history with Rumen
        1. Getting ready
        2. How to do it...
        3. See also
      4. Benchmarking a Hadoop cluster with GridMix
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Benchmarking Hadoop cluster with GridMix1
          2. Benchmarking Hadoop cluster with GridMix3
        5. See also
      5. Using Hadoop Vaidya to identify performance problems
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      6. Balancing data blocks for a Hadoop cluster
        1. Getting ready
        2. How to do it...
        3. How it works…
      7. Choosing a proper block size
        1. Getting ready
        2. How to do it...
      8. Using compression for input and output
        1. Getting ready
        2. How to do it...
        3. How it works...
      9. Configuring speculative execution
        1. Getting ready
        2. How to do it...
        3. How it works...
      10. Setting proper number of map and reduce slots for the TaskTracker
        1. Getting ready
        2. How to do it...
      11. Tuning the JobTracker configuration
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. See also
      12. Tuning the TaskTracker configuration
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. See also
      13. Tuning shuffle, merge, and sort parameters
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. See also
      14. Configuring memory for a Hadoop cluster
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      15. Setting proper number of parallel copies
        1. Getting ready
        2. How to do it...
        3. See also
      16. Tuning JVM parameters
        1. Getting ready
        2. How to do it...
        3. See also
      17. Configuring JVM Reuse
        1. Getting ready
        2. How to do it...
        3. See also
      18. Configuring the reducer initialization time
        1. Getting ready
        2. How to do it...
        3. See also
    15. 8. Building a Hadoop Cluster with Amazon EC2 and S3
      1. Introduction
      2. Registering with Amazon Web Services (AWS)
        1. Getting ready
        2. How to do it...
        3. See also
      3. Managing AWS security credentials
        1. Getting ready
        2. How to do it...
        3. How it works...
      4. Preparing a local machine for EC2 connection
        1. Getting ready
        2. How to do it...
      5. Creating an Amazon Machine Image (AMI)
        1. Getting ready
        2. How to do it...
        3. There's more...
          1. Creating an AMI from an existing AMI
          2. Creating an EBS-backed AMI
        4. See also
      6. Using S3 to host data
        1. Getting ready
        2. How to do it...
      7. Configuring a Hadoop cluster with the new AMI
        1. Getting ready
        2. How to do it...
        3. There's more...
          1. Data processing with Amazon Elastic MapReduce
        4. See also
    16. Index