Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

Book Description

“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.”
—From the Foreword by Raymie Stata, CEO of Altiscale

The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN

Apache Hadoop is helping drive the Big Data revolution. Now its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. In Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to take full advantage of these revolutionary advances.

YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.

You’ll find many examples drawn from the authors’ cutting-edge experience—first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.

Coverage includes

  • YARN’s goals, design, architecture, and components—how it expands the Apache Hadoop ecosystem

  • Exploring YARN on a single node 

  • Administering YARN clusters and Capacity Scheduler 

  • Running existing MapReduce applications 

  • Developing a large-scale clustered YARN application 

  • Discovering new open source frameworks that run under YARN

Table of Contents

    1. About This eBook
    2. Title Page
    3. Copyright Page
    4. Contents
    5. Foreword by Raymie Stata
    6. Foreword by Paul Dix
    7. Preface
      1. Focus of the Book
      2. Book Structure
      3. Book Conventions
      4. Additional Content and Accompanying Code
    8. Acknowledgments
    9. About the Authors
    10. Chapter 1. Apache Hadoop YARN: A Brief History and Rationale
      1. Introduction
      2. Apache Hadoop
      3. Phase 0: The Era of Ad Hoc Clusters
      4. Phase 1: Hadoop on Demand
      5. Phase 2: Dawn of the Shared Compute Clusters
      6. Phase 3: Emergence of YARN
      7. Conclusion
    11. Chapter 2. Apache Hadoop YARN Install Quick Start
      1. Getting Started
      2. Steps to Configure a Single-Node YARN Cluster
      3. Run Sample MapReduce Examples
      4. Wrap-up
    12. Chapter 3. Apache Hadoop YARN Core Concepts
      1. Beyond MapReduce
      2. Apache Hadoop MapReduce
      3. Apache Hadoop YARN
      4. YARN Components
      5. Wrap-up
    13. Chapter 4. Functional Overview of YARN Components
      1. Architecture Overview
      2. ResourceManager
      3. YARN Scheduling Components
      4. Containers
      5. NodeManager
      6. ApplicationMaster
      7. YARN Resource Model
      8. Managing Application Dependencies
      9. Wrap-up
    14. Chapter 5. Installing Apache Hadoop YARN
      1. The Basics
      2. System Preparation
      3. Script-based Installation of Hadoop 2
      4. Script-based Uninstall
      5. Configuration File Processing
      6. Configuration File Settings
      7. Start-up Scripts
      8. Installing Hadoop with Apache Ambari
      9. Wrap-up
    15. Chapter 6. Apache Hadoop YARN Administration
      1. Script-based Configuration
      2. Monitoring Cluster Health: Nagios
      3. Real-time Monitoring: Ganglia
      4. Administration with Ambari
      5. JVM Analysis
      6. Basic YARN Administration
      7. Wrap-up
    16. Chapter 7. Apache Hadoop YARN Architecture Guide
      1. Overview
      2. ResourceManager
      3. NodeManager
      4. ApplicationMaster
      5. YARN Containers
      6. Summary for Application-writers
      7. Wrap-up
    17. Chapter 8. Capacity Scheduler in YARN
      1. Introduction to the Capacity Scheduler
      2. Capacity Scheduler Configuration
      3. Queues
      4. Hierarchical Queues
      5. Queue Access Control
      6. Capacity Management with Queues
      7. User Limits
      8. Reservations
      9. State of the Queues
      10. Limits on Applications
      11. User Interface
      12. Wrap-up
    18. Chapter 9. MapReduce with Apache Hadoop YARN
      1. Running Hadoop YARN MapReduce Examples
      2. MapReduce Compatibility
      3. The MapReduce ApplicationMaster
      4. Calculating the Capacity of a Node
      5. Changes to the Shuffle Service
      6. Running Existing Hadoop Version 1 Applications
      7. Running MapReduce Version 1 Existing Code
      8. Advanced Features
      9. Wrap-up
    19. Chapter 10. Apache Hadoop YARN Application Example
      1. The YARN Client
      2. The ApplicationMaster
      3. Wrap-up
    20. Chapter 11. Using Apache Hadoop YARN Distributed-Shell
      1. Using the YARN Distributed-Shell
      2. Internals of the Distributed-Shell
      3. Wrap-up
    21. Chapter 12. Apache Hadoop YARN Frameworks
      1. Distributed-Shell
      2. Hadoop MapReduce
      3. Apache Tez
      4. Apache Giraph
      5. Hoya: HBase on YARN
      6. Dryad on YARN
      7. Apache Spark
      8. Apache Storm
      9. REEF: Retainable Evaluator Execution Framework
      10. Hamster: Hadoop and MPI on the Same Cluster
      11. Wrap-up
    22. Appendix A. Supplemental Content and Code Downloads
      1. Available Downloads
    23. Appendix B. YARN Installation Scripts
    24. Appendix C. YARN Administration Scripts
    25. Appendix D. Nagios Modules
    26. Appendix E. Resources and Additional Information
    27. Appendix F. HDFS Quick Reference
      1. Quick Command Reference
    28. Index