Professional Hadoop Solutions

Book Description

The go-to guidebook for deploying Big Data solutions with Hadoop

Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth.

With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.

  • The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications

  • Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions

  • Includes detailed, real-world examples and code-level guidelines

  • Explains when, why, and how to use these tools effectively

  • Written by a team of Hadoop experts in the programmer-to-programmer Wrox style

Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

Table of Contents

  1. Cover
  2. Contents
  3. Chapter 1: Big Data and the Hadoop Ecosystem
    1. Big Data Meets Hadoop
    2. The Hadoop Ecosystem
    3. Hadoop Core Components
    4. Hadoop Distributions
    5. Developing Enterprise Applications with Hadoop
    6. Summary
  4. Chapter 2: Storing Data in Hadoop
    1. HDFS
    2. HBase
    3. Combining HDFS and HBase for Effective Data Storage
    4. Using Apache Avro
    5. Managing Metadata with HCatalog
    6. Choosing an Appropriate Hadoop Data Organization for Your Applications
    7. Summary
  5. Chapter 3: Processing Your Data with MapReduce
    1. Getting to Know MapReduce
    2. Your First MapReduce Application
    3. Designing MapReduce Implementations
    4. Summary
  6. Chapter 4: Customizing MapReduce Execution
    1. Controlling MapReduce Execution with InputFormat
    2. Reading Data Your Way with Custom RecordReaders
    3. Organizing Output Data with Custom Output Formats
    4. Writing Data Your Way with Custom RecordWriters
    5. Optimizing Your MapReduce Execution with a Combiner
    6. Controlling Reducer Execution with Partitioners
    7. Using Non-Java Code with Hadoop
    8. Summary
  7. Chapter 5: Building Reliable MapReduce Apps
    1. Unit Testing MapReduce Applications
    2. Local Application Testing with Eclipse
    3. Using Logging for Hadoop Testing
    4. Reporting Metrics with Job Counters
    5. Defensive Programming in MapReduce
    6. Summary
  8. Chapter 6: Automating Data Processing with Oozie
    1. Getting to Know Oozie
    2. Oozie Workflow
    3. Oozie Coordinator
    4. Oozie Bundle
    5. Oozie Parameterization with Expression Language
    6. Oozie Job Execution Model
    7. Accessing Oozie
    8. Oozie SLA
    9. Summary
  9. Chapter 7: Using Oozie
    1. Validating Information about Places Using Probes
    2. Designing Place Validation Based on Probes
    3. Designing Oozie Workflows
    4. Implementing Oozie Workflow Applications
    5. Implementing Workflow Activities
    6. Implementing Oozie Coordinator Applications
    7. Implementing Oozie Bundle Applications
    8. Deploying, Testing, and Executing Oozie Applications
    9. Using the Oozie Console to Get Information about Oozie Applications
    10. Summary
  10. Chapter 8: Advanced Oozie Features
    1. Building Custom Oozie Workflow Actions
    2. Adding Dynamic Execution to Oozie Workflows
    3. Using the Oozie Java API
    4. Using Uber Jars with Oozie Applications
    5. Data Ingestion Conveyer
    6. Summary
  11. Chapter 9: Real-Time Hadoop
    1. Real-Time Applications in the Real World
    2. Using HBase for Implementing Real-Time Applications
    3. Using Specialized Real-Time Hadoop Query Systems
    4. Using Hadoop-Based Event-Processing Systems
    5. Summary
  12. Chapter 10: Hadoop Security
    1. A Brief History: Understanding Hadoop Security Challenges
    2. Authentication
    3. Authorization
    4. Oozie Authentication and Authorization
    5. Network Encryption
    6. Security Enhancements with Project Rhino
    7. Putting it All Together — Best Practices for Securing Hadoop
    8. Summary
  13. Chapter 11: Running Hadoop Applications on AWS
    1. Getting to Know AWS
    2. Options for Running Hadoop on AWS
    3. Understanding the EMR-Hadoop Relationship
    4. Using AWS S3
    5. Automating EMR Job Flow Creation and Job Execution
    6. Orchestrating Job Execution in EMR
    7. Summary
  14. Chapter 12: Building Enterprise Security Solutions for Hadoop Implementations
    1. Security Concerns for Enterprise Applications
    2. What Hadoop Security Doesn’t Natively Provide for Enterprise Applications
    3. Approaches for Securing Enterprise Applications Using Hadoop
    4. Summary
  15. Chapter 13: Hadoop’s Future
    1. Simplifying MapReduce Programming with DSLs
    2. Faster, More Scalable Processing
    3. Security Enhancements
    4. Emerging Trends
    5. Summary
  16. Appendix: Useful Reading
  17. Introduction
  18. Advertisements