Architecting HBase Applications

Book Description

Lots of HBase books, online HBase guides, and HBase mailing lists/forums are available if you need to know how HBase works. But if you want to take a deep dive into use cases, features, and troubleshooting, Architecting HBase Applications is the right source for you.

With this book, you’ll learn a controlled set of APIs that coincide with use-case examples and easily deployed use-case models, as well as sizing/best practices to help jump-start your enterprise application development and deployment.

Table of Contents

  1. Foreword
  2. Preface
    1. Who Should Read This Book?
    2. How This Book Is Organized
    3. Additional Resources
    4. Conventions Used in This Book
    5. Using Code Examples
    6. Safari® Books Online
    7. How to Contact Us
    8. Acknowledgments
      1. From Kevin
      2. From Jean-Marc
  3. I. Introduction to HBase
  4. 1. What Is HBase?
    1. Column-Oriented Versus Row-Oriented
    2. Implementation and Use Cases
  5. 2. HBase Principles
    1. Table Format
      1. Table Layout
      2. Table Storage
    2. Internal Table Operations
      1. Compaction
      2. Splits (Auto-Sharding)
      3. Balancing
    3. Dependencies
    4. HBase Roles
      1. Master Server
      2. RegionServer
      3. Thrift Server
      4. REST Server
  6. 3. HBase Ecosystem
    1. Monitoring Tools
      1. Cloudera Manager
      2. Apache Ambari
      3. Hannibal
    2. SQL
      1. Apache Phoenix
      2. Apache Trafodion
      3. Splice Machine
      4. Honorable Mentions (Kylin, Themis, Tephra, Hive, and Impala)
    3. Frameworks
      1. OpenTSDB
      2. Kite
      3. HappyBase
      4. AsyncHBase
  7. 4. HBase Sizing and Tuning Overview
    1. Hardware
    2. Storage
    3. Networking
    4. OS Tuning
    5. Hadoop Tuning
    6. HBase Tuning
    7. Different Workload Tuning
  8. 5. Environment Setup
    1. System Requirements
      1. Operating System
      2. Virtual Machine
      3. Resources
      4. Java
    2. HBase Standalone Installation
    3. HBase in a VM
    4. Local Versus VM
      1. Local Mode
      2. Virtual Linux Environment
      3. QuickStart VM (or Equivalent)
    5. Troubleshooting
      1. IP/Name Configuration
      2. Access to the /tmp Folder
      3. Environment Variables
      4. Available Memory
    6. First Steps
      1. Basic Operations
      2. Import Code Examples
      3. Testing the Examples
    7. Pseudodistributed and Fully Distributed
  9. II. Use Cases
  10. 6. Use Case: HBase as a System of Record
    1. Ingest/Pre-Processing
    2. Processing/Serving
    3. User Experience
  11. 7. Implementation of an Underlying Storage Engine
    1. Table Design
      1. Table Schema
      2. Table Parameters
      3. Implementation
    2. Data Conversion
      1. Generate Test Data
      2. Create Avro Schema
      3. Implement MapReduce Transformation
    3. HFile Validation
    4. Bulk Loading
    5. Data Validation
      1. Table Size
      2. File Content
    6. Data Indexing
    7. Data Retrieval
    8. Going Further
  12. 8. Use Case: Near Real-Time Event Processing
    1. Ingest/Pre-Processing
    2. Near Real-Time Event Processing
    3. Processing/Serving
  13. 9. Implementation of Near Real-Time Event Processing
    1. Application Flow
      1. Kafka
      2. Flume
      3. HBase
      4. Lily
      5. Solr
    2. Implementation
      1. Data Generation
      2. Kafka
      3. Flume
      4. Serializer
      5. HBase
      6. Lily
      7. Solr
      8. Testing
    3. Going Further
  14. 10. Use Case: HBase as a Master Data Management Tool
    1. Ingest
    2. Processing
  15. 11. Implementation of HBase as a Master Data Management Tool
    1. MapReduce Versus Spark
    2. Get Spark Interacting with HBase
      1. Run Spark over an HBase Table
      2. Calling HBase from Spark
    3. Implementing Spark with HBase
      1. Spark and HBase: Puts
      2. Spark on HBase: Bulk Load
      3. Spark Over HBase
    4. Going Further
  16. 12. Use Case: Document Store
    1. Serving
    2. Ingest
    3. Clean Up
  17. 13. Implementation of Document Store
    1. MOBs
      1. Storage
      2. Usage
      3. Too Big
    2. Consistency
    3. Going Further
  18. III. Troubleshooting
  19. 14. Too Many Regions
    1. Consequences
    2. Causes
      1. Misconfiguration
      2. Misoperation
    3. Solution
      1. Before 0.98
      2. Starting with 0.98
    4. Prevention
      1. Regions Size
      2. Key and Table Design
  20. 15. Too Many Column Families
    1. Consequences
      1. Memory
      2. Compactions
      3. Split
    2. Causes, Solution, and Prevention
      1. Delete a Column Family
      2. Merge a Column Family
      3. Separate a Column Family into a New Table
  21. 16. Hotspotting
    1. Consequences
    2. Causes
      1. Monotonically Incrementing Keys
      2. Poorly Distributed Keys
      3. Small Reference Tables
      4. Applications Issues
      5. Meta Region Hotspotting
    3. Prevention and Solution
  22. 17. Timeouts and Garbage Collection
    1. Consequences
    2. Causes
      1. Storage Failure
      2. Power-Saving Features
      3. Network Failure
    3. Solutions
    4. Prevention
      1. Reduce Heap Size
      2. Off-Heap BlockCache
      3. Using the G1GC Algorithm
      4. Configure Swappiness to 0 or 1
      5. Disable Environment-Friendly Features
      6. Hardware Duplication
  23. 18. HBCK and Inconsistencies
    1. HBase Filesystem Layout
    2. Reading META
    3. Reading HBase on HDFS
    4. General HBCK Overview
    5. Using HBCK
  24. Index