HBase: The Definitive Guide, 2nd Edition

Book Description

If you’re looking for a scalable storage solution to accommodate a virtually endless amount of data, this updated edition shows you how Apache HBase can meet your needs. Modeled after Google’s Bigtable architecture, HBase scales to billions of rows and millions of columns while keeping write and read performance constant.

Fully revised for HBase 1.0, this second edition brings you up to speed on the new HBase client API, as well as security features and new case studies that demonstrate HBase use in the real world. Whether you have just started evaluating this nonrelational database or plan to put it into practice right away, this book has your back.

Table of Contents

  1. Foreword: Michael Stack
    1. For First Revision (2011)
    2. For Second Revision (2015)
  2. Foreword: Carter Page
  3. Preface
    1. General Information
      1. HBase Version
    2. What is in this Book?
    3. Target Audience
    4. What is New in the Second Edition?
    5. Conventions Used in This Book
    6. Using Code Examples
    7. Safari® Books Online
    8. How to Contact Us
    9. Acknowledgments
  4. 1. Introduction
    1. The Dawn of Big Data
    2. The Problem with Relational Database Systems
    3. Nonrelational Database Systems, Not-Only SQL or NoSQL?
      1. Dimensions
      2. Scalability
      3. Database (De-)Normalization
    4. Building Blocks
      1. Backdrop
      2. Namespaces, Tables, Rows, Columns, and Cells
      3. Auto-Sharding
      4. Storage API
      5. Implementation
      6. Summary
    5. HBase: The Hadoop Database
      1. History
      2. Nomenclature
      3. Summary
  5. 2. Installation
    1. Quick-Start Guide
    2. Requirements
      1. Hardware
      2. Software
    3. Filesystems for HBase
      1. Local
      2. HDFS
      3. S3
      4. Other Filesystems
    4. Installation Choices
      1. Apache Binary Release
      2. Building from Source
    5. Run Modes
      1. Standalone Mode
      2. Distributed Mode
    6. Configuration
      1. hbase-site.xml and hbase-default.xml
      2. hbase-env.sh and hbase-env.cmd
      3. regionserver
      4. log4j.properties
      5. Example Configuration
      6. Client Configuration
    7. Deployment
      1. Script-Based
      2. Apache Whirr
      3. Puppet and Chef
    8. Operating a Cluster
      1. Running and Confirming Your Installation
      2. Web-based UI Introduction
      3. Shell Introduction
      4. Stopping the Cluster
  6. 3. Client API: The Basics
    1. General Notes
    2. Data Types and Hierarchy
      1. Generic Attributes
      2. Operations: Fingerprint and ID
      3. Query versus Mutation
      4. Durability, Consistency, and Isolation
      5. The Cell
      6. API Building Blocks
    3. CRUD Operations
      1. Put Method
      2. Get Method
      3. Delete Method
      4. Append Method
      5. Mutate Method
    4. Batch Operations
    5. Scans
      1. Introduction
      2. The ResultScanner Class
      3. Scanner Caching
      4. Scanner Batching
      5. Slicing Rows
      6. Load Column Families on Demand
      7. Scanner Metrics
    6. Miscellaneous Features
      1. The Table Utility Methods
      2. The Bytes Class
  7. 4. Client API: Advanced Features
    1. Filters
      1. Introduction to Filters
      2. Comparison Filters
      3. Dedicated Filters
      4. Decorating Filters
      5. FilterList
      6. Custom Filters
      7. Filter Parser Utility
      8. Filters Summary
    2. Counters
      1. Introduction to Counters
      2. Single Counters
      3. Multiple Counters
    3. Coprocessors
      1. Introduction to Coprocessors
      2. The Coprocessor Class Trinity
      3. Coprocessor Loading
      4. Endpoints
      5. Observers
      6. The ObserverContext Class
      7. The RegionObserver Class
      8. The MasterObserver Class
      9. The RegionServerObserver Class
      10. The WALObserver Class
      11. The BulkLoadObserver Class
      12. The EndPointObserver Class
  8. 5. Client API: Administrative Features
    1. Schema Definition
      1. Namespaces
      2. Tables
      3. Table Properties
      4. Column Families
    2. Cluster Administration
      1. Basic Operations
      2. Namespace Operations
      3. Table Operations
      4. Schema Operations
      5. Cluster Operations
      6. Cluster Status Information
    3. ReplicationAdmin
  9. 6. Available Clients
    1. Introduction
      1. Gateways
      2. Frameworks
    2. Gateway Clients
      1. Native Java
      2. REST
      3. Thrift
      4. Thrift2
      5. SQL over NoSQL
    3. Framework Clients
      1. MapReduce
      2. Hive
      3. Pig
      4. Cascading
      5. Other Clients
    4. Shell
      1. Basics
      2. Commands
      3. Scripting
    5. Web-based UI
      1. Master UI Status Page
      2. Master UI Related Pages
      3. Region Server UI Status Page
      4. Shared Pages
  10. 7. Hadoop Integration
    1. Framework
      1. MapReduce Introduction
      2. Processing Classes
      3. Supporting Classes
      4. MapReduce Locality
      5. Table Splits
    2. MapReduce over Tables
      1. Preparation
      2. Table as a Data Sink
      3. Table as a Data Source
      4. Table as both Data Source and Sink
      5. Custom Processing
    3. MapReduce over Snapshots
    4. Bulk Loading Data
  11. 8. Advanced Usage
    1. Key Design
      1. Concepts
      2. Tall-Narrow Versus Flat-Wide Tables
      3. Partial Key Scans
      4. Pagination
      5. Time Series Data
      6. Time-Ordered Relations
      7. Aging-out Regions
      8. Application-driven Replicas
    2. Advanced Schemas
    3. Secondary Indexes
    4. Search Integration
    5. Transactions
      1. Region-local Transactions
    6. Versioning
      1. Implicit Versioning
      2. Custom Versioning
  12. 9. Cluster Monitoring
    1. Introduction
    2. The Metrics Framework
      1. Metrics Building Blocks
      2. Configuration
      3. Metrics UI
      4. Master Metrics
      5. Region Server Metrics
      6. RPC Metrics
      7. UserGroupInformation Metrics
      8. JVM Metrics
    3. Ganglia
      1. Installation
      2. Usage
    4. JMX
      1. JConsole
      2. JMX Remote API
    5. Nagios
    6. OpenTSDB
  13. 10. Performance Tuning
    1. Heap Tuning
      1. Java Heap Sizing
      2. Tuning Heap Shares
    2. Garbage Collection Tuning
      1. Introduction
      2. Concurrent Mark Sweep (CMS)
      3. Garbage First (G1)
      4. Garbage Collection Information
    3. Memstore-Local Allocation Buffer
    4. HDFS Read Tuning
      1. Short-Circuit Reads
      2. Hedged Reads
    5. Block Cache Tuning
      1. Introduction
      2. Cache Types
      3. Single vs. Multi-level Caching
      4. Basic Cache Configuration
      5. Advanced Cache Configuration
      6. Cache Selection
    6. Compression
      1. Available Codecs
      2. Verifying Installation
      3. Enabling Compression
    7. Key Encoding
      1. Available Codecs
      2. Enabling Key Encoding
    8. Bloom Filters
    9. Region Split Handling
      1. Number of Regions
      2. Managed Splitting
      3. Region Hotspotting
      4. Presplitting Regions
    10. Merging Regions
      1. Online: Merge with API and Shell
      2. Offline: Merge Tool
    11. Region Ergonomics
    12. Compaction Tuning
      1. Compaction Settings
      2. Compaction Throttling
    13. Region Flush Tuning
    14. RPC Tuning
      1. RPC Scheduling
      2. Slow Query Logging
    15. Load Balancing
    16. Client API: Best Practices
    17. Configuration
    18. Load Tests
      1. Performance Evaluation
      2. Load Test Tool
      3. YCSB
  14. A. Upgrade from Previous Releases
    1. Upgrading to HBase 0.90.x
      1. From 0.20.x or 0.89.x
      2. Within 0.90.x
    2. Upgrading to HBase 0.92.0
    3. Upgrading to HBase 0.98.x
    4. Migrate API to HBase 1.0.x
      1. Migrate Coprocessors to post HBase 0.96
      2. Migrate Custom Filters to post HBase 0.96