HBase Design Patterns

Book Description

Design and implement successful patterns to develop scalable applications with HBase

In Detail

With the increasing use of NoSQL in general and HBase in particular, building practical applications depends on applying design patterns. These patterns, distilled from extensive practical experience on multiple demanding projects, help ensure the correctness and scalability of an HBase application. They are also generally applicable to most NoSQL databases.

Starting with the basics, this book will show you how to install HBase in different node settings. You will then be introduced to key generation and management and the storage of large files in HBase. Moving on, this book will delve into the principles of using time-based data in HBase and walk you through use cases for denormalizing data in HBase. Finally, you will learn how to translate familiar SQL design practices into the NoSQL world. With this concise guide, you will get a better idea of typical storage patterns and application design templates, explore HBase in multiple scenarios with minimum effort, and learn how to read data from multiple region servers.

What You Will Learn

  • Install and configure a Hadoop cluster and HBase
  • Write Java code to read from and write to HBase
  • Explore the open source Phoenix project to talk to HBase in SQL
  • Store single entities, generate keys, use lists, maps, and sets
  • Utilize UUID for generic key generation to store data and deal with large files
  • Use denormalization to optimize performance
  • Represent one-to-many and many-to-many relationships and deal with transactions
  • Troubleshoot and optimize your application

    Table of Contents

    1. HBase Design Patterns
      1. Table of Contents
      2. HBase Design Patterns
      3. Credits
      4. About the Authors
      5. About the Reviewers
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Starting Out with HBase
        1. Installing HBase
          1. Creating a single-node HBase cluster
          2. Creating a distributed HBase cluster
        2. Selecting an instance
          1. Spot instances
        3. Adding storage
        4. Security groups
        5. Starting the instance
        6. Summary
      9. 2. Reading, Writing, and Using SQL
        1. Inspecting the cluster
        2. HBase tables, families, and cells
        3. The HBase shell
        4. Project Phoenix — a SQL for HBase
          1. Installing Phoenix
        5. Summary
      10. 3. Using HBase Tables for Single Entities
        1. Storing user information
          1. A solution for storing user information
        2. Sets, maps, and lists
        3. Generating the test data
        4. Analyzing your query
        5. Exercise
          1. Solution
        6. Summary
      11. 4. Dealing with Large Files
        1. Storing files using keys
        2. Using UUID
        3. What to do when your binary files grow larger
          1. Using Google Blobstore to store large files
          2. Facebook's Haystack for the storage of large files
          3. Twitter solution to store large files
          4. Amazon S3 storage for very large objects
            1. A practical approach
          5. Practical recommendations
          6. A practical lab
        4. Exercises
        5. Summary
      12. 5. Time Series Data
        1. Using time-based keys to store time series data
        2. Avoiding region hotspotting
        3. Tall and narrow rows versus wide rows
        4. OpenTSDB principles
          1. The overall design of TSDB
          2. The row key
          3. The timestamp
          4. Compactions
          5. The UID table schema
        5. Summary
      13. 6. Denormalization Use Cases
        1. Storing all the objects for a user
        2. Dealing with lost usernames and passwords
          1. Generating data for performance testing
        3. Tables for storing videos
          1. Manual exercises
          2. Generating data for performance testing
        4. A popularity contest
        5. The section tag index
        6. Summary
      14. 7. Advanced Patterns for Data Modeling
        1. Many-to-many relationships in HBase
          1. Creating a many-to-many relationship for a university with students and courses
          2. Creating a many-to-many relationship for a social network
        2. Applying the many-to-many relationship techniques for a video site
        3. Event time data – keeping track of what is going on
        4. Dealing with transactions
        5. Trafodion – transactional SQL on HBase
        6. Summary
      15. 8. Performance Optimization
        1. Loading bulk data into HBase
        2. Importing data into HBase using MapReduce
        3. Importing data from HDFS into HBase
          1. Pig for MapReduce
          2. Java MapReduce
          3. Using HBase's bulk loader utility
            1. Staging data files into HDFS
              1. Creating an HBase table
              2. Run the import
          4. Bulk import scenarios
        4. Profiling HBase applications
          1. More tips for high-performing HBase writes
            1. Batch writes
            2. Setting memory buffers
            3. Turning off autoflush
            4. Turning off WAL
          2. More tips for high-performing HBase reads
            1. The scan cache
            2. Only read the families or columns needed
            3. The block cache
        5. Benchmarking or load testing HBase
          1. HBase's built-in benchmark
          2. YCSB
          3. JMeter for custom workloads
        6. Monitoring HBase
          1. Ganglia
          2. OpenTSDB
          3. Collecting metrics via the JMX interface
        7. Summary
      16. Index