Learning HBase

Book description

Learn the fundamentals of HBase administration and development with the help of real-time scenarios

In Detail

Apache HBase is a nonrelational NoSQL database management system that runs on top of HDFS. It is an open source, distributed, versioned, column-oriented store. It gives you random, real-time read/write access to your Big Data, with the benefit of linear scalability.

This book will take you through a series of core tasks in HBase. The introductory chapter gives you all the information you need about the HBase ecosystem. You'll then learn how to configure, create, verify, and test clusters. The book also explores the Hadoop and HBase parameters that need to be considered for optimization and trouble-free operation of the cluster. It covers HBase's data model, storage, and structure layout in detail, along with the different options you can use to speed up HBase operations. The book also teaches basic and advanced coding in Java for HBase. By the end of the book, you will have learned how to use HBase with large data sets and integrate it with Hadoop.

What You Will Learn

  • Understand the fundamentals of HBase
  • Understand the prerequisites necessary to get started with HBase
  • Install and configure a new HBase cluster
  • Optimize an HBase cluster using different Hadoop and HBase parameters
  • Make clusters more reliable using different troubleshooting and maintenance techniques
  • Get to grips with the HBase data model and its operations
  • Get to know the benefits of using Hadoop tools/JARs for HBase

Table of contents

  1. Learning HBase
    1. Table of Contents
    2. Learning HBase
    3. Credits
    4. About the Author
    5. Acknowledgments
    6. About the Reviewers
    7. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    8. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    9. 1. Understanding the HBase Ecosystem
      1. HBase layout on top of Hadoop
      2. Comparing architectural differences between RDBMs and HBase
      3. HBase features
      4. HBase in the Hadoop ecosystem
        1. Data representation in HBase
        2. Hadoop
          1. Core daemons of Hadoop
          2. Comparing HBase with Hadoop
      5. Comparing functional differences between RDBMs and HBase
        1. Logical view of row-oriented databases
        2. Logical view of column-oriented databases
          1. Pros and cons of column-oriented databases
      6. About the internal storage architecture of HBase
      7. Getting started with HBase
        1. When it started
        2. HBase components and functionalities
        3. ZooKeeper
          1. Why an odd number of ZooKeepers?
          2. HMaster
            1. If a master node goes down
          3. RegionServer
            1. Components of a RegionServer
          4. Client
          5. Catalog tables
        4. Who is using HBase and why?
        5. When should we think of using HBase?
        6. When not to use HBase
        7. Understanding some open source HBase tools
        8. The Hadoop-HBase version compatibility table
      8. Applications of HBase
      9. HBase pros and cons
      10. Summary
    10. 2. Let's Begin with HBase
      1. Understanding HBase components in detail
        1. HFile
        2. Region
        3. Scalability – understanding the scale up and scale out processes
          1. Scale in
          2. Scale out
      2. Reading and writing cycle
        1. Write-Ahead Logs
        2. MemStore
      3. HBase housekeeping
        1. Compaction
          1. Minor compaction
          2. Major compaction
        2. Region split
        3. Region assignment
        4. Region merge
        5. RegionServer failovers
      4. The HBase delete request
        1. The reading and writing cycle
      5. List of available HBase distributions
      6. Prerequisites and capacity planning for HBase
        1. The forward DNS resolution
        2. The reverse DNS resolution
          1. Java
        3. SSH
          1. Domain Name Server
          2. Using Network Time Protocol to keep your node on time
          3. OS-level changes and tuning up OS for HBase
      7. Summary
    11. 3. Let's Start Building It
      1. Downloading Java on Ubuntu
      2. Considering host configurations
        1. Host file based
        2. Command based
        3. File based
        4. DNS based
      3. Installing and configuring SSH
        1. Installing SSH on Ubuntu/Red Hat/CentOS
        2. Configuring SSH
      4. Installing and configuring NTP
      5. Performing capacity planning
      6. Installing and configuring Hadoop
        1. core-site.xml
        2. hdfs-site.xml
        3. yarn-site.xml
        4. mapred-site.xml
        5. hadoop-env.sh
        6. yarn-env.sh
          1. Slaves file
      7. Hadoop start up steps
      8. Configuring Apache HBase
        1. Configuring HBase in the standalone mode
        2. Configuring HBase in the distributed mode
          1. hbase-site.xml
          2. HBase-env.sh
          3. regionservers
      9. Installing and configuring ZooKeeper
      10. Installing Cloudera Hadoop and HBase
        1. Downloading the required RPM packages
        2. Installing Cloudera in an easier way
      11. Installing the Hadoop and MapReduce packages
      12. Installing Hadoop on Windows
      13. Summary
    12. 4. Optimizing the HBase/Hadoop Cluster
      1. Setup types for Hadoop and HBase clusters
      2. Recommendations for CDH cluster configuration
      3. Capacity planning
      4. Hadoop optimization
        1. General optimization tips
        2. Optimizing Java GC
        3. Optimizing Linux OS
        4. Optimizing the Hadoop parameter
        5. Optimizing MapReduce
          1. Rack awareness in Hadoop
          2. Number of Map and Reduce limits in configuration files
            1. Considering and deciding the maximum number of Map and Reduce tasks
      5. Optimizing HBase
        1. Hadoop
        2. Memory
        3. Java
        4. OS
        5. HBase
      6. Optimizing ZooKeeper
      7. Important files in Hadoop
      8. Important files in HBase
      9. Summary
    13. 5. The Storage, Structure Layout, and Data Model of HBase
      1. Data types in HBase
      2. Storing data in HBase – logical view versus actual physical view
        1. Namespace
          1. Commands available for namespaces
      3. Services of HBase
        1. Row key
        2. Column family
        3. Column
        4. Cell
        5. Version
        6. Timestamp
      4. Data model operations
        1. Get
        2. Put
        3. Scan
        4. Delete
      5. Versioning and why
      6. Deciding the number of the version
        1. Lower bound of versions
        2. Upper bound of versions
      7. Schema designing
        1. Types of table designs
        2. Benefits of Short Wide and Tall-Thin design patterns
        3. Composite key designing
          1. Real-time use case of schema in an HBase table
          2. Schema change operations
      8. Calculating the data size stored in HBase
      9. Summary
    14. 6. HBase Cluster Maintenance and Troubleshooting
      1. Hadoop shell commands
        1. Types of Hadoop shell commands
          1. Administration commands
          2. User commands
          3. File system-related commands
            1. Difference between copyToLocal/copyFromLocal and get/put
      2. HBase shell commands
      3. HBase administration tools
        1. hbck – HBase check
        2. HBase health check script
      4. Writing HBase shell scripts
      5. Using the Hadoop tool or JARs for HBase
      6. Connecting HBase with Hive
      7. HBase region management
        1. Compaction
        2. Merge
      8. HBase node management
        1. Commissioning
        2. Decommissioning
      9. Implementing security
        1. Secure access
          1. Requirement
        2. Kerberos KDC
        3. Client-side security configuration
          1. Client-side security configuration for thrift requests
        4. Server-side security configuration
        5. Simple security
          1. Server-side configuration
          2. Client-side configuration
        6. The tag security feature
        7. Access control in HBase
          1. Server-side access control
        8. Cell-level access using tags
        9. Configuring ZooKeeper for security
      10. Troubleshooting the most frequent HBase errors and their explanations
        1. What might fail in cluster
        2. Monitoring HBase health
          1. HBase web UI
            1. Master
            2. RegionServer
          2. ZooKeeper command line
          3. Linux tools
      11. Summary
    15. 7. Scripting in HBase
      1. HBase backup and restore techniques
        1. Offline backup / full-shutdown backup
          1. Backup
          2. Restore
        2. Online backup
          1. The HBase snapshot
            1. Online
            2. Offline
          2. The HBase replication method
            1. Setting up cluster replication
            2. Backup and restore using Export and Import commands
              1. Export
              2. Import
          3. Miscellaneous utilities
          4. CopyTable
          5. HTable API
          6. Backup using a Mozilla tool
      2. HBase on Windows
      3. Scripting in HBase
        1. The .irbrc file
        2. Getting the HBase timestamp from HBase shell
        3. Enabling debugging shell
        4. Enabling the debug level in HBase shell
        5. Enabling SQL in HBase
      4. Contributing to HBase
      5. Summary
    16. 8. Coding HBase in Java
      1. Setting up the environment for development
        1. Building a Java client to code in HBase
      2. Data types
      3. Data model Java operations
        1. Read
          1. Get()
            1. Constructors
            2. Supported methods
          2. Scan()
            1. Constructors
            2. Methods
        2. Write
          1. Put()
            1. Constructors
            2. Methods
        3. Modify
          1. Delete()
            1. Constructors
            2. Methods
      4. HBase filters
        1. Types of filters
      5. Client APIs
      6. Summary
    17. 9. Advance Coding in Java for HBase
      1. Interfaces, classes, and exceptions
      2. Code related to administrative tasks
      3. Data operation code
      4. MapReduce and HBase
      5. RESTful services and Thrift services interface
        1. REST service interfaces
        2. Thrift
      6. Coding for HDFS operations
      7. Some advance topics in brief
        1. Coprocessors
          1. Types of coprocessors
        2. Bloom filters
        3. The Lily project
          1. Features
      8. Summary
    18. 10. HBase Use Cases
      1. HBase in industry today
      2. The future of HBase against relational databases
      3. Some real-world project examples' use cases
        1. HBase at Facebook
          1. Choosing HBase
          2. Storing in HBase
          3. The architecture of a Facebook message
          4. Facts and figures
        2. HBase at Pinterest
          1. The layout architecture
        3. HBase at Groupon
          1. The layout architecture
        4. HBase at LongTail Video
          1. The layout architecture
        5. HBase at Aadhaar (UIDAI)
          1. The layout architecture
      4. Useful links and references
      5. Summary
    19. Index

Product information

  • Title: Learning HBase
  • Author(s): Shashwat Shriparv
  • Release date: November 2014
  • Publisher(s): Packt Publishing
  • ISBN: 9781783985944