
With the growing popularity of Big Data and the need to store and process ever larger amounts of data, new technologies are rapidly being developed to meet the needs of developers. A number of new database technologies have emerged under the umbrella term NoSQL. They achieve high levels of scalability by forgoing the rigid structure of traditional SQL databases and adopting a more flexible data model. It would not be unfair to say that the NoSQL movement was ignited by the pioneering work of Google engineers on Google BigTable, which runs atop the Google File System. This work is detailed in the paper Bigtable: A Distributed Storage System for Structured Data. Since BigTable is proprietary, it was not available to developers in general until Apache HBase was born.

HBase is an open source NoSQL database modeled after Google’s BigTable. It allows large amounts of data to be stored and processed in very large tables comprising billions of rows. HBase achieves scalability by distributing the data over a cluster of nodes, and it does so by running on top of HDFS (the Hadoop Distributed File System), which is part of Apache Hadoop. In this article, we will discuss getting started with HBase and briefly cover using the HBase shell to store and retrieve data.

Getting Started

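First, we need to download the HBase bundle from the Apache Download Mirrors. Select the latest stable version and save it to a local directory. Next, we need to extract the contents of the bundle into a directory. Assuming a tarball has been downloaded, this can be done as follows (hbase-x.y.z is a placeholder; replace it with the name of the file you downloaded):

    tar xzf hbase-x.y.z.tar.gz
    cd hbase-x.y.z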

Since HBase is written in Java, it requires a Java Runtime Environment (JRE), version 1.6 or later. To check the installation of Java, open a console (or command prompt) and type:

    java -version

If the command runs and the output shows a suitable version, then we are good to go. Otherwise, you need to install the latest JRE before proceeding. The installation procedure for the JRE is beyond the scope of this article; for Ubuntu, see section 17B-1, Installing the Java Runtime Environment, in Ubuntu Made Easy.

Now we need to start an instance of HBase so that we can begin storing data. To do this, open a console in the extracted HBase directory and type:

    ./bin/start-hbase.sh

Next, we need to open the HBase shell. To do this, we type in the same console:

    ./bin/hbase shell

If the command runs successfully, this will open the HBase shell. Type ‘help’ and press <ENTER> to view the list of commands available.

Storing and Retrieving Data

As mentioned before, NoSQL databases do not use the rigid data model prevalent in SQL databases, but instead adopt more flexible models for storing data. HBase is no exception: the model it exposes is based on storing data in very large tables comprising millions of rows. Each row is composed of multiple columns, and each of those columns belongs to a column family. Data is stored in table cells, which are the intersections of rows and columns, and is represented as an uninterpreted array of bytes (that is, no type information is stored). Each table cell is also timestamped.
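To make the model concrete, here is a minimal Python sketch of a table whose cells are addressed by row, column family and column key, hold raw bytes, and carry a timestamp. It is only a toy in-memory illustration, not how HBase is implemented; the class name ToyTable and its methods are hypothetical.

```python
class ToyTable:
    """Toy in-memory stand-in for an HBase-style table (illustration only)."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        # (row key, column family, column key) -> {timestamp: raw bytes}
        self.cells = {}

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError("unknown column family: " + family)
        if not isinstance(value, bytes):
            # No type information is stored, so callers must pass bytes.
            raise TypeError("values are stored as uninterpreted bytes")
        self.cells.setdefault((row, family, qualifier), {})[timestamp] = value

    def get(self, row, family, qualifier):
        """Return (timestamp, value) for the newest version, or None."""
        versions = self.cells.get((row, family, qualifier))
        if not versions:
            return None
        newest = max(versions)
        return newest, versions[newest]


table = ToyTable(["post"])
table.put("row1", "post", "title", b"First draft", timestamp=1)
table.put("row1", "post", "title", b"Final title", timestamp=2)
print(table.get("row1", "post", "title"))
```

Writing the same cell twice keeps both versions, and a read returns the one with the newest timestamp.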

There are no JOINs between tables; instead, related data is stored together in the columns of a single row. This data model was chosen because, in traditional SQL databases, the dependencies between tables mean that a complex query may need to fetch data from multiple machines before the results can be returned. This adds an overhead that grows with the number of machines, thereby limiting the extent to which such databases can be scaled. HBase avoids this issue: since a row is self-contained and each row is stored completely on one machine, results can be fetched and returned quickly.
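One way to picture why a row lookup touches only one machine is to assign contiguous ranges of row keys to nodes, which is roughly how HBase divides a table into regions. The Python sketch below illustrates the idea only; the split points, node names and the function server_for_row are made up for this example.

```python
import bisect

# Hypothetical split points: each node holds one contiguous range of row keys.
SPLIT_KEYS = ["row2", "row3"]
# One more node than split points: keys below "row2", from "row2", from "row3".
NODES = ["node-a", "node-b", "node-c"]


def server_for_row(row_key):
    """Return the single node responsible for the given row key."""
    return NODES[bisect.bisect_right(SPLIT_KEYS, row_key)]


for key in ("row1", "row2", "row3"):
    print(key, "->", server_for_row(key))
```

Because the whole row lives in exactly one key range, reading it never requires combining results from several nodes.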

A detailed discussion on modeling data for Apache HBase is beyond the scope of this article and will be covered in a future blog post. For the time being, we will consider a simple table ‘blogs’ with a single column family ‘post’, for use in the following examples.

First, we create the table ‘blogs’ and list the existing tables by typing in our HBase shell:

    create 'blogs', 'post'
    list

The create command creates the table with one column family ‘post’. The list command is used to ensure that the table has been created successfully.

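In order to store and retrieve data, we use the put and get commands provided by the shell, respectively. We begin by storing three post titles with put:

    put 'blogs', 'row1', 'post:title', 'HBase - An open source Bigtable database'
    put 'blogs', 'row2', 'post:title', 'Data modeling for HBase'
    put 'blogs', 'row3', 'post:title', 'Configuring and Managing HBase'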

The commands above create three rows in our ‘blogs’ table. Each row has a key (row1, row2 and row3) and one column, ‘post:title’. In HBase, a column name is specified as the column family name, followed by a colon (:), followed by the column key. Hence, ‘post’ is the column family name and ‘title’ is the column key.
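The naming convention can be illustrated with a small Python helper that splits a column name into its two parts. This is a sketch only: split_column is a hypothetical function, and splitting at the first colon is an assumption made for this example.

```python
def split_column(name):
    """Split 'family:key' into (column family, column key)."""
    # Assumption of this sketch: only the first colon separates the parts.
    family, separator, key = name.partition(":")
    if not separator or not family:
        raise ValueError("expected 'family:key', got %r" % name)
    return family, key


print(split_column("post:title"))
```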

To check that the rows have actually been created, we can use the scan command:

    scan 'blogs'

This will output the rows we just created. To retrieve a single row, we can use the get command of the HBase shell:

    get 'blogs', 'row1'

This will output the first row we stored earlier. Once we are done using the shell, we can close it by typing:

    exit


This post provided a 10,000 ft. overview of Apache HBase covering only its basic installation and shell usage. There are a lot of good books available that cover HBase in much more detail and provide guidance for advanced usage (see below). These should be consulted for more in-depth coverage of HBase.

Safari Books Online has the content you need

Check out these HBase books available from Safari Books Online:

HBase: The Definitive Guide shows you how Apache HBase can fulfill your needs. As the open source implementation of Google’s BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. This book provides the details you require to evaluate this high-performance, non-relational database, or put it into practice right away.
HBase Administration Cookbook provides practical examples and simple step-by-step instructions for you to administrate HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key and this book will help you to achieve that.
HBase in Action has all of the knowledge you need to design, build, and run applications using HBase. First, it introduces you to the fundamentals of distributed systems and large scale data handling. Then, you’ll explore real-world applications and code samples with just enough theory to understand the practical techniques. You’ll see how to build applications with HBase and take advantage of the MapReduce processing framework. And along the way you’ll learn patterns and best practices.

About the author

Shaneeb Kamran is a Computer Engineer from one of the leading universities of Pakistan. His programming journey started at the age of 12, and ever since he has dabbled in every new and shiny software technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products.

Tags: Apache Hadoop, Apache HBase, Apache HDFS, BigTable, JRE, NoSQL

One Response to “HBase – An OpenSource BigTable Database”

  1. Marty McEnroe

    The article contains:

    put 'blogs', 'row1', 'post:title', 'HBase – An open source Bigtable database'
    put 'blogs', 'row1', 'post:title', 'Data modeling for HBase'
    put 'blogs', 'row1', 'post:title', 'Configuring and Managing HBase'

    Is it supposed to be?

    put 'blogs', 'row1', 'post:title', 'HBase – An open source Bigtable database'
    put 'blogs', 'row2', 'post:title', 'Data modeling for HBase'
    put 'blogs', 'row3', 'post:title', 'Configuring and Managing HBase'