
HBase has two special tables that most users will never have to touch: “-ROOT-” and “.META.”. .META. holds the location of the regions of all the tables, and -ROOT- in turn holds the location of .META. Because the regionserver holding -ROOT- can crash, we won’t always know where -ROOT- lives, so we store the location of -ROOT- in a znode in ZooKeeper.

In the original BigTable paper, .META. could span multiple regions, allowing a very large cluster. However, in practice HBase cluster sizes never grow beyond one region’s worth of meta information, so the .META. region never splits. This means the -ROOT- region is really just one row pointing to .META.
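The chain of lookups described above can be sketched with plain dictionaries. This is only an illustrative model, not HBase client code: the znode name, the -ROOT- host, and the function name are hypothetical, and the remaining addresses are borrowed from the sample tables in this post.

```python
# Illustrative model only: each dict stands in for one level of the lookup.
# The znode name and the -ROOT- host below are hypothetical placeholders.
ZOOKEEPER = {"root-location": "root-host:60020"}

# -ROOT- has a single row that points at the server hosting .META.
ROOT_TABLE = {".META.,,1": "hbase-server:54236"}

# .META. has one row per user region, pointing at the hosting regionserver.
META_TABLE = {
    "t,,1351700811858.04a79dcbbc90a3b2353709773ebc2423.": "10.7.73.121:64782",
}


def locate(region_row_key):
    """Walk ZooKeeper -> -ROOT- -> .META. and return the server at every hop."""
    root_server = ZOOKEEPER["root-location"]    # where -ROOT- lives
    meta_server = ROOT_TABLE[".META.,,1"]       # where .META. lives
    region_server = META_TABLE[region_row_key]  # where the user region lives
    return [root_server, meta_server, region_server]


hops = locate("t,,1351700811858.04a79dcbbc90a3b2353709773ebc2423.")
print(hops)
```

Only the last hop is the server that actually hosts the user's data; the first two exist purely to find it.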

Basic Layout

The -ROOT- table will always look something like:

Row         Column Family   Column Qualifier   Value
.META.,,1   info            regioninfo         NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192
.META.,,1   info            server             hbase-server:54236
.META.,,1   info            serverstartcode    1351700754213
.META.,,1   info            v                  \x00\x01

.META. keeps track of the current location of each region, for each table, along with some information about the region such as its name, its HRegionInfo, and the server info. A row in .META. corresponds to a single region. Let’s suppose that we created a table ‘t’; .META. would look something like:

Row: t,,1351700811858.04a79dcbbc90a3b2353709773ebc2423.

Column Family   Column Qualifier   Value
info            regioninfo         NAME => 't,,1351700811858.04a79dcbbc90a3b2353709773ebc2423.', STARTKEY => '', ENDKEY => '', ENCODED => 04a79dcbbc90a3b2353709773ebc2423,
info            server             10.7.73.121:64782
info            serverstartcode    1351986939360

The information stored in each cell is pretty self-explanatory:

  • regioninfo is the serialized version of the HRegionInfo
  • server is the server’s host:port
  • serverstartcode is the time the server was started

The more interesting thing is the rowkey, which is a composite key made up of:

(1) name of the table,
(2) start key of the region,
(3) the regionid – usually just the timestamp the region was created
(4) the hash of (1),(2), and (3)

Together, these ensure that regions with earlier start keys will always sort before regions with later start keys (remember, in HBase keys are lexicographically sorted).
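As a rough illustration of the composite key, here is a minimal sketch that assembles a row key from parts (1) through (3) and appends a hash. The function name and the table contents are hypothetical, and the use of an MD5 hex digest is an assumption: it produces 32-character hashes like the ones shown above, but the text does not say which hash HBase uses.

```python
import hashlib


def meta_row_key(table, start_key, region_id):
    """Build 'table,startkey,regionid.hash.' from parts (1) through (3)."""
    prefix = f"{table},{start_key},{region_id}"
    # Assumption: an MD5 hex digest of the prefix stands in for the hash in (4).
    digest = hashlib.md5(prefix.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest}."


# Three regions of a hypothetical table, deliberately listed out of order.
keys = [
    meta_row_key("table", "foo", 1351700819876),
    meta_row_key("table", "", 1351700811858),
    meta_row_key("table", "bar", 1351700819876),
]

# Sorting the row keys orders the regions by start key.
ordered = sorted(keys)
start_keys = [key.split(",")[1] for key in ordered]
print(start_keys)
```

Plain string comparison is a simplification that works for these simple start keys; it is only meant to show why putting the table name and start key first gives the ordering described above.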

NOTE: This description of -ROOT- and .META. is applicable only to the currently released versions of HBase (up to 0.94.x). There is discussion in the community to rewrite much of .META., but it remains unclear as to when people will have time to actually do the work.

Looking up a region

The HTable handles the lookup for the correct region when doing a put. As a simple straw man, we could just scan down .META. from the name of the table we are looking for until we find a row that corresponds to our table. More precisely, as we scan down .META., what we are looking for is the row just before the position where our key would be inserted*.

For example, suppose we have one table, with three regions, with start keys: null (first region), bar, and foo. .META.’s row keys then look something like:

If we try to insert the key ‘baz’ into the table ‘table’, the region hosting our row then corresponds to the second row (table,bar,1351700819876.18e79dcbbc90a3b2353705632adb1111) in .META.

However, if we keep doing this scan each time, this will cause a LOT of overhead as we scan all the rows for a table; this overhead only worsens as tables get larger and span more regions. Instead, we can do something smart and do a point scan for a key that looks like:

and then find the matching row or the previous one.
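The "matching row or the previous one" lookup can be sketched as a binary search over sorted row keys. This is a simplified model, not HBase's implementation: only the 'bar' row key comes from the text, the other two region ids and hashes are made up, and the probe format (table, row, then a region id of all nines) is an assumption about what such a point-scan key could look like.

```python
import bisect

# Hypothetical .META. row keys for 'table'. Only the 'bar' row appears in the
# text; the other two region ids and hashes are made up for illustration.
META_ROWS = sorted([
    "table,,1351700811858.0123456789abcdef0123456789abcdef",
    "table,bar,1351700819876.18e79dcbbc90a3b2353705632adb1111",
    "table,foo,1351700819876.fedcba9876543210fedcba9876543210",
])

# Assumption: a region id that sorts after any real timestamp.
MAX_REGION_ID = "99999999999999"


def row_or_before(table, row):
    """Return the .META. row at or just before the probe key for `row`."""
    probe = f"{table},{row},{MAX_REGION_ID}"
    index = bisect.bisect_right(META_ROWS, probe) - 1
    return META_ROWS[index] if index >= 0 else None


print(row_or_before("table", "baz"))
```

Because the probe carries the largest possible region id, it sorts after every real row with the same start key, so a row whose start key equals the probe's row is still found by stepping back one position.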

* This led to a lot of code in HBase for the now-deprecated HTable#getRowOrBefore, which really only works correctly on .META. (hopefully this is going away in HBASE-2600).

Anatomy of a split

Eventually, as you keep adding data to a table, a region will need to split so that HBase can continue to scale linearly. The data from the ‘parent’ region is split between its ‘child’ regions, after which HBase redirects requests to the new regions.

Before we can bring the child regions online we need to do two things:

(1) offline the parent region
(2) add the child regions to the parent info (giving us provenance)

The first is handled via an update to the HRegionInfo serialized under the info:regioninfo column. The latter is handled by adding two columns – info:splitA (with the ‘top’ child’s HRegionInfo) and info:splitB (with the ‘bottom’ child’s HRegionInfo). This ensures we have some way to track what happened to a region, where it came from, and to later clean up old regions via the CatalogJanitor.
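The two updates to the parent's row can be modeled as follows. This is a sketch under stated assumptions: a .META. row is represented as a dict keyed by column name, each HRegionInfo is a plain dict rather than its serialized form, and the field names (start, end, region_id, offline) and all values are hypothetical.

```python
def mark_parent_split(parent_row, top_child, bottom_child):
    """Return a copy of the parent's .META. row after steps (1) and (2)."""
    updated = dict(parent_row)
    # (1) Offline the parent by rewriting the HRegionInfo under info:regioninfo.
    updated["info:regioninfo"] = dict(parent_row["info:regioninfo"], offline=True)
    # (2) Record both children so the split can be traced and cleaned up later.
    updated["info:splitA"] = top_child
    updated["info:splitB"] = bottom_child
    return updated


# Hypothetical parent region covering [bar, foo), split at 'cat'.
parent = {
    "info:regioninfo": {"start": "bar", "end": "foo",
                        "region_id": 1351700819876, "offline": False},
}
top = {"start": "bar", "end": "cat", "region_id": 1351700900000, "offline": False}
bottom = {"start": "cat", "end": "foo", "region_id": 1351700900000, "offline": False}

after = mark_parent_split(parent, top, bottom)
print(sorted(after))
```

Keeping the children on the parent's own row means a single row read is enough to see both that the region is offline and where its data went.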

Updating .META.

After we update the parent, we can insert the new children into the .META. table. The bottom child has the lower half of the region (you can provide your own custom split policy, but that’s an advanced feature; see the HBase Guide for more info), so its start key will naturally sort _after_ its parent region.

The ‘top’ child however will have the same start key as its parent region. This is where the regionid in the row key comes into play – if we didn’t have it, the top child would have the same row key as the parent, making it very hard to disentangle which is which! Since we use the timestamp of the region as the regionid (or timestamp +1 if it’s the same as the parent), we can always guarantee that the child region will sort to a later row than the parent.
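The regionid rule can be sketched in a few lines. The function name and timestamps are hypothetical, and bumping the id when the clock is behind the parent (not just equal to it) is an assumption added for safety; the text only mentions the equal case.

```python
def child_region_id(parent_region_id, now_ms):
    """Pick a region id for a child that sorts strictly after its parent."""
    # Assumption: also bump past the parent if the clock is behind it.
    return now_ms if now_ms > parent_region_id else parent_region_id + 1


parent_id = 1351700819876
same_ms = child_region_id(parent_id, now_ms=1351700819876)  # clock has not moved
later = child_region_id(parent_id, now_ms=1351700900000)    # clock has moved on

# With the same table and start key, the regionid alone decides the order.
parent_prefix = f"table,bar,{parent_id}"
child_prefix = f"table,bar,{same_ms}"
print(parent_prefix < child_prefix)
```

The string comparison works here because both ids have the same number of digits.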

The other important thing is that the ‘bottom’ child must be inserted into .META. before the ‘top’ child. Otherwise, we can see a ‘hole’ in .META. when looking for the place to insert a key. Suppose the middle row in our above example split on the row ‘helloworld’. If we inserted the ‘top’ child first, we will have a state that looks like:

When we get to the last row in .META. we know that the previous row is the region hosting the key ‘help!’. However, because of the split, this is no longer the case, but we don’t know where to go! When we insert the ‘bottom’ child first, we ensure that we find the correct region:
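The two insertion orders can be compared with a small model. This is a sketch, not HBase code: regions are plain dicts, the region ids 1 and 2 are placeholders, and treating an offline parent as "retry" is this sketch's assumption. Because 'helloworld' sorts after 'foo', the sketch splits the region that starts at 'foo'; that choice is also an assumption.

```python
import bisect


def region(start, end, region_id, offline=False):
    """A stand-in for one .META. row; an empty end key means 'unbounded'."""
    return {"start": start, "end": end, "id": region_id, "offline": offline}


def lookup(meta, row):
    """Find the last .META. row whose start key is at or before `row`."""
    rows = sorted(meta, key=lambda r: (r["start"], r["id"]))
    keys = [(r["start"], r["id"]) for r in rows]
    found = rows[bisect.bisect_right(keys, (row, float("inf"))) - 1]
    if found["offline"]:
        return "retry", found  # parent is mid-split; look again later
    if found["end"] and row >= found["end"]:
        return "hole", found   # row lies past the end of the region we found
    return "ok", found


# Three regions; the last one has been offlined because it is splitting.
base = [region("", "bar", 1), region("bar", "foo", 1),
        region("foo", "", 1, offline=True)]
top = region("foo", "helloworld", 2)
bottom = region("helloworld", "", 2)

top_first = lookup(base + [top], "help!")
bottom_first = lookup(base + [bottom], "help!")
print(top_first[0], bottom_first[0])
```

With the 'top' child inserted first, the lookup lands on an online region that does not contain the key. With the 'bottom' child first, the same key resolves correctly, and keys in the top half land on the offline parent, which at least signals that a split is in progress.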

Recovering from crashes

Generally, HBase is good about recovering from a server failure, but sometimes it can get wonky. If a regionserver crashes in the middle of a split, or any number of other things go sideways, you may get a hole in .META., a malformed region on disk, overlapping regions, or any number of other issues. In this case, your first recourse is hbck, a tool built after many hours spent by engineers at Cloudera fixing just these types of problems, to try to fix your issue. To get started with hbck, run:

for more information on your options for repair. Beyond that, you are left to fix things by hand and hopefully the information above will help solve your issue.

How would you design .META. and -ROOT- from scratch? What kind of problems have you seen on your cluster? Anything that hbck couldn’t fix?

About this author

Jesse Yates has been living and breathing distributed systems since college. He’s worked with Hadoop, HBase, Storm, and almost all the other Big Data buzz words too. In his free time he writes for his blog, rock climbs and runs marathons. He currently works as a software developer at Salesforce.com and is a committer on HBase.
