This post assumes that you are familiar with HBase. If you don’t have a lot of experience with HBase, be sure to look at the HBase books available from Safari Books Online.
HBase is a massively scalable database capable of housing petabytes of data over hundreds of machines while handling millions of writes a second with sub-millisecond response times. HBase currently runs Facebook’s ‘Messages’ product and a large part of StumbleUpon, and is used by eBay, Twitter and Yahoo (to name a few).
If you are planning on building to scale and part of your plan includes running HBase, you have to consider not only how to prevent disasters, but also how to test and iterate rapidly on new features. HBase’s replication can be a large part of this solution – enabling disaster recovery, testing, and dark launches.
Overview of HBase Replication
Internally, HBase – much like many other databases – uses a write-ahead log (WAL) per regionserver to ensure data is not lost before it is synced to disk in its optimal format. The WAL is not meant to serve reads or writes; it’s only there in case the server crashes before its in-memory state (the MemStore) can be written to HFiles (sorted files on HDFS, one set per column family). If the server does crash, the WAL is replayed from the last edit stored in an HFile up to the last edit in the WAL.
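The recovery rule can be sketched in a few lines of plain Java. This is a toy model rather than HBase's implementation: the `Edit` class, the sequence ids and the `replay` helper are hypothetical names introduced only for illustration, and the "MemStore" is just a sorted map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of WAL recovery; none of these types are HBase classes.
class WalReplaySketch {

    // One logged edit: a monotonically increasing sequence id plus a row/value.
    static final class Edit {
        final long seqId;
        final String row;
        final String value;

        Edit(long seqId, String row, String value) {
            this.seqId = seqId;
            this.row = row;
            this.value = value;
        }
    }

    // Rebuilds the in-memory state lost in a crash. Edits at or below
    // lastFlushedSeqId are assumed to be safely in HFiles already, so only
    // later edits are replayed, in log order.
    static Map<String, String> replay(List<Edit> wal, long lastFlushedSeqId) {
        Map<String, String> memstore = new TreeMap<>();
        for (Edit e : wal) {
            if (e.seqId > lastFlushedSeqId) {
                memstore.put(e.row, e.value);
            }
        }
        return memstore;
    }

    public static void main(String[] args) {
        List<Edit> wal = Arrays.asList(
            new Edit(1, "row-a", "v1"),
            new Edit(2, "row-b", "v1"),
            new Edit(3, "row-a", "v2"));
        // Edits 1 and 2 were flushed before the crash; only edit 3 is replayed.
        System.out.println(replay(wal, 2));
    }
}
```

The point of the sketch is that the log is only ever read sequentially after a failure, which is also what makes it a convenient unit to ship to another cluster.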
HBase replicates whole WAL edits (preserving atomicity) from the primary cluster to the slave (or ‘peer’) cluster via a simple push mechanism. The list of WALs each regionserver still has to replicate is maintained in a highly available service – ZooKeeper – as a durable set of ‘replication queues’. Replication happens entirely asynchronously to writes on the primary cluster, which keeps its impact minimal – the HBase servers aren’t dramatically affected and remain available, even if the peer clusters are down. If the peers are down, the names of the remaining WALs to replicate are saved in ZooKeeper to be replicated later.
At first blush, HBase’s replication is pretty simple, but can be used for a variety of purposes. Some particularly useful features include circular replication (master-master is a special case of this), multi-cluster replication and a WAL Player. These utilities can then be leveraged to use replication for more than just disaster recovery. We can then have multiple clusters running at the same time, managing the same data, for different purposes. Further, we don’t need to worry as much about complicated availability, failover and timeliness issues because it’s all built into HBase.
Cross-datacenter replication was one of the early features of HBase and is a minimum requirement for running HBase with five nines of availability (99.999% uptime – 5.26 minutes of downtime per year). Part of this means that if your datacenter is a smoking hole in the ground, you need a way to ensure that most of the data from the primary datacenter can be recovered with minimal (no?) downtime.
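For reference, the downtime figure is simple arithmetic on the availability target. The helper below is a back-of-the-envelope sketch (the class and method names are made up for this post) that assumes a 365.25-day year.

```java
import java.util.Locale;

// Back-of-the-envelope downtime budget for a given availability target.
class DowntimeBudget {

    // Assumes an average year of 365.25 days.
    static final double MINUTES_PER_YEAR = 365.25 * 24 * 60;

    // availability is a fraction, e.g. 0.99999 for five nines.
    static double allowedDowntimeMinutes(double availability) {
        return (1.0 - availability) * MINUTES_PER_YEAR;
    }

    public static void main(String[] args) {
        // prints "5.26 minutes/year"
        System.out.println(String.format(Locale.ROOT, "%.2f minutes/year",
            allowedDowntimeMinutes(0.99999)));
    }
}
```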
Generally, HBase replication is no more than a couple of minutes behind – each regionserver has a thread dedicated to copying over data as soon as it is written – but this is not a hard and fast guarantee. If you can’t accept any data loss, then you will need to build your own custom, cross-datacenter database akin to Google’s Spanner to handle this case in a timely manner. However, most applications using HBase with replication prefer availability over blocking writes (i.e. everything anyone has written on HBase), though there is a slight chance that replication falls behind and you lose data if the datacenter goes down at just the wrong time.
Once we extract data from the primary cluster, we are free to play without fear of threatening production. Further, since replication happens in near real-time, we can set up one cluster as a ‘shadow’ deployment or ‘dark launch’ of a new version of HBase on a realistic workload.
What’s important to note here is replication makes edits to the cluster through the general HBase API – simple puts, deletes, mutations, etc. – which cause the slave cluster to produce its own WALs. You can save these logs (see Future File Cleaning Blog Post) and later use them as mock workloads for testing new clusters or versions of HBase. You can leverage the WALPlayer to reproducibly replay WAL edits into a test cluster.
The WAL Player is quite simple – it’s a MapReduce job that just filters edits for excluded tables and then passes them along to a target table. Here’s the important part:
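In outline, the filtering step looks something like the sketch below. It is a simplified stand-in in plain Java, not the WALPlayer source: `Edit`, `filterAndRemap` and the table names are hypothetical, and the real job works on HBase's own WAL key and edit types inside a MapReduce mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the filtering a WAL-replay mapper performs.
// Hypothetical types; not the actual WALPlayer classes.
class WalFilterSketch {

    static final class Edit {
        final String table;
        final String row;

        Edit(String table, String row) {
            this.table = table;
            this.row = row;
        }
    }

    // Keeps only edits whose source table appears in tableMapping and
    // rewrites the table name to the mapped target. Edits for any other
    // table are dropped.
    static List<Edit> filterAndRemap(List<Edit> edits, Map<String, String> tableMapping) {
        List<Edit> out = new ArrayList<>();
        for (Edit e : edits) {
            String target = tableMapping.get(e.table);
            if (target != null) {
                out.add(new Edit(target, e.row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("users", "users_test"); // replay "users" into a test table

        List<Edit> edits = Arrays.asList(
            new Edit("users", "r1"),
            new Edit("audit", "r2"), // not in the mapping, so it is skipped
            new Edit("users", "r3"));

        for (Edit e : filterAndRemap(edits, mapping)) {
            System.out.println(e.table + " " + e.row);
        }
    }
}
```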
Notably there is no special consideration for the timing of the edits, so you can’t use this for performance testing – just correctness testing. The details of how you could hack the WALPlayer to implement real performance testing are outside of the scope of this article, but conceptually you would use your own custom output format to include the timestamp and then modify the MultiTableOutputFormat to take these timestamps into consideration when writing into the table.
Your main HBase cluster will be specifically tuned for OLTP type workloads – optimized for low-latency reads/writes to a relatively small subset of the database. This will almost always conflict with the data warehouse jobs as well as MapReduce based backup jobs (Lars Hofhansl wrote a great post on this), both of which generally consume a lot of bandwidth in the hopes of optimizing throughput over latency. The peer cluster can be tuned to its own workload, rather than being torn between real-time responsiveness and bandwidth hungry jobs.
While HBase replication provides a lot of great opportunities, there are some issues that have to be considered before using the built-in replication. However, replication is currently in production at several companies, so these perils are considered fairly low priority.
The last major issue with replication is HBASE-2611, which concerns a possible failure when failing over queues. When a regionserver dies, all the other regionservers race to lock the dead server’s queue. The winner then copies the dead server’s queue to its own queue and happily proceeds to replicate the remaining edits. However, if the ‘winner’ also dies in the midst of this failover, the queues from the original failure will never get copied over. This means the WAL edits from the original failure are never copied to the remote cluster, though the WALs will still be retained on the source cluster.
Timeliness Of Edits
Above I mentioned you could use replication for disaster recovery. However, if you have a hard requirement to never lose data older than a certain time span – say, thirty minutes – HBase doesn’t have a good way to enforce replication time bounds; things get replicated when they get replicated.
If you need to ensure writes are always replicated, you can track the last replicated edit per regionserver using the JMX metrics (see the replication metrics source code, particularly “source.ageOfLastShippedOp”) and compare that against the latest write time from your clients. It’s a lot of work to keep track of the last edit times of all the clients, coordinate them, and ensure all the servers are coordinated as well. HBase throws up its hands and lets you handle it, if you really, really need it (as mentioned above, this usually isn’t a problem).
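A minimal version of such a check might look like the following. It is only a sketch under a few assumptions: the per-regionserver value of “source.ageOfLastShippedOp” has already been read from JMX by some other means and is in milliseconds, and the class, method and server names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Flags regionservers whose replication appears to be lagging.
// How the metric values are collected is deliberately left out.
class ReplicationLagCheck {

    // ageOfLastShippedOpMillis: regionserver name -> metric value in ms
    // (assumed already fetched). Returns the servers over the bound.
    static List<String> serversBehind(Map<String, Long> ageOfLastShippedOpMillis,
                                      long maxLagMillis) {
        List<String> behind = new ArrayList<>();
        for (Map.Entry<String, Long> e : ageOfLastShippedOpMillis.entrySet()) {
            if (e.getValue() > maxLagMillis) {
                behind.add(e.getKey());
            }
        }
        return behind;
    }

    public static void main(String[] args) {
        Map<String, Long> ages = new LinkedHashMap<>();
        ages.put("rs1", TimeUnit.SECONDS.toMillis(4));
        ages.put("rs2", TimeUnit.MINUTES.toMillis(45));

        // With a thirty-minute bound, only rs2 is reported.
        System.out.println(serversBehind(ages, TimeUnit.MINUTES.toMillis(30)));
    }
}
```

A check like this only tells you that a server is behind, not which client writes are affected; tying it back to client write times is the part HBase leaves to you.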
Leave It On Or Catch Up
If you ever turn off replication (or don’t have it on from the start), you have two options: (1) lose all the edits between when you turned it off and when you turned it back on, or (2) run CopyTable to copy over the state of _each table_. The second option works because edits to HBase are idempotent, so replaying edits to a table on the remote cluster can at worst only leave the table in the same state. Alternatively, you can run VerifyReplication to determine the rows that have not been copied over, but this will cause a large overhead on both clusters. There really is no workaround for this unless you are using your own HLog retention policy (*File Cleaning POST LINK*) to ensure WALs are retained, which can then be replayed to the target cluster.
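The idempotence argument is easy to convince yourself of with a toy model. In the sketch below, which uses hypothetical types rather than the HBase client API, a table maps each row to its versions keyed by timestamp; applying the same timestamped edits a second time overwrites each version with an identical value, so the table does not change.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of why re-applying timestamped edits is harmless.
class IdempotentReplaySketch {

    static final class Put {
        final String row;
        final long timestamp;
        final String value;

        Put(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Applies edits to a toy table keyed by row, then by timestamp.
    static void apply(Map<String, TreeMap<Long, String>> table, List<Put> edits) {
        for (Put p : edits) {
            table.computeIfAbsent(p.row, r -> new TreeMap<>()).put(p.timestamp, p.value);
        }
    }

    // The visible value of a row is the version with the highest timestamp.
    static String read(Map<String, TreeMap<Long, String>> table, String row) {
        TreeMap<Long, String> versions = table.get(row);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        List<Put> edits = Arrays.asList(
            new Put("r1", 10L, "a"),
            new Put("r1", 20L, "b"));

        Map<String, TreeMap<Long, String>> once = new HashMap<>();
        apply(once, edits);

        Map<String, TreeMap<Long, String>> twice = new HashMap<>();
        apply(twice, edits);
        apply(twice, edits); // the same edits copied over a second time

        System.out.println(once.equals(twice));
    }
}
```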
HBase replication can be used for a variety of purposes beyond the obvious disaster recovery – as an augment to your backup solution, as a way to test new features and versions of HBase, and for running your data warehouse operations. While there are some pitfalls with the current implementation, they are minor when compared to the gains.
How do you use replication? Have you found any other problems?
About this author
Jesse Yates has been living and breathing distributed systems since college. He’s worked with Hadoop, HBase, Storm, and almost all the other Big Data buzz words too. In his free time he writes for his blog, rock climbs and runs marathons. He currently works as a software developer at Salesforce.com and is a committer on HBase.