The HBase write path

HDFS is an append-only file system, so how could a database that supports random record updates be built on top of it?

HBase is what is known as a log-structured merge-tree (LSM) database. In an LSM database, data is stored in a multilevel storage hierarchy, and data moves between levels in batches. Cassandra is another example of an LSM database.
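To make the idea concrete, here is a minimal sketch of an LSM-style store. It is a toy model, not HBase's implementation: the class name TinyLSM, the flush_threshold parameter, and the in-memory "segments" are assumptions made for illustration. It shows how random updates can sit on top of storage that is only ever appended to: writes land in a mutable in-memory buffer, the buffer is flushed as an immutable sorted segment, and reads check the newest data first.

```python
class TinyLSM:
    """Toy LSM store: a mutable in-memory buffer plus immutable sorted segments."""

    def __init__(self, flush_threshold=2):
        # Hypothetical knob: number of buffered keys that triggers a flush.
        self.flush_threshold = flush_threshold
        self.memtable = {}   # newest writes, mutable, held in memory
        self.segments = []   # immutable sorted segments, oldest first

    def put(self, key, value):
        # An update never rewrites an old segment; it only adds a newer value.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Move the whole buffer to the next level in one batch.
        if self.memtable:
            self.segments.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Check the buffer first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all segments into one; newer values overwrite older ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [tuple(sorted(merged.items()))] if merged else []


store = TinyLSM(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # buffer reaches the threshold and is flushed
store.put("a", 3)   # "update" of key a: a newer value shadows the old one
latest_a = store.get("a")
```

The older value of "a" still exists in the first segment until compact() merges the segments, which is the batch movement between levels described above.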

When the HBase client issues a write for a key, it first queries ZooKeeper for the location of the RegionServer that hosts the META region. It then queries the META region to find the table's regions, their key ranges, and the RegionServers that host them.
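The lookup against the META information can be pictured as a search over sorted region start keys. The sketch below is an illustration only and does not use the HBase client API: the META list, the host names, and the locate_region function are hypothetical, and it compares Python strings where HBase compares row keys as byte arrays.

```python
from bisect import bisect_right

# Hypothetical META contents: (region start key, RegionServer), sorted by start key.
# The first region has an empty start key, so it covers the lowest row keys.
META = [
    ("", "rs1.example.com"),
    ("g", "rs2.example.com"),
    ("p", "rs3.example.com"),
]


def locate_region(meta, row_key):
    """Return the (start key, server) entry whose key range contains row_key."""
    start_keys = [start for start, _ in meta]
    # The hosting region is the last one whose start key is <= row_key.
    index = bisect_right(start_keys, row_key) - 1
    return meta[index]


start_key, server = locate_region(META, "hbase")
```

With this lookup result in hand, the client knows which server to contact for the row.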

The client then makes an RPC call to the RegionServer that contains the key ...
