It is quite useful for advanced users (or those who are just plain adventurous) to fully comprehend how a system of their choice works behind the scenes. This chapter explains the various moving parts of HBase and how they work together.
Before we look into the architecture itself, however, we will first address a more fundamental difference between typical RDBMS storage structures and alternative ones. Specifically, we will look briefly at B-trees, or rather B+ trees, which are commonly used in relational storage engines, and at Log-Structured Merge Trees (LSM-trees), which (to some extent) form the basis for Bigtable’s storage architecture, as discussed in Building Blocks.
Note that RDBMSes do not use B-tree-type structures exclusively, nor do all NoSQL solutions use different architectures. You will find a colorful variety of mix-and-match technologies, but with one common objective: use the best strategy for the problem at hand.
B+ trees have some specific features that allow for efficient insertion, lookup, and deletion of records that are identified by keys. They represent dynamic, multilevel indexes with lower and upper bounds on the number of keys in each segment (also called a page). Because each segment holds many keys, B+ trees achieve a much higher fanout than binary trees, which results in far fewer I/O operations to find a specific key.
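To make the effect of fanout concrete, the following minimal sketch estimates how many levels a lookup has to traverse for a given number of keys. It assumes an idealized, fully packed tree in which every level multiplies the number of reachable keys by the fanout and every level visited costs one page read. The function name `levels_needed` and the sample fanout values are illustrative assumptions, not taken from any particular storage engine.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Return the smallest number of levels h with fanout**h >= num_keys.

    Idealized model (an assumption for illustration): every segment is
    completely full, so a tree with h levels can address fanout**h keys.
    """
    if num_keys < 1:
        raise ValueError("num_keys must be at least 1")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    levels = 1          # even a single key needs one segment to be read
    capacity = fanout   # keys addressable with the current number of levels
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    keys = 1_000_000
    for fanout in (2, 100, 256):
        print(f"fanout={fanout:>3}: {levels_needed(keys, fanout)} levels")
```

Under these assumptions, one million keys require 20 levels with a fanout of 2 (a binary tree) but only 3 levels with a fanout of 100. Real trees are rarely fully packed and the upper levels are often cached in memory, so actual I/O counts will differ, but the logarithmic relationship is what matters here.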
In addition, they also enable you to do range scans very efficiently, ...