HBase architecture in a nutshell

  • An HBase cluster consists of one active master and one or more backup master servers
  • The cluster has multiple RegionServers
  • HBase tables are typically large, and their rows are divided into partitions/shards called regions
  • Each RegionServer hosts one or many regions
  • The HBase catalog is known as the META table, which stores the locations of table regions
  • ZooKeeper stores the location of the META table
  • During a write, the client sends the put request to the HRegionServer that hosts the target region
  • The data is first written to the write-ahead log (WAL)
  • The data is then pushed into the MemStore, and an acknowledgement is sent to the client
  • Once enough data has accumulated in the MemStore, it is flushed to an HFile on HDFS
  • The HBase compaction process activates periodically to merge ...
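The flow described in the list above can be illustrated with a small toy model. This is only a sketch in plain Python, not HBase code or the HBase client API: the names ToyRegion, ToyMeta, flush_threshold, and the server names rs1/rs2 are hypothetical, the "HFiles" are in-memory sorted lists rather than files on HDFS, and the flush runs inline, whereas a real RegionServer would typically flush in the background.

```python
import bisect


class ToyRegion:
    """Toy model of one region's write path: WAL -> MemStore -> HFile."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log of (row, value) edits
        self.memstore = {}   # in-memory buffer: row -> latest value
        self.hfiles = []     # sorted, immutable lists standing in for HFiles
        self.flush_threshold = flush_threshold  # hypothetical: buffered row count

    def put(self, row, value):
        self.wal.append((row, value))   # 1. log the edit first
        self.memstore[row] = value      # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                # 3. flush once enough data accumulates
        return "ack"                    # acknowledgement back to the client

    def flush(self):
        """Write the buffered rows out as one new sorted 'HFile'."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def compact(self):
        """Merge all 'HFiles' into one, keeping the newest value per row."""
        merged = {}
        for hfile in self.hfiles:       # oldest first, so newer values win
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())] if merged else []

    def get(self, row):
        """Check the MemStore first, then the 'HFiles' from newest to oldest."""
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row:
                    return value
        return None


class ToyMeta:
    """Simplified stand-in for the META table: start key -> hosting server."""

    def __init__(self, regions):
        # regions: list of (start_key, server_name, ToyRegion), sorted by
        # start_key; the first start key must be "" so every row has a home.
        self.regions = regions
        self.start_keys = [start for start, _, _ in regions]

    def locate(self, row):
        """Return the (start_key, server_name, region) entry that owns the row."""
        index = bisect.bisect_right(self.start_keys, row) - 1
        return self.regions[index]


# Two regions split at row key "m", hosted on two hypothetical servers.
region_a, region_b = ToyRegion(), ToyRegion()
meta = ToyMeta([("", "rs1", region_a), ("m", "rs2", region_b)])

for row, value in [("apple", 1), ("kiwi", 2), ("banana", 3),
                   ("apple", 4), ("zebra", 5)]:
    _, server, region = meta.locate(row)  # find the hosting server and region
    region.put(row, value)                # send the put to that region
```

In this sketch the third put to the first region fills its MemStore and triggers a flush, so the later update to "apple" sits in the MemStore while the older value remains in the flushed file; calling flush() and then compact() merges both files and keeps only the newest value per row.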
