Foreword

Throughout my 25 years in the software industry, I have experienced many disruptive changes: the Internet, the World Wide Web, mainframes, the client–server model, and more. I once worked on a team that was implementing software to make oil refineries safer. Our 40-person team shared a single DEC VAX—a machine no more powerful than the cell phone I use today.

I still remember a day in the early 1990s when we were scheduled to receive new machines. The machines being replaced were located on the third floor and were large, heavy, washing machine–sized behemoths. A bunch of us waited inside the “machine room” to see how they would heave the new machines up all those flights of stairs. We imagined a giant crane, the street outside being blocked off…in short, a big operation!

But what actually happened was quite different. A man entered the room carrying a small box under his arm. He placed it on top of one of the old “washing machines,” switched some cables around, did some tests, and left. That was it? Wow. Things change!

That is the joy of being part of the tech industry: if we are willing to learn new things and move with it, we will never be bored, and will never cease to be amazed. What seemed impossible just a few years ago is suddenly commonplace.

Big data is such a change. Big data is everywhere. The revolution started by Google with the Google File System and BigTable has now reached almost every tech company, bank, government, and startup. Directly or indirectly, for better or for worse, these systems touch the lives of almost every human being on the planet.

Apache HBase, like BigTable before it, has a unique place in this ecosystem: it provides an updatable view into essentially unlimited datasets stored in an immutable, distributed filesystem. As such, it bridges the gap between pure file storage and OLTP/OLAP databases.

HBase is everywhere: Facebook, Apple, Salesforce.com, Adobe, Yahoo!, Bloomberg, Huawei, The Gap, and many other companies use it. Google adopted the HBase API for its public cloud BigTable offering, a testament to the popularity of HBase.

Despite its ubiquity, HBase is not plug-and-play. Distributed systems are hard. Terms such as partition tolerance, consistency, and availability inevitably creep into every discussion, soon followed by even more esoteric terms such as hotspotting and salting. Scaling to hundreds or thousands of machines requires painful trade-offs, and these trade-offs make it harder to use these systems optimally. HBase is no exception.

In my years in the HBase and Hadoop communities, I have experienced these challenges firsthand. Use cases must be designed and architected carefully in order to play to the strengths of HBase.

This book, written by two insiders who have been on the ground supporting customers, is a much-needed guide that details how to architect applications that will work well with HBase and that will scale to thousands of machines.

If you are building or are planning to build new applications that need highly scalable and reliable storage, this book is for you. Jean-Marc and Kevin have seen it all: the use cases, the mistakes people make, the assumptions from single-server systems that no longer hold. But most importantly, they know what works well, how to fix what doesn’t, and how to make sense of it all.
