Foreword

It has not been very long since the phrase “Hadoop security” was an oxymoron. Early versions of the big data platform, built and used at web companies like Yahoo! and Facebook, didn’t try very hard to protect the data they stored. They didn’t really have to—very little sensitive data went into Hadoop. Status updates and news stories aren’t attractive targets for bad guys. You don’t have to work that hard to lock them down.

As the platform has moved into more traditional enterprise use, though, it has begun to work with more traditional enterprise data. Financial transactions, personal bank account and tax information, medical records, and similar kinds of data are exactly what bad guys are after. Because Hadoop is now used in retail, banking, and healthcare applications, it has attracted the attention of thieves as well.

And if data is a juicy target, big data may be the biggest and juiciest of all. Hadoop collects more data from more places, and combines and analyzes it in more ways than any predecessor system, ever. It creates tremendous value in doing so.

Clearly, then, “Hadoop security” is a big deal.

This book, written by two of the people who’ve been instrumental in driving security into the platform, tells the story of Hadoop’s evolution from its early, wide-open consumer Internet days to its current status as a trusted place for sensitive data. Ben and Joey review the history of Hadoop security, covering its advances and its evolution alongside new business problems. They cover topics like identity, encryption, key management, and business practices, and discuss them in a real-world context.

It’s an interesting story. Hadoop today has come a long way from the software that Facebook chose for image storage a decade ago. It offers much more power, many more ways to process and analyze data, much more scale, and much better performance. As a result, it has more pieces that need to be secured, separately and in combination.

The best thing about this book, though, is that it doesn’t merely describe. It prescribes. It tells you, very clearly and with the detail that you expect from seasoned practitioners who have built Hadoop and used it, how to manage your big data securely. It gives you the very best advice available on how to analyze, process, and understand data using the state-of-the-art platform—and how to do so safely.
