Preface

Apache Hadoop is still a relatively young technology, but that has not limited its rapid adoption and the explosion of tools that make up the vast ecosystem around it. This is certainly an exciting time for Hadoop users. While the opportunity to add value to an organization has never been greater, Hadoop still poses many challenges to those responsible for securing access to data and ensuring that systems respect relevant policies and regulations. A wealth of information is available to developers building solutions with Hadoop and to administrators seeking to deploy and operate it. However, guidance on how to design and implement a secure Hadoop deployment has been lacking.

This book provides in-depth information about the many security features available in Hadoop and organizes it using common computer security concepts. It begins with introductory material in the first chapter, followed by material organized into four larger parts: Part I, Security Architecture; Part II, Authentication, Authorization, and Accounting; Part III, Data Security; and Part IV, Putting It All Together. These parts cover the early stages of designing a physical and logical security architecture all the way through implementing common security access controls and protecting data. Finally, the book wraps up with use cases that gather many of the concepts covered in the book into real-world examples.

Audience

This book targets Hadoop administrators charged with securing their big data platform and established security architects who need to design and integrate a Hadoop security plan within a larger enterprise architecture. It presents many Hadoop security concepts including authentication, authorization, accounting, encryption, and system architecture.

Chapter 1 includes an overview of some of the security concepts used throughout this book, as well as a brief description of the Hadoop ecosystem. If you are new to Hadoop, we encourage you to review Hadoop Operations and Hadoop: The Definitive Guide as needed. We assume that you are familiar with Linux, computer networks, and general system architecture. For administrators who do not have experience with securing distributed systems, we provide an overview in Chapter 2. Practiced security architects might want to skip that chapter unless they’re looking for a review. In general, we don’t assume that you have a programming background, and try to focus on the architectural and operational aspects of implementing Hadoop security.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

Using Code Examples

Throughout this book, we provide examples of configuration files to help guide you in securing your own Hadoop environment. A downloadable version of some of those examples is available at https://github.com/hadoop-security/examples. In Chapter 13, we provide a complete example of designing, implementing, and deploying a web interface for saving snapshots of web pages. The complete source code for the example, along with instructions for securely configuring a Hadoop cluster for deployment of the application, is available for download at GitHub.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Hadoop Security by Ben Spivey and Joey Echeverria (O’Reilly). Copyright 2015 Ben Spivey and Joey Echeverria, 978-1-491-90098-7.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Safari® Books Online

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of plans and pricing for enterprise, government, education, and individuals.

Members have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/hadoop-security.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

Ben and Joey would like to thank the following people who have made this book possible: our editor, Marie Beaugureau, and all of the O’Reilly Media staff; Ann Spencer; Eddie Garcia for his guest chapter contribution; our primary technical reviewers, Patrick Angeles, Brian Burton, Sean Busbey, Mubashir Kazia, and Alex Moundalexis; Jarek Jarcec Cecho; fellow authors Eric Sammer, Lars George, and Tom White for their valuable insight; and the folks at Cloudera for their collective support to us and all other authors.

From Joey

I would like to dedicate this book to Maria Antonia Fernandez, Jose Fernandez, and Sarah Echeverria, three people that inspired me every day and taught me that I could achieve anything I set out to achieve. I also want to thank my parents, Maria and Fred Echeverria, and my brothers and sisters, Fred, Marietta, Angeline, and Paul Echeverria, and Victoria Schandevel, for their love and support throughout this process. I couldn’t have done this without the incredible support of the Apache Hadoop community. I couldn’t possibly list everybody that has made an impact, but you need look no further than Ben’s list for a great start. Lastly, I’d like to thank my coauthor, Ben. This is quite a thing we’ve done, Bennie (you’re welcome, Paul).

From Ben

I would like to dedicate this book to the loving memory of Ginny Venable and Rob Trosinski, two people that I miss dearly. I would like to thank my wife, Theresa, for her endless support and understanding, and Oliver Morton for always making me smile. To my parents, Rich and Linda, thank you for always showing me the value of education and setting the example of professional excellence. Thanks to Matt, Jess, Noah, and the rest of the Spivey family; Mary, Jarrod, and Dolly Trosinski; the Swope family; and the following people that have helped me greatly along the way: Hemal Kanani (BOOM), Ted Malaska, Eric Driscoll, Paul Beduhn, Kari Neidigh, Jeremy Beard, Jeff Shmain, Marlo Carrillo, Joe Prosser, Jeff Holoman, Kevin O’Dell, Jean-Marc Spaggiari, Madhu Ganta, Linden Hillenbrand, Adam Smieszny, Benjamin Vera-Tudela, Prashant Sharma, Sekou Mckissick, Melissa Hueman, Adam Taylor, Kaufman Ng, Steve Ross, Prateek Rungta, Steve Totman, Ryan Blue, Susan Greslik, Todd Grayson, Woody Christy, Vini Varadharajan, Prasad Mujumdar, Aaron Myers, Phil Langdale, Phil Zeyliger, Brock Noland, Michael Ridley, Ryan Geno, Brian Schrameck, Michael Katzenellenbogen, Don Brown, Barry Hurry, Skip Smith, Sarah Stanger, Jason Hogue, Joe Wilcox, Allen Hsiao, Jason Trost, Greg Bednarski, Ray Scott, Mike Wilson, Doug Gardner, Peter Guerra, Josh Sullivan, Christine Mallick, Rick Whitford, Kurt Lorenz, Jason Nowlin, and Chuck Wigelsworth. Last but not least, thanks to Joey for giving in to my pleading to help write this book—I never could have done this alone! For those that I have inadvertently forgotten, please accept my sincere apologies.

From Eddie

I would like to thank my family and friends for their support and encouragement on my first book writing experience. Thank you, Sandra, Kassy, Sammy, Ally, Ben, Joey, Mark, and Peter.

Disclaimer

Thank you for reading this book. While the authors of this book have made every attempt to explain, document, and recommend different security features in the Hadoop ecosystem, there is no warranty expressed or implied that using any of these features will result in a fully secured cluster. From a security point of view, no information system is 100% secure, regardless of the mechanisms used to protect it. We encourage a constant security review process for your Hadoop environment to ensure the best possible security stance. The authors of this book and O’Reilly Media are not responsible for any damage that might or might not have come as a result of using any of the features described in this book. Use at your own risk.
