Chapter 10

Hadoop Security

WHAT’S IN THIS CHAPTER?

  • Understanding Hadoop security challenges and history
  • Understanding authentication
  • Understanding authorization
  • Getting to know network encryption
  • Getting to know Hadoop ecosystem security
  • Taking a look at upcoming changes and enhancements with Project Rhino
  • Reviewing best practices for securing Hadoop

A major and growing concern in today’s Big Data environments is information security. Specifically, organizations must be able to enforce access control restrictions, confidentiality rules, and privacy restrictions, and they may also need to comply with legal mandates governing the use and protection of their data and their analysis of large data sets. Because Hadoop was designed for processing large amounts of unstructured data on commodity servers in an environment of de facto trust, security was never a driver for its design or development.

Over the past five years, many organizations using Hadoop have been challenged to meet stricter security requirements. As Hadoop’s popularity has increased, its security architecture has come under intense scrutiny from security professionals. At the same time, in this era of Big Data, researchers are documenting privacy and access control challenges associated with processing large data sets. These concerns have pushed the Hadoop community to introduce security mechanisms that satisfy requirements for authentication, access control, and privacy. Security mechanisms are ...
