CHAPTER 6
Hadoop Security

Because Hadoop is used to store and process an organization's data, it is important to secure the Hadoop cluster. The security requirements vary with the sensitivity of the data stored on the cluster. Some clusters address a single use case and have very few users (dedicated clusters). Other clusters are general-purpose and are shared by many users from different teams. The security requirements of a dedicated cluster differ from those of a shared cluster. In addition to storing large amounts of data for long periods, Hadoop accepts arbitrary programs from users and launches them as independent Java processes on many machines in the cluster. If not properly constrained, these programs can have unwanted effects on the cluster, on its data, and on programs run by other users.

When Hadoop was originally developed, its security features were limited, but many security features have been added over the years. New features are continually being developed, and existing ones enhanced. In this chapter we discuss the various security features that Hadoop supports. We start with perimeter security, which protects the network of the Hadoop cluster. We then go over the authentication mechanism Hadoop supports to identify the user. Once a user is properly identified, the authorization rules specify ...
