Chapter 6. Authorization

In “Authentication”, we saw how the various Hadoop ecosystem projects support strong authentication to ensure that users are who they claim to be. However, authentication is only part of the overall security story: you also need a way to model which actions an authenticated user can perform and which data that user can access. The protection of resources in this manner is called authorization, and it is probably one of the most complex topics related to Hadoop security. Each service offers a different set of functions, and thus supports a different authorization model. This chapter is therefore divided into sections based on how each service implements authorization.

We start by looking at HDFS and its support for POSIX-style file permissions, as well as its support for service-level authorization to restrict user access to specific HDFS functions. Next, we turn our attention to MapReduce and YARN, which support a similar style of service-level authorization as well as a queue-based model controlling access to system resources. In the case of MapReduce and YARN, authorization is useful for both security and resource management/multitenancy (for more information on resource management, we recommend Hadoop Operations by Eric Sammer [O’Reilly]). Finally, we cover the authorization features of the popular BigTable clones, Apache HBase and Apache Accumulo, including a discussion of the pros and cons of role-based and attribute-based security as well as a discussion ...
