Chapter 6. Identity, Authentication, and Authorization

Multitenancy is a fancy word for supporting many independent entities within a single larger system. The ability to support multiple discrete entities in a system is generally useful when it is costly or complex to operate an instance of that system for each entity. An example of this is the ability to run multiple databases within a single database server, a feature supported by almost all RDBMS vendors. By bringing multiple users of a service together, we can take advantage of economies of scale and offer better service as a whole. A simple example in the context of Hadoop: if a large cluster is built to run hourly production MapReduce jobs, there are generally free resources between executions of those jobs. If we were to silo users by group or use case, each group’s cluster would have lulls of this kind, and the idle capacity could not be used by anyone else. Instead, it often (but admittedly not always) makes sense to combine smaller silo clusters into a single large cluster. Not only does this simplify operations, but it also increases the capacity available to service consumers, on average, which improves system resource utilization.

Unfortunately, running multitenant systems comes with some obvious challenges. Controlling access to data and resources immediately becomes a point of concern, especially when the data is sensitive in nature. It may be that two different groups of users should not see each other’s data, or should not even know that the other exists. ...
