Chapter 11. Accessing and Interacting with Clusters

It is our job as architects to ensure that users can take full advantage of the data and services hosted in the cluster. To do this, we need to guarantee that users, both humans and applications, can access cluster services securely. In this chapter, we explore typical architectures for providing users access to cluster services and data while applying the authentication and authorization controls we encountered in Chapter 9.

First, we look at the different ways in which a user might interact with the cluster. Then we explore how to enable these interactions through our cluster architecture and supporting technologies, such as proxies and load balancers. With the architecture established, we take a look at user workbenches, such as Hue and Cloudera Data Science Workbench (CDSW). Finally, we look at the options for transferring files into and out of the cluster.

Access Mechanisms

Each component in the cluster provides one or more access mechanisms through which users can interact with it. These mechanisms fall into a few varieties that should be familiar to most practitioners. Table 11-1, at the end of this section, summarizes the access mechanisms supported by commonly used services.

Programmatic Access

Programmatic access mechanisms include the following:

APIs

Many of the components in a Hadoop cluster provide application programming interface (API) libraries to be used by user code, which abstract ...
