Chapter 15. Patterns for Cluster Usage

Eventually you, or your organization, will reach the point where clusters running in your cloud provider are no longer just for research or proof-of-concept work. The important question then shifts from whether using them is a good idea at all to how best to take advantage of them:

  • When should clusters be created and how long should they last?

  • Who should be able to use them?

  • How should they be created?

  • How much work should be sent to the cloud?

Every organization has different answers to these questions, but knowing that there are choices to be made helps you formulate a plan to get from experimentation to regular use of cloud clusters.

Long-Running or Transient?

Of all these questions, the one that tends to come up earliest, and that most affects the answers to the others, is the question of when. When should you create clusters in your cloud provider, and, relatedly, how long should those clusters remain available?

There are two dominant answers to this question. The first, which most resembles the way on-prem clusters are used, is that clusters should be set up in advance and tended so that they are always available for anyone to use. Administrators monitor them and resolve problems as they arise, perhaps even growing or shrinking them or adjusting the mix of service components in response to demand. Meanwhile, users coordinate their work on them, each sharing the storage and computation facilities with ...
