Chapter 16

Deploying Hadoop

In This Chapter

• Examining the components that comprise a Hadoop cluster

• Designing the Hadoop cluster components

• Reviewing Hadoop deployment form factors

• Sizing a Hadoop cluster

At its core, Hadoop is a system for storing and processing data at massive scale using a cluster of many individual compute nodes. In this chapter, we describe the tasks involved in building a Hadoop cluster: choosing the hardware components in the compute nodes, selecting a cluster configuration pattern, and sizing the cluster appropriately. In at least one way, Hadoop is no different from many other IT systems: if you don’t design your cluster to match your business requirements, you get bad results.

Working with Hadoop Cluster Components

While you’re getting your feet wet with Hadoop, you’re likely to limit yourself to using a pseudo-distributed cluster running in a virtual machine on a personal computer. Though this environment is a good one for testing and learning, it’s obviously inappropriate for production-level performance and scalability. In this section, ...
