Chapter 14. Basics of Virtualization for Hadoop

In this chapter, we assess virtualization technologies on a basic level. Although virtualized IT infrastructure scales well when stacking individual small to medium-sized applications, scaling virtual compute clusters and distributed systems requires special attention.

We begin with compute virtualization, which means running virtual machines (VMs) on a hypervisor, such as KVM or VMware. This is the most basic and well-defined building block in virtualized infrastructure. (In addition to virtualization on hypervisors, containerization is an emerging and relevant technology for enterprises; we cover it in Chapter 15.)

Even more important to our discussion of Hadoop in the cloud is the subject of storage virtualization, which we look at next. This means abstracting storage devices into containers that are centrally hosted in remote storage arrays based on storage area network (SAN) or object storage technology.

The third layer of virtualization to consider is network virtualization, also referred to as software-defined networking (SDN). As we will see, your choice of virtualization mechanisms will drive the lifecycle model of your clusters in the cloud.

We will cover all of these subjects in this chapter.
