Server virtualization, which involves running multiple virtual servers on a single physical machine, is ubiquitous and is one of the key foundational layers of modern cloud computing architectures. Virtualization lets a physical server’s resources, such as CPU, RAM and storage, be shared among several virtual servers. Virtualization substantially lowers the cost of supporting a complex computing environment and speeds up deployments, since virtual machines can be spun up in a fraction of the time it takes to order, receive and configure physical servers.
In this chapter, I discuss the basic architecture of server virtualization first, and follow it up by explaining the concept of a hypervisor, which is the key piece of software that serves as a resource allocation and hardware abstraction layer between the physical server and the virtual servers you create on the physical server. I also explain the different types of virtualization such as full and paravirtualization.
This chapter isn’t limited to traditional hardware virtualization. Container virtualization is relatively new and is quite different from hardware virtualization. Unlike hardware virtualization, container virtualization doesn’t mimic a physical server with its own OS and resources. This type of virtualization is all about enabling applications to execute in a common OS kernel. There’s no need for a separate OS for each application, and therefore the containers are lightweight, and thus impose a lower overhead compared to hardware virtualization.
This chapter introduces the Linux Containers technology. Linux containers keep applications together with their runtime components by combining application isolation and image-based deployment strategies. By packaging the applications with their libraries and dependencies such as the required binaries, containers make the applications autonomous. This frees up the applications from their dependence on various components of the underlying operating system.
The fact that containers don’t include an OS kernel means that they’re faster and much more agile than VMs (virtual machines). The big difference is that all containers on a host must use the same OS kernel. The chapter delves into the Linux technologies that make containers possible – namely, namespaces, Linux control groups (cgroups) and SELinux. Chapter 5 continues the discussion of containers, and is dedicated to container virtualization including Docker, currently the most popular way to containerize applications.
Linux server virtualization is the running of one or more virtual machines on a physical server that’s running the Linux operating system. Normally a server runs a single operating system (OS) at a time. As a result, application vendors had to rewrite portions of their applications so they’d work on various types of operating systems. Obviously, this is a costly process in terms of time and effort.
Hardware virtualization, which lets a single server run multiple operating systems, became a great solution for this problem. Servers running virtualization software are able to host applications that run on different operating systems, using a single hardware platform as the foundation. The host operating system supports multiple virtual machines, each of which could belong to the same, or a different OS.
Virtualization didn’t happen overnight. IBM mainframe systems from about 45 years ago started allowing applications to use a portion of a system’s resources. Virtualization became mainstream technology in the early 2000s, when technology made it possible to offer virtualization on x86 servers. The awareness that server utilization rates were extremely low, as well as the rising cost of maintaining data centers with their high power costs, made virtualization widespread. A majority of the servers running across the world today are virtual – virtual servers way outnumber physical servers.
As you can guess, unlike a physical machine, a virtual machine (VM) doesn’t really exist – it’s a software artifact that imitates or mimics a physical server. That doesn’t mean that a VM is something that’s only in our minds – it actually consists of a set of files.
There’s a main VM configuration file that specifies how many CPUs, how much RAM and how much storage are allocated to the VM, which virtual NICs are assigned to it, and which I/O devices it’s allowed to access. The configuration files show the VM storage as a set of virtual disks, which are actually files in the underlying physical file system.
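With KVM and libvirt (covered later in this chapter), the configuration file is an XML document. Here’s a minimal sketch of what such a domain definition can look like; the VM name, disk path and bridge name are illustrative, not taken from a real system:

```xml
<domain type='kvm'>
  <name>example-vm</name>              <!-- hypothetical VM name -->
  <memory unit='GiB'>4</memory>        <!-- RAM allocated to the VM -->
  <vcpu>2</vcpu>                       <!-- number of virtual CPUs -->
  <devices>
    <!-- the virtual disk is just a file on the host file system -->
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/example-vm.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <!-- virtual NIC attached to a host bridge -->
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
  </devices>
</domain>
```

Notice how the VM’s “hardware” – memory, CPUs, disks and NICs – is nothing more than entries in this file.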
When an administrator needs to duplicate a physical server, a lot of work is required to acquire the new server, install the OS and application files on it, and copy the data over. Since a VM is just a set of files, you can get one ready in literally minutes after making just a handful of changes in the VM configuration file. Alternatively, you can provision new VMs through VM templates. A template contains default settings for hardware and software. Provisioning tools can simply use a VM template and customize it when they deploy new servers.
Virtualization in the early x86 era was purely software-based. Although Popek and Goldberg, in their seminal paper “Formal Requirements for Virtualizable Third Generation Architectures,” specified the three key properties for a virtual machine monitor (efficiency, resource control and equivalence), it wasn’t until the mid-2000s that the x86 architecture started satisfying these three requirements. Hardware-assisted virtualization is how these ideal requirements started being realized.
Software-based virtualization has inherent limitations. The x86 architecture employs the concept of privilege levels (also called privilege rings) for processing machine instructions.
The key software that makes virtualization possible is the virtual machine monitor (VMM), better known as the hypervisor. A hypervisor is the software that does the heavy lifting in virtualized systems – it coordinates the low-level interaction between virtual machines and the underlying host physical server hardware. The hypervisor sits between the VMs and the physical server and allows the VMs to partake of the physical server’s resources such as disk drives, RAM and CPU.
Virtualization lets a powerful physical server appear as several smaller servers, thus saving you space, power and other infrastructure expenses. A big advantage of virtualizing an environment is resource sharing among the servers, meaning that when one of the virtual servers is idle or almost so, other servers running on the same physical server can use the idle resources granted to the first server and speed up their own processing.
Virtualization lets the resources of the host server such as CPU, RAM, physical storage and network bandwidth be shared among the virtual servers running on top of a physical server. Often, even on a non-virtualized server, a shortage of any one of these resources can slow applications down. How, then, can multiple servers share a single set of resources without bringing the system to a halt? Let’s learn how virtualization typically handles the sharing of these resources among the VMs, to avoid bottlenecks and other performance issues.
In the context of resource allocation to the virtual machines, it’s important to understand the key concept of overcommitting. Overcommitting is the allocation of more virtualized CPUs or memory than is available on the physical server; it relies on the assumption that none of the virtual servers will use their resources to the full extent on a continuous basis. This allows you to allocate the physical server’s resources in a way that the sum of the allocated resources often exceeds the physical resource limit of the server. Using virtual resources in this fashion allows you to increase guest density on a physical server.
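A quick back-of-the-envelope check makes the idea concrete: total up what you’ve promised to your guests and compare it to what the host actually has. The per-VM vCPU counts below are made-up numbers for illustration:

```shell
#!/bin/bash
# Hypothetical allocations: vCPUs granted to each of four guest VMs
vm_vcpus=(4 4 2 2)
host_cpus=8            # physical CPUs on the host

total=0
for v in "${vm_vcpus[@]}"; do
  total=$((total + v))
done

# The overcommit ratio: allocated vCPUs relative to physical CPUs
ratio=$(awk "BEGIN {printf \"%.1f\", $total / $host_cpus}")
echo "allocated vCPUs:  $total"
echo "physical CPUs:    $host_cpus"
echo "overcommit ratio: ${ratio}:1"
```

Here 12 vCPUs on an 8-CPU host is a 1.5:1 overcommit – reasonable, as long as the guests don’t all peak at once.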
Storage is not as much virtualized as the other server resources are. You simply allocate a chunk of the host storage space to each of the VMs, and this space is exclusively reserved for those VMs. Multiple VMs writing to the same storage disk might cause bottlenecks, but you can avoid them through the use of high performance disks, RAID arrays configured for speed, and network storage systems, all of which increase the throughput of data.
When it comes to storage, virtualization often uses the concept of thin provisioning, which lets you allocate storage in a flexible manner so as to optimize the storage available to each guest VM. Thin provisioning makes it appear that there’s more physical storage available to the guest than is really present. Thin provisioning is different from overcommitting, and applies only to storage, not to CPU or RAM.
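You can see the same idea at work with a sparse file on an ordinary Linux file system, which is how file-backed thin-provisioned disk images behave: the apparent size is large, but disk blocks are consumed only as data is actually written. The path below is just a scratch location for the demonstration:

```shell
#!/bin/bash
# Create a "10 GB" disk image as a sparse file -- no blocks allocated yet
truncate -s 10G /tmp/thin-demo.img

ls -lh /tmp/thin-demo.img    # apparent size: 10G
du -h  /tmp/thin-demo.img    # actual usage: (close to) zero, nothing written

rm /tmp/thin-demo.img
```

A guest sees the full 10 GB disk, while the host pays only for the blocks the guest has written so far.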
CPU sharing is done on the basis of time slicing, wherein all the processing requests are sliced up and shared among the virtual servers. In effect, this is the same as running multiple processes on a non-virtualized server. CPU is probably the hardest resource to share, since CPU requests need to be satisfied in a timely fashion. You may at times see a small waiting time for CPU, but this is to be expected, and is no big deal. However, excessive waiting times can create havoc with the performance of applications.
Virtualized CPUs (vCPUs) can be overcommitted. You need to be careful with overcommitting vCPUs, as loads at or close to 100% CPU usage may lead to requests being dropped or slow response times. You’re likely to see performance deterioration when a VM runs more vCPUs than are present on the physical server. Virtual CPUs are best overcommitted when each guest VM has a small number of vCPUs compared to the total CPUs of the underlying host. A hypervisor such as KVM can easily handle switches between the VMs when you assign vCPUs at a ratio of five vCPUs (across five VMs) per physical CPU on the host server.
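Following that rough 5:1 guideline, a short script can report a vCPU budget for a given host. The 5:1 ratio is an assumption you’d tune for your own workloads, not a hard rule:

```shell
#!/bin/bash
ratio=5            # assumed "safe" vCPU-to-physical-CPU overcommit ratio
pcpus=$(nproc)     # CPUs the host exposes

echo "host CPUs:   $pcpus"
echo "vCPU budget: $((pcpus * ratio)) (at a ${ratio}:1 overcommit ratio)"
```

On an 8-CPU host, for instance, this budgets 40 vCPUs across all guests.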
Network bandwidth can be overprovisioned since it’s unlikely that all VMs will be fully utilizing their network bandwidth at all times.
It’s possible to overcommit memory as well, since it’s not common to see the RAM being fully used by all the VMs running on a server at any given time. Some hypervisors can perform a “memory reclamation”, whereby they reclaim RAM from the VMs to balance the load among the VMs.
It’s important to remember that applications that use 100% of the allocated memory or CPU on a VM can become unstable in an overcommitted virtual environment. In a production environment, it’s critical to test extensively before overcommitting memory or CPU resources, as the appropriate overcommit ratios depend on the nature of the workloads.
Virtualization offers several benefits to an IT department, such as the following:
Virtualization lowers the cost of hardware purchases and maintenance, power and cooling, and data center space, and involves far less administrative and management effort.
Server consolidation is probably the most common, and one of the biggest motivating factors behind the drive to virtualize systems. Consolidation means that you reduce the footprint of your physical servers, saving not only capital outlays but also operating costs in terms of lower energy consumption in the data center. For example, you need fewer floor switches and networking capacity with virtualization when compared to physical servers.
Since the guest operating systems are fully isolated from the underlying host, even if the VM is corrupted, the host is still in an operating state.
You can move a running virtual machine from one physical server to another without disconnecting either the clients or the applications, using tools such as vMotion (VMware) and Live Migration (Red Hat), both of which enhance the uptime of virtualized systems.
You can move VMs from one physical server to another for load balancing purposes, so you can load balance your applications across the infrastructure.
You can quickly restart a failed VM on a different physical server. Since virtual guests aren’t closely tied to the hardware, and the host provides snapshot features, you can easily restore a known running system in the case of a disaster.
To summarize, virtualization offers several compelling benefits, which have led to its widespread usage in today’s IT environments. Reduced capital outlays for purchase and support since you need fewer physical servers, faster provisioning, the ease of supporting legacy applications side by side with current applications, and the fact that virtualization gets you attuned to the way things are done in modern cloud-based environments have all been factors in its widespread use.
Virtualization isn’t a costless solution – you do need to keep in mind the following drawbacks of virtualization:
There’s often a performance overhead for the abstraction layer of virtualization.
Overprovisioning is always a potential problem in a virtualized environment and this could lead to performance degradation, especially during peak usage.
Rewriting existing applications for a virtual environment may impose a stiff upfront cost.
Losing a single hypervisor could mean losing all the VMs based on that hypervisor.
Administrators need specialized training and expertise to successfully manage the virtualized environments.
In addition to sharing the CPU and RAM of the parent server, VM guests share the I/O as well. The classification of hypervisors into different types is based on two basic criteria: the amount of hardware that’s virtualized and the extent of the modifications required of the guest system. Modern virtualization is hardware based and doesn’t use traditional software I/O virtualization (emulation) techniques. Software virtualization uses slow techniques such as binary translation to run unmodified operating systems. By virtualizing at the hardware level, virtualization seeks to deliver native performance levels. The following sections explain the two popular I/O virtualization techniques – paravirtualization and full virtualization.
Paravirtualization, as the name itself indicates, isn’t really “complete virtualization” since the guest OS needs to be modified.
The paravirtualization method presents a software interface to the VM that’s similar to that of the host hardware. That is, instead of emulating the hardware environment, it acts as a thin layer to enable the guest system to share the system resources.
Under paravirtualization, the kernel of the guest OS running on the host server is modified so it can recognize the virtualization software layer (hypervisor). Privileged operations are replaced by calls to the hypervisor, in order to reduce the time the guest OS spends performing operations that are more difficult to run in the virtual environment than in the non-virtualized environment. Costly operations are performed on the native host system instead of on the guest’s virtualized system. The hypervisor performs tasks on behalf of the guest OS and provides interfaces for critical kernel operations such as interrupt handling and memory management. Xen is the best-known example of paravirtualization.
A big difference between fully virtualized and paravirtualized architectures is that under full virtualization the guest OS can be entirely different from the host OS and needs no changes, whereas under paravirtualization the guest OS must be modified to run on the hypervisor.
Since paravirtualization modifies the OS, it’s also called OS-assisted virtualization, with the guest OS being aware that it’s being virtualized. Paravirtualization offers the following benefits:
Under paravirtualization, the hypervisor and the virtual guests communicate directly, with the lower overhead due to direct access to the underlying hardware translating to higher performance. VMs that are “aware” that they’re virtualized offer higher performance.
Since paravirtualization doesn’t include any device driver at all, it uses the device drivers in one of the guest operating systems, called the privileged guest. You therefore aren’t limited to the device drivers contained in the virtualization software.
Paravirtualization, however, requires you either to modify the guest OS or to use paravirtualized drivers. It therefore imposes the following limitations:
You’re limited to open source operating systems and proprietary operating systems where the owners have consented to make the required code modifications to work with a specific hypervisor. Paravirtualization isn’t very portable since it doesn’t support unmodified operating systems such as Microsoft Windows.
Paravirtualization raises support and maintainability issues in production environments, due to the OS kernel modifications it requires.
Paravirtualization can cover the whole kernel or just the drivers that virtualize the I/O devices. Xen, an open source virtualization project, is a good example of a paravirtualized environment. Xen virtualizes the CPU and memory by modifying the Linux kernel and it virtualizes the I/O with custom guest OS device drivers.
In addition to full and paravirtualization, there’s also something called software virtualization, which uses emulation techniques to run unmodified virtual operating systems. Linux distributions such as Red Hat Enterprise Linux don’t support software virtualization.
Full virtualization is a technique in which the guest operating system is presented with a simulated hardware interface by a hardware emulator. The virtualization software, usually referred to as a hypervisor, emulates all hardware devices on the virtual system: it creates the emulated hardware and presents it to the guest operating system. The hypervisor here plays the role of the Virtual Machine Monitor (VMM), as explained earlier.
The hypervisor uses the features of the underlying host physical system to create a new virtual system called a virtual machine. All components that the virtual machine presents to the guest operating system are virtualized. The hypervisor simulates specific hardware: for example, when QEMU simulates an x86 machine, it provides a virtual Realtek 8139C+ PCI network adapter. This means that the guest OS is unaware that it’s running on virtual, rather than real, hardware.
The VM allows the guest OS to run without any modifications, and the OS behaves as if it has exclusive access to the underlying host system. Since the physical devices on the host server may be different from the emulated devices, the hypervisor needs to process the I/O before it goes to the physical device, forcing I/O operations to move through two software layers. This means not only slower I/O performance but also higher CPU usage.
In paravirtualization, the virtualization software layer abstracts only a portion of the host system’s resources, whereas in full virtualization it abstracts all of the host system’s resources.
Since the guest OS is a full emulation of the host hardware, this virtualization technique is called full virtualization.
You can run multiple unmodified guest operating systems independently on the same box with full virtualization. It’s the hypervisor that helps run the guest operating systems without any modification, by coordinating the CPU of the virtual machine and the host machine’s system resources.
The hypervisor offers CPU emulation to modify privileged and protected CPU operations performed by the guest OS. The hypervisor intercepts the system calls made by the guest operating systems to the emulated host hardware and maps them to the actual underlying hardware. You can have guest systems belonging to various operating systems such as Linux and Windows running on the same host server. Once again, the guest operating systems are completely unaware of the fact that they’re virtualized and thus don’t require any modifications.
Full virtualization requires complete emulation, which means more resources for processing from the hypervisor.
QEMU (which underlies KVM, to be discussed later in this chapter), VMware ESXi and VirtualBox are popular fully virtualized hypervisors. Full virtualization offers many benefits, as summarized here:
The hypervisor offers a standardized environment for hardware for the guest OS. Since the guest OS and the hypervisor are a consistent package together, you can migrate this package across different types of physical servers.
The guest OS doesn’t require any modification.
It simplifies migration and portability for the virtual machines.
Applications run in truly isolated guest operating systems.
The method supports multiple operating systems, which may differ in their patch level or be completely different from each other, such as the Windows and Linux operating systems.
The biggest drawback of full virtualization is that since the hypervisor needs to process data, some of the processing power of the host server is commandeered by the hypervisor and this degrades performance somewhat.
The hypervisor, by presenting virtualized hardware interfaces to all the VM guests, controls the platform resources. There are two types of hypervisors, classified by where exactly the hypervisor sits relative to the operating system and the host: Type 1 and Type 2 hypervisors.
A Type 1 hypervisor (also called a native or bare metal hypervisor) is software that runs directly on the bare metal of the physical server, just as the host OS does. Once you install and configure the hypervisor, you can start creating guest machines on the host server.
Architecturally, the Type 1 hypervisor sits directly on the host hardware and is responsible for allocating memory to the virtual machines, as well as providing an interface for administration and for monitoring tools. VMware ESX Server, Microsoft Hyper-V and several variations of the open source KVM hypervisor are examples of Type 1 hypervisors.
Due to its direct access to the host server hardware, the Type 1 hypervisor doesn’t have to compete with a host OS for CPU cycles or memory, and thus delivers greater performance.
It’s important to understand that most implementations of a bare metal hypervisor require virtualization support at the hardware level through hardware-assisted virtualization techniques (explained later in this chapter); VMware ESXi and KVM are two such hypervisors.
A Type 2 hypervisor (also called a hosted hypervisor) is deployed by loading it on top of the underlying OS running on the physical server, such as Linux or Windows. The virtualization layer runs like a hosted application directly on top of the host OS. The hypervisor provides each of the virtual machines running on the host system with resources such as a virtual BIOS, virtual devices and virtual memory. The guest operating systems depend on the host OS for accessing the host’s resources.
A Type 2 hypervisor is useful in situations where you don’t want to dedicate an entire server for virtualization. For example, you may want to run a Linux OS on your Windows laptop – both VMWare Workstation and Oracle VM Virtual Box are examples of Type 2 hypervisors.
Traditionally, a Type-1 hypervisor is defined as a “small operating system”. Since a Type 1 hypervisor directly controls the resources of the underlying host, its performance is generally better than that of a Type 2 hypervisor, which depends on the OS to handle all interactions with the hardware. Since Type 2 hypervisors need to perform extra processing (‘instruction translation’), they can potentially adversely affect the host server and the applications as well.
You can pack more VMs with a Type 1 hypervisor because this type of hypervisor doesn’t compete with the host OS for resources.
Under kernel level virtualization, the host OS contains extensions within its kernel to manage virtual machines. The virtualization layer is embedded in the operating system kernel itself. Since the hypervisor is embedded in the Linux kernel, it has a very small footprint and disk and network performance is higher in this mode. The popular open source Kernel Virtual Machine (KVM) virtualization model uses kernel level virtualization (hardware-assisted virtualization method).
Virtualization solutions that use a Type-2 hypervisor such as VirtualBox are great for enabling single users or small organizations to run multiple VMs on a single physical server. VirtualBox and similar solutions run as client applications and not directly on the host server hardware. Enterprise computing requires high performance virtualization strategies that are closer to the host’s physical hardware. Bare metal virtualization involves much less overhead and also exploits the built in hardware support for virtualization better than Type-2 hypervisors.
Most Linux systems support two types of open-source bare-metal virtualization technologies: Xen and Kernel Virtual Machine (KVM). Both Xen and KVM support full virtualization, and Xen also supports the paravirtualization mode. Let’s start with a review of the older Xen technology and then move on to KVM virtualization, which is the de facto standard for virtualization in most Linux distributions today.
Xen was created in 2003 and later acquired by Citrix, which announced in 2013 that the Xen Project would be a collaborative project between itself, as Xen’s main contributor, and the Linux Foundation. Xen is very popular in the public cloud environment, with companies such as Amazon Web Services and Rackspace Cloud using it for their customers.
Xen is capable of running multiple types of guest operating systems. When you boot the Xen hypervisor on the host physical hardware, it automatically starts a primary virtual machine called Domain 0 (or dom0), the management domain. Domain 0 provides the virtual management capabilities for all the other VMs, called the Xen guests, by performing tasks such as creating additional virtual machines, managing the virtual devices for the virtual machines, and suspending, resuming and migrating virtual machines. You administer Xen through the xm command-line suite.
The Xen daemon, named xend, runs in the dom0 VM and is the central controller of virtual resources across all VMs running on the Xen hypervisor. You can manage the VMs using an open source virtual machine manager such as OpenXenManager, or a commercial manager such as Citrix XenCenter.
Xen is a Type 1 hypervisor and so it runs directly on the host hardware. Xen inserts a virtualization layer between the hardware and the virtual machines, by creating pools of system resources, and the VMs treat the virtualized resources as if they were physical resources.
Xen uses paravirtualization, which means the guest OS must be modified to support the Xen environment. The modification of the guest OS lets the Xen hypervisor itself run as the “most privileged software” on the system. Paravirtualization also enables Xen to use more efficient interfaces, such as virtual block devices, to emulate hardware devices.
Xen offers highly optimized performance due to its combination of paravirtualization and hardware assisted virtualization. However, it has a fairly large footprint and integrating it isn’t easy and could overwhelm the Linux kernel over time. It also relies on third-party products for device drivers as well as for backup and recovery and for fault tolerance. High I/O usually slows down Xen based systems.
While Xen offers higher performance than KVM in some scenarios, it’s the ease of use of KVM virtualization that has led to its becoming the leading virtualization solution in Linux environments.
KVM supports native virtualization on processors that contain extensions for hardware virtualization. KVM supports several types of processors and guest operating systems, such as Linux (many distributions), Windows, and Solaris. There’s also a modified version of QEMU that uses KVM to run Mac OS X virtual machines.
Linux KVM (Kernel-based Virtual Machine) is the most popular open-source virtualization technology today. Over the past few years, KVM has overtaken Xen as the default open source technology for creating virtual machines on most Linux distributions.
Although KVM has been part of the Linux kernel since the 2.6.20 release (2007), until release 3.0, you had to apply several patches to integrate KVM support into the Linux kernel. Post-3.0 Linux kernels automatically enable KVM’s integration into the kernel, allowing it to take advantage of improvements in the Linux kernel versions. Being a part of the Linux kernel is a big deal, since it means frequent updates and a lower Total Cost of Ownership (TCO). In addition, KVM is highly secure since it’s integrated with SELinux in both Red Hat Enterprise Linux and CentOS.
KVM differs from Xen in that it uses the Linux kernel as its hypervisor. Although a Type-1 hypervisor is supposed to be similar to a small OS, the fact that you can configure a custom lightweight Linux kernel and the availability of large amounts of RAM on today’s powerful 64-bit servers means that the size of the Linux kernel isn’t a hindrance.
Just as Xen has its xm toolset, KVM has an administrative infrastructure that it has inherited from QEMU (short for Quick Emulator), a Linux emulation and virtualization package that achieves superior performance by using dynamic translation. When executing guest code directly on the host CPU (via KVM), QEMU achieves performance close to that of the native OS.
QEMU supports virtualization while executing under the Xen hypervisor or by using the Linux KVM kernel module. Red Hat has developed the libvirt virtualization API to help simplify the administration of various virtualization technologies such as KVM, Xen, LXC containers, VirtualBox and Microsoft Hyper-V. As an administrator, it’s worth learning libvirt because you can manage multiple virtualization technologies through a single set of commands (command line and graphical) based on the libvirt API.
In order to support KVM virtualization, you need to install various packages, with the required package list depending on your Linux distribution.
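Before installing the packages, it’s worth confirming that the host CPU exposes the hardware virtualization extensions KVM needs (the vmx flag on Intel, svm on AMD) and, once the packages are installed, that the kvm kernel module is loaded. A quick sketch:

```shell
#!/bin/bash
# Check for hardware virtualization support in the host CPU flags
# (vmx = Intel VT-x, svm = AMD-V)
if grep -qE 'vmx|svm' /proc/cpuinfo; then
  echo "CPU: hardware virtualization extensions present"
else
  echo "CPU: no vmx/svm flags found -- KVM full virtualization won't work"
fi

# Check whether the kvm kernel module is loaded
# (it appears only after the KVM packages are installed)
if grep -q '^kvm' /proc/modules 2>/dev/null; then
  echo "kernel: kvm module loaded"
else
  echo "kernel: kvm module not loaded"
fi
```

If the CPU flags are missing, check the server’s BIOS/UEFI settings – the extensions are sometimes disabled there.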
When you create one or two KVM-based VMs, you can use disk images that you create on the local disk storage of the host. Each VM will in essence be a disk image stored in a local file. However, for creating enterprise-wide virtualization environments, this manual process of creating VMs is quite tedious and hard to manage. The libvirt package lets you create storage pools to serve as an abstraction for the actual VM images and file systems.
The libvirt package provides standard, technology-independent administrative commands to manage virtualization environments.
A storage pool is a specific amount of storage set aside by the administrator for use by the guest VMs.
Storage pools are divided into storage volumes which are then assigned to guest VMs as block devices.
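In libvirt, a pool is itself described in XML. Here’s a sketch of a simple directory-backed pool definition; the pool name and path are illustrative:

```xml
<pool type='dir'>
  <name>example-pool</name>       <!-- hypothetical pool name -->
  <target>
    <!-- host directory whose files become the pool's volumes -->
    <path>/var/lib/libvirt/images/example-pool</path>
  </target>
</pool>
```

Each file created in that directory becomes a storage volume that can be handed to a guest as a block device.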
A storage pool can be a local directory, physical disk, logical volume, or a network file system (NFS) or block-level networked storage managed by libvirt. Using libvirt, you manage the storage pool and create and store VM images in the pool. Note that in order to perform a live migration of a VM to a different server, you should locate the VM disk image in an NFS, block-level networked storage, or in HBA (SCSI Host Bus Adapter) storage that can be accessed from multiple hosts.
The libvirt package contains the virsh command suite, which provides the commands to create and manage the virtualization objects that libvirt uses, such as domains (VMs), storage pools, networks and devices. Following is an example that shows how to create an NFS-based (netfs) storage pool:
virsh pool-create-as MY-NFS-POOL netfs \
     --source-host 192.168.6.248 \
     --source-path /DATA/POOL \
     --target /var/lib/libvirt/images/MY-NFS-POOL
In this command, MY-NFS-POOL is the name of the new storage pool and the local mount point that’ll be used to access this NFS based storage pool is /var/lib/libvirt/images/MY-NFS-POOL. Once you create the storage pool as shown here, you can create VMs in that pool with the virt-install command, as shown here:
virt-install --name RHEL-6.3-LAMP \
    --os-type=linux \
    --os-variant=rhel6 \
    --cdrom /mnt/ISO/rhel63-server-x86_64.iso \
    --disk pool=NFS-POOL,format=raw,size=100 \
    --ram 4096 \
    --vcpus=2 \
    --network bridge=br0 \
    --hvm \
    --virt-type=kvm
Here’s a summary of the key options specified with the virt-install command:
--os-type and --os-variant: indicate that this VM will be optimized for the Red Hat Enterprise Linux 6.3 release
--cdrom: specifies the ISO image (the virtual CD-ROM device that will be used to perform this installation)
--disk: specifies that the VM will be created with 100GB of storage from the storage pool named NFS-POOL
--ram and --vcpus: specify the RAM and virtual CPUs for the VM
--hvm: indicates that this is a fully virtualized system (default)
--virt-type: specifies kvm as the hypervisor (default)
Red Hat Enterprise Virtualization (RHEV) is based on KVM.
The choice of the physical servers for a virtualized environment is critical, and you have several choices, as explained in the following sections.
You can build your own servers by purchasing and putting together the individual components, such as the disk drives, RAM, and CPU. You should expect to spend less when you build your own systems, but the drawback is the time it takes to put them together. In addition, you're responsible for maintaining these systems with partial or no service contracts to support you, with all the attendant headaches.
If you're considering putting together your own systems, it may be a good idea to check out the Open Compute project (http://opencompute.org), which creates low-cost server hardware specifications and mechanical drawings (designs). The goal of Open Compute is to design servers that are efficient, inexpensive, and easy to service. Consequently, these specs contain far fewer parts than traditional servers. You can purchase hardware that meets these specifications to ensure you're getting good hardware while keeping expenses low.
Purchasing complete systems from a well-known vendor is the easiest and most reliable way to go, since you don’t need to worry about the quality of the hardware and software, in addition to getting first class support. It also gets maintenance off your hands. However, as you know, you’re going to pay for all the bells and whistles.
Blade servers are commonly used for virtualization, since they allow for a larger number of virtual machines per chassis than rack servers do.
The choice of the type of server depends on factors such as their ease of maintainability, power consumption, remote console access, server form factor, and so on.
Migrating a virtual machine means moving a guest virtual machine from one server to another. You can migrate a VM in two ways, live and offline, as explained here:
Live Migration: this process moves an active VM from one physical server to another. In Red Hat Enterprise Linux, this process moves the VM’s memory and its disk volumes as well, using what’s called live block migration.
Offline Migration: During an offline migration, you shut down the guest VM and move the image of the VM’s memory to the new host. You can then resume the VM on the destination host and the memory previously used by the VM on the original host is released back to the host.
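Both kinds of migration are typically driven with the virsh migrate command. As a sketch (the VM name and destination URI are placeholders, and shared storage accessible to both hosts is assumed for live migration):

```shell
# live migration: the VM keeps running while its memory pages are copied over
virsh migrate --live --verbose vm1 qemu+ssh://dest-host/system

# offline migration: the domain is shut down; --persistent defines it on the target
virsh migrate --offline --persistent vm1 qemu+ssh://dest-host/system
```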
You can migrate VMs for the following purposes:
Load Balancing: You can migrate one or more VMs from a host to relieve its load
Upgrades to the Host: When you’re upgrading the OS on a host, you can avoid downtime for your applications by migrating the VMs on that host to other hosts.
Geographical Reasons: Sometimes you may want to migrate a VM to a different host in a different geographical location, to lower the latency of the applications hosted by the VM.
A Linux container (LXC) is a set of processes that are isolated from other processes running on a server. While virtualization and its hypervisors logically abstract the hardware, containers provide isolation, letting multiple applications share the same OS instance.
You can use a container to encapsulate different types of application dependencies. For example, if your application requires a particular version of a database or scripting language, the containers can encapsulate those versions. This means that multiple versions of the database or scripting language can run in the same environment, without requiring a completely different software stack for each application, each with its own OS. You don’t pay for all of this with a performance hit, as containerized applications deliver roughly the same performance as applications that you deploy on bare metal.
Linux Containers have increasingly become an alternative to using traditional virtualization, so much so that containerization is often referred to as the “new virtualization”.
On the face of it, both virtualization and containerization seem to perform the same function by letting you run multiple virtual environments on top of a single machine. However, unlike in traditional virtualization, a container doesn't run a full guest operating system. Rather, each container "contains" its applications in their own userspace, while all containers share a single OS kernel.
At a simple level, containers involve less overhead, since there's no need to emulate the hardware. The big drawback is that you can't run multiple types of operating systems on the same hardware. You can run 10 Linux instances on a server with container-based virtualization, but you can't run Linux and Microsoft Windows Server guests side by side.
This chapter introduces you to Linux container technology and the principles that underlie that technology, and also compares traditional virtualization with containerization. Chapter 5 is dedicated to Docker containers and container orchestration technologies such as Kubernetes.
Linux Containers (LXC) allow the running of multiple isolated server installs, called containers, on a single host. LXC doesn't offer a virtual machine; instead, it offers a virtual environment with its own process and network space.
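As a quick sketch of the LXC userspace tools (this assumes the lxc package is installed and root privileges; the container name, distribution, and release are illustrative):

```shell
# create a container from the download template, start it, run a command
# inside it, then stop and delete it
lxc-create -n web01 -t download -- -d centos -r 7 -a amd64
lxc-start -n web01
lxc-attach -n web01 -- hostname
lxc-stop -n web01
lxc-destroy -n web01
```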
Linux containers have analogies in other well-known Unix-like operating systems:
Oracle Solaris: Zones
FreeBSD: Jails
Linux containers (through Docker) are radically changing the way applications are built, deployed and instantiated. By making it easy to package the applications along with all of their dependencies, containers accelerate application delivery. You can run the same containerized applications in all your environments – dev, test, and production. Furthermore, your platform can be anything: a physical server, a virtual server, or the public cloud.
Containers are designed to provide fast and efficient virtualization. Containerization provides different views of the system to different processes, by compartmentalizing the system. This compartmentalization ensures guaranteed access to resources such as CPU and IO, while maintaining security.
Since each container shares the same hardware as well as the Linux kernel with the host system, containerization isn’t the same as full virtualization. Although the containers running on a host share the same host hardware and kernel, they can run different Linux distributions. For example, a container can run CentOS while the host runs on Ubuntu.
NOTE Linux Containers (LXC) constitute a container management system first released in 2008, building on features merged into Linux kernel 2.6.24. As with Docker (see Chapter 5), Linux Containers make use of several Linux kernel features, such as cgroups, SELinux, and AppArmor.
Linux Containers combine an application and all of its dependencies into a package which you can make a versioned artifact. Containers provide application isolation while offering the flexibility of image-based deployment methods. Containers help isolate applications to avoid conflicts between their runtime dependencies and their configurations, and allow you to run different versions of the same application on the same host. This type of deployment provides a way to roll back to an older version of an application if a newer version doesn’t quite pan out.
Linux containers have their roots in the release of the chroot tool in 1982, a filesystem-level isolation tool. Let's briefly review chroot to see how it compares to modern containerization.
Linux containerization is often seen as an advancement of the chroot technique, with dimensions beyond just the file system. Whereas chroot offers isolation only at the file system level, LXC offers full isolation between the host and a container, and between a container and other containers.
The Linux chroot command lets a process (and its child processes) redefine the root directory from their perspective. For example, if you chroot to the directory /www, issuing the command cd / leaves you at /www instead of taking you to the normal root directory ("/"). Although /www isn't really the root directory, the program believes that it is. In essence, chroot restricts the environment, which is why the environment is also referred to as a jail or chroot jail.
Since a process has a restricted view of the system, it can’t access files outside of its directory, as well as libraries and files from other directories. An application must therefore have all the files that it needs right in the chroot environment. The key principle here is that the environment should be self-contained within a single directory, with a faux root directory structure.
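A minimal sketch of building such a self-contained jail follows; it must run as root, the jail path is arbitrary, and the library-copying loop assumes a glibc-based distribution where ldd reports library paths:

```shell
# create a faux root containing just a shell and the libraries it needs
mkdir -p /srv/jail/bin
cp /bin/sh /srv/jail/bin/

# copy the shared libraries /bin/sh depends on into the jail
for lib in $(ldd /bin/sh | grep -o '/[^ )]*'); do
    mkdir -p "/srv/jail$(dirname "$lib")"
    cp "$lib" "/srv/jail$lib"
done

# a process started under chroot now sees /srv/jail as "/"
chroot /srv/jail /bin/sh -c 'pwd'
```

The final command should report / even though the process is really confined to /srv/jail.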
Linux containers are similar to chroot, but offer more isolation. Linux containers use additional concepts beyond chroot, such as control groups. Whereas chroot is limited to the file subsystem, control groups enable you to define groups encompassing several processes (such as sshd, for example) and control resource usage for those groups across subsystems such as memory, CPU, network, and block devices.
Isolating applications is a key reason for using container technologies. In this context, an application is a unit of software that provides a specific set of services. While users are concerned just with the functionality of applications, administrators need to worry about the external dependencies that all applications must satisfy. These external dependencies include system libraries, third-party packages and databases.
Each of the “dependencies” has its own configuration requirements and running multiple versions of an application on a host is difficult due to potential conflicts among these requirements. For example, a version of an application may require a different set of system libraries than another version of the same application. While you can somehow manage to run multiple versions simultaneously through elaborate workarounds, the easiest solution to managing the dependencies is to isolate the applications.
Both containerization and virtualization help address the issues involved in efficient application delivery, where applications in general are much more complex yet must be developed with lower expense and delivered faster, so they can quickly respond to changing business requirements.
At one level, you can view both containers and traditional virtualization as doing the same thing: letting you run multiple applications on the same physical servers. How, then, are containers a potentially better approach? Virtualization is great for abstracting away the underlying hardware, which helps lower your costs through server consolidation, and it makes it easy to automate the provisioning of a complete stack that includes the OS, the application code, and all of its dependencies.
However, great as the benefits of virtualization are, virtual machines have several limitations:
By replacing physical servers with virtual servers, you do reduce the number of physical server units, yet sprawl doesn't go away: you're simply replacing one type of sprawl with another!
Virtualization isn't well suited to microservice architectures, which can involve hundreds or even thousands of independently deployable services, since each service would require its own VM.
Virtual machines can't be instantiated very quickly; they take several minutes to spin up, which means an inferior user experience. Containers, on the other hand, can be spun up in a few seconds!
Lifecycle management of VMs isn't a trivial affair; every VM has a minimum of two operating system layers that need patching and upgrading: the hypervisor and the guest OS inside the VM. If you have a virtualized application with 20 VMs, you need to worry about patching 21 systems (20 guest OSes + 1 hypervisor).
While traditional virtualization does offer complete isolation, once you secure containers with Linux namespaces, cgroups, and SELinux, you get virtually (no pun intended) the same level of isolation. Linux containers offer a far more efficient way to build, deploy, and execute applications in modern application architectures that use microservices and other new application paradigms. Linux containers provide an application isolation mechanism for lightweight multitenancy and simplified application delivery.
Containers, as I've mentioned earlier, have been around for over a decade, and technologies like Solaris Zones and FreeBSD Jails have been with us for even longer. What's new about current containerization is that it's being used to encapsulate an application's components, including the application's dependencies and required services. This encapsulation makes applications portable. Docker has contributed substantially to the growth of containerization by offering easy-to-use management tools as well as a great repository of container images. Red Hat and others have also offered smaller-footprint operating systems, management frameworks, and container orchestration tools such as Kubernetes (please see Chapter 5).
Linux containers enhance the efficiency of application building, shipping, deploying, and execution. Here’s a summary of the benefits offered by containers.
While VMs take several minutes to boot up, you can boot up a containerized application in mere seconds, due to the lack of the overhead imposed by a hypervisor and a guest OS. If you need to scale up the environment using a public cloud service, the ability to boot up fast is highly beneficial.
Due to their minimal footprint, many more containers fit on a physical server than virtual machines.
You can monitor containers easily, since they all run on a single OS instance. When idle, containers use hardly any server resources such as memory and CPU, unlike a virtual machine, which grabs those resources when you start it up. You can also easily remove unused container instances and prevent VM-like sprawl.
Even if you have a large number of applications running on a containerized server, you need to patch and upgrade just a single operating system, regardless of how many containers run on it, unlike in the case of virtual machines. Since there are fewer operating systems to take care of, you’re more likely to upgrade than to apply incremental patches.
Containers make it easy to move application workloads between private and public clouds. Virtual machines are usually much larger than containers, with sizes often running into the gigabytes. Containers are invariably small (a few MBs), so it's easier to transport and instantiate them.
Containers speed up application development because testing cycles are shorter, owing to containers including all the application dependencies. You can build an app once and deploy it anywhere.
You have far fewer operating systems to manage since multiple containers share the same OS kernel. You also have better visibility into the workload of a container from the host environment, unlike with VMs, where you can’t peek inside the VM.
Containers aren't a mere incremental enhancement of traditional virtualization. They offer many ways to speed up application development and deployment, especially in the area of microservices, which I discussed in Chapter 3. Since microservices can start up and shut down far quicker than traditional applications, containers are ideal for them. You can also scale resources such as CPU and memory independently for microservices with a container-based approach.
Let's say your organization has a sensitive application that makes use of SSL to encrypt data flowing through the public internet. If you're using a virtualized setup, the application image includes SSL, and therefore you'll need to modify the application image whenever an SSL security flaw surfaces. Obviously, your application is down during this time period, which may be long, since you'll need to perform regression testing after making changes to SSL.
If you were using a container based architecture on the other hand, you can separate the SSL portion of the application and place it in its own container. The application code isn’t intertwined with SSL in this architecture. Since you don’t need to modify the application code, there’s no need for any regression testing of the application following changes in SSL. A huge difference!
There are two different ways in which you can employ Linux containers. You can use containers for sandboxing applications, or you can utilize image-based containers to take advantage of the whole range of features offered by containerization. I explain the two approaches in the following sections.
Under the host container use case, you use containers as lightweight application sandboxes. All the applications running in the various containers are based on the same OS as the host, since all containers run in the same userspace as the host system. For example, you can carve a RHEL 7 host into multiple secure and identical containers, with each container running a RHEL 7 userspace. Maintenance is easy, since updates need to be applied just to the host system. The disadvantage of this type of containerization is that it's limited to just one type of OS runtime, in this example RHEL.
Image-based containers include not just the application, but also the application's runtime stack. Thus, the container runs an application that is independent of the host OS. Both the container and application runtimes are packaged together and deployed as an image. Containers can be non-identical under image-based containerization, which means you can run multiple instances of the same application on a server, each on a different OS platform. This is especially useful when you need to run side by side application versions based on different OS releases, such as RHEL 6.6 and RHEL 7.1. Docker, which we discuss extensively in Chapter 5, is based on image-based containers; Docker builds on LXC and includes the userspace runtime of applications.
You can deploy containers both on bare metal, and on virtualized servers.
Linux containers have become increasingly important as application packaging and delivery technology. Containers provide application isolation along with the flexibility of image-based deployment. Linux containers depend on several key components offered by the Linux kernel, such as the following:
Cgroups (Control groups) – allow you to group processes for optimizing system resource usage by allocating resources among user-defined groups of tasks
Namespaces – isolate processes by abstracting specific global system resources and making them appear as a distinct instance to all the processes within a namespace
SELinux – securely separates containers by applying SELinux policy and labels.
Namespaces, cgroups, and SELinux are all part of the Linux kernel, and they provide the support for containers, which run the applications. While containers use a number of other technologies, namespaces, cgroups, and SELinux account for most of the benefits you see with containers.
In the following sections, let’s briefly review the key building blocks of Linux containers:
Process isolation – namespaces provide this
Resource management – Cgroups provide resource management capabilities
Security – SELinux takes care of security
The ability to create multiple namespaces enables process isolation. It’s namespaces that make it possible for Linux containers to provide isolation between applications, with each namespace providing a boundary around applications. Each of these applications is a self-contained entity with its own file system, hostname and even a network stack. It’s when it’s running within a namespace that an application is considered to be running within a container.
A namespace makes a global system resource appear as a dedicated resource to processes running within that namespace. This helps different processes see different views of the system, similar to the concept of zones in the Solaris operating system. This separation of resource instances lets multiple containers simultaneously use the same resource without conflicts. Namespaces offer lightweight process virtualization, without the hypervisor layer found in hardware virtualization technologies such as KVM.
Mount namespaces together with chroots help you create isolated Linux installations for running non-conflicting applications.
In order to isolate multiple processes from each other, you need to isolate them at every place they may bump into each other. For example, the file system and network are two obvious points of conflict between two applications. Application containers use several types of namespaces as described here, each of which helps isolate a specific type of resource, such as the file system or the network.
Namespaces isolate processes and let you create a new environment with a subset of the resources. Once you set up a namespace, it's transparent to the processes. Most Linux distributions support the mount, UTS, PID, network, and IPC namespaces, with RHEL 7 and other recent distributions adding the user namespace as well.
Let's briefly discuss each of these namespaces in the following sections.
Normally, file system mount points are global, meaning that all processes see the same set of mount points. Mount namespaces isolate the set of filesystem mount points viewed by various processes. Processes running within different mount namespaces, however, can each have different views of a file system hierarchy. Thus, a container can have a different /tmp directory from that of another container. The fact that each application sees a different file system means that dependent objects can be installed without conflicts among the applications.
UTS namespaces let multiple containers have separate hostnames and domain names, thus providing isolation of these two system identifiers. UTS namespaces are useful when you combine them with network namespaces.
PID namespaces allow processes running in different containers to use the same PID, so each container can have its own init process, PID 1. While you can view all the processes running inside the containers from the host operating system, from within a container you can see only that container's own set of processes. All processes, however, are visible within the "root" PID namespace.
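You can observe PID namespaces directly with the unshare utility from util-linux (this requires root or user-namespace privileges):

```shell
# start ps in a new PID namespace with a private /proc mount;
# it then sees only the namespace's own processes, starting from PID 1
unshare --fork --pid --mount-proc ps -ef
```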
Network namespaces allow containers to isolate the network stack, which includes the network controllers, firewall, iptables rules, routing tables, and so on. Each container can use separate virtual or real devices and have its own IP address. Network namespaces remove port conflicts among applications, since each application uses its own network stack, with a dedicated network address and TCP ports.
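The ip netns command makes the effect of a network namespace easy to see (this requires root; the namespace name is arbitrary):

```shell
ip netns add demo            # create a new, empty network namespace
ip netns exec demo ip link   # only a (down) loopback device is visible inside
ip netns del demo            # clean up
```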
IPC namespaces isolate interprocess communication (IPC) resources, which allows containers to create shared memory segments and semaphores with identical names, although they can't influence the resources that belong to other containers. The interprocess communication environment includes message queues, semaphores, and shared memory.
Control groups (cgroups for short) let you allocate resources such as CPU time, block IO, RAM and network bandwidth among groups of tasks that you can define, thus providing you fine-grained control over system resources. Using cgroups, the administrator can hierarchically group and label processes and assign specific amounts of resources to these processes, thus making for an efficient allocation of resources.
The Linux nice command lets you set the “niceness” of a process, which influences the scheduling of that process. The nice values can range from -20 (most favorable scheduling) to a value of 19 (least favorable to the process). A process with a high niceness value is accorded lower priority and less CPU time, thus freeing up resources in favor of processes with a lower niceness value. Note that niceness doesn’t really translate to priority – the scheduler is free to ignore the nice level you set. In traditional systems, all processes receive the same amount of system resources, and so an application with a larger number of processes can grab more system resources compared to applications with fewer running processes. The relative importance of the application should ideally be the criterion on which resources ought to be allocated, but it isn’t so, since resources are allocated at the process level.
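A quick way to see niceness in action: invoked with no arguments, the GNU nice command prints the niceness of the current process, so a child started with nice -n 10 reports the value it inherited:

```shell
# run a shell at niceness 10; the child reports the inherited value (10,
# assuming the parent's niceness is 0)
nice -n 10 sh -c 'nice'
```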
Control groups let you move resource allocation from the process level to the application level. Control groups do this by first grouping and labeling processes into hierarchies, and setting resource limits for them. These cgroup hierarchies are then bound with the systemd unit tree, letting you manage system resources with the systemctl commands (or by editing the system unit files).
The cgroup interface is a kernel-provided filesystem, usually mounted at /sys/fs/cgroup (at /cgroup on some older releases), and contains directories, similar to /proc and /sys, that represent the running environment and kernel configuration options. In the following sections, I explain how cgroups are implemented in Red Hat Enterprise Linux (and Fedora).
You organize cgroups in a tree-based hierarchy. Each process or task that runs on a server is in one and only one of the cgroups in a hierarchy. In a cgroup, a number of tasks (that is, processes) are associated with a set of subsystems. The subsystems act as parameters that can be assigned and define the "resource controllers" for memory, disk I/O, and so on.
In RHEL 7 (and CentOS), the systemd process, which is the parent of all processes, provides three unit types for controlling resource usage – services, scopes and slices. Systemd automatically creates a hierarchy of slice, scope and service units that provide the structure for the cgroup tree. All three of these unit types can be created by the system administrator or by programs. Systemd also automatically mounts the hierarchies for important kernel resource controllers such as devices (allows or denies access to devices for tasks in a group), or memory (sets limits on memory usage by a cgroup’s tasks). You can also create custom slices of your own with the systemctl command.
Here’s a brief description of the three unit types provided by systemd:
Service: services let systemd start and stop a process or a set of processes as a single unit. Services are named as name.service.
Scope: processes such as user sessions, containers, and VMs are called scopes and represent groups of externally created processes. Scopes are named as name.scope. For example, Apache processes and MySQL processes can belong to the same service but to different scopes: the first to the Apache scope and the second to the MySQL scope.
Slice: a slice is a group of hierarchically organized scopes and services. Slices don't contain any processes; it's the scopes and services that do. Since a slice is hierarchical in nature, the name of a slice unit corresponds to its path in the hierarchy: a slice named parent-name.slice is a subslice of the slice named parent.slice.
Systemd creates the following four slices by default to run the system:
-.slice – root slice
system.slice – default location for system services (systemd automatically assigns services to this slice)
user.slice – default location for user sessions
machine.slice - default location for VMs and Linux containers
Scopes and services are assigned to slices. Users are assigned implicit subslices, and you can define new slices and assign services and scopes to them. You can create permanent service and slice units with unit files. You can also create transient service and slice units at runtime by issuing API calls to PID 1. Transient service and slice units don't survive a reboot, and are released after they finish.
You can create two types of cgroups: transient and persistent. You can create transient cgroups for a service with the systemd-run command, and set limits on resources that the service can use. You can also assign a persistent cgroup to a service by editing its unit configuration file. The following example shows the syntax for creating a transient cgroup with systemd-run:
systemd-run --unit=name --scope --slice=slice_name command
Once you create the cgroup, you can start a new service with the systemd-run command, as shown here:
# systemd-run --unit=toptest --slice=test top -b
This command runs the top utility in a service unit, within a new slice named test.
You can override resources by configuring the unit file or at the command line as shown here:
# systemctl set-property httpd.service CPUShares=524 MemoryLimit=500M
Unlike in virtualized systems, containers don’t have a hypervisor to manage resource allocation, and each container appears as a regular Linux process from the point of view of the OS. Using cgroups helps allocate resources efficiently since you’re using groups instead of processes. A CPU scheduler, for example, finds it easy to allocate resources among groups rather than among a large number of processes.
Systemd stores the configuration for each persistent unit in the /usr/lib/systemd/system directory. To change the configuration of a service unit you must modify the configuration file either manually by editing the file, or with the systemctl set-property command.
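Rather than editing the shipped unit file under /usr/lib/systemd/system directly, the usual convention is a drop-in snippet under /etc/systemd/system. A sketch (the service name and limits are illustrative):

```ini
# /etc/systemd/system/httpd.service.d/resources.conf
[Service]
CPUShares=1024
MemoryLimit=1G
```

After adding the drop-in, run systemctl daemon-reload and restart the service for the new limits to take effect.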
You can view the hierarchy of the control groups with the systemd-cgls command in RHEL 7, as shown in the following output from the command, which also shows you the actual processes running in the cgroups.
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 20
├─user.slice
│ └─user-1000.slice
│   └─session-1.scope
│     ├─11459 gdm-session-worker [pam/gdm-password]
│     ├─11471 gnome-session --session gnome-classic
│     ...
└─system.slice
  ├─systemd-journald.service
  │ └─422 /usr/lib/systemd/systemd-journald
  ├─bluetooth.service
  │ └─5328 /usr/lib/systemd/systemd-localed
  ├─sshd.service
  │ └─1191 /usr/sbin/sshd -D
  ...
As the output reveals, slices don't contain processes; it's the scopes and services that contain them. The -.slice is implicit and represents the root of the hierarchy.
In older versions of Linux, administrators used the libcgroup package and the cgconfig command to build custom cgroup hierarchies. In this section, I show how RHEL 7 moves resource management from the process to the application level, by binding the cgroup hierarchies with the systemd unit tree. This lets you manage the resources either through systemctl commands or by editing the systemd unit files. You can still use the libcgroup package in release 7, but it’s there only to assure backward compatibility.
A cgroup subsystem is also called a resource controller and stands for a specific resource, such as CPU or memory. Systemd automatically mounts a set of resource controllers, and you can get the list from the /proc/cgroups file (or with the lssubsys command). Here are the key controllers in a RHEL 7 system:
cpu: provides cgroup tasks access to the CPU
cpuset: assigns individual CPUs to tasks in a cgroup
freezer: suspends or resumes all tasks in a cgroup when they reach a defined checkpoint
memory: limits memory usage by a cgroup’s tasks
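You can confirm which controllers the running kernel knows about by reading /proc/cgroups directly; the first column of each line is the controller name:

```shell
# /proc/cgroups lists every controller the kernel supports, one per
# line: subsystem name, hierarchy ID, number of cgroups, enabled flag.
cat /proc/cgroups

# Just the controller names, skipping the header line:
awk 'NR > 1 { print $1 }' /proc/cgroups
```

You should see the controllers discussed above – cpu, cpuset, freezer, memory – among the entries.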
Systemd provides a set of parameters with which you can tune the resource controllers.
Now that you have a basic idea about cgroups and resource controllers, let’s see how you can use cgroups to optimize resource usage.
Let’s say you have two MySQL database servers, each running within its own KVM guest. Let’s also assume that one of these is a high priority database and the other, a low priority database. When you run the two database servers together, by default the I/O throughput is the same for both.
Since one of the database services is a high priority database, you can prioritize its I/O throughput by assigning the high priority database service to a cgroup with a large number of reserved I/O operations, while assigning the low priority database server to a cgroup with a lower number of reserved I/O operations.
In order to prioritize the I/O throughput, you must first turn on resource accounting for both database servers:
# systemctl set-property db1.service BlockIOAccounting=true
# systemctl set-property db2.service BlockIOAccounting=true
Next, you can set the priority by setting a ratio of 5:1 between the high and low priority database services, as shown here:
# systemctl set-property db1.service BlockIOWeight=500
# systemctl set-property db2.service BlockIOWeight=100
I employed the BlockIOWeight parameter in this example to prioritize the I/O throughput between the two database services, but you could also have configured block device I/O throttling through the blkio controller to achieve the same result.
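Settings made with systemctl set-property persist as drop-in files, so you could equally place the directives in a drop-in yourself. The following fragment sketches both approaches for the low priority service; the file path, the throttling device /dev/sda and the 5M rate are illustrative assumptions, not values from this chapter:

```ini
# /etc/systemd/system/db2.service.d/io.conf (illustrative path)
[Service]
BlockIOAccounting=true
# Relative weight (range 10-1000); db1 would carry 500 for a 5:1 ratio.
BlockIOWeight=100
# Alternatively, throttle absolute read bandwidth on a specific device:
BlockIOReadBandwidth=/dev/sda 5M
```

Weights divide bandwidth proportionally only under contention, whereas throttling imposes a hard ceiling whether or not the device is busy.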
The two major components of the container architecture that you've seen thus far – namespaces and cgroups – aren't designed to provide security. Namespaces are good at making sure that the /dev directory in each container is isolated from changes in the host, but a rogue process in a container can still potentially hurt the host system. Similarly, cgroups help prevent denial-of-service attacks by limiting the resources any single container can use. SELinux, the third major component of modern container architectures, is designed expressly to provide security, not only for containers but also for normal Linux environments.
Red Hat has been a significant contributor to SELinux over the years, along with the Secure Computing Corporation. The National Security Agency (NSA) developed SELinux to provide the Mandatory Access Control (MAC) framework often required by the military and similar agencies. Under SELinux, processes and files are assigned a type, and access to them is controlled through fine-grained access control policies. This limits the potential damage from well-known security vulnerabilities such as buffer overflow attacks. SELinux significantly enhances the security of virtualized guests, in addition to that of the hosts themselves.
SELinux implements the following mechanisms in the Linux kernel:
Mandatory Access Control (MAC)
Multi-level Security (MLS)
Multi-category security (MCS)
The sVirt package enhances SELinux and uses Libvirt to provide a MAC system for containers (and also for virtual machines). SELinux can then securely separate containers by keeping a container’s root processes from affecting processes running outside the container.
SELinux can be disabled or made to run in permissive mode, but in either of those states it won't provide secure separation among containers. Following is a brief review of the SELinux modes and how you can enable SELinux.
SELinux operates in two modes (three if you count the "disabled" state) – enforcing and permissive. While SELinux is enabled in both the enforcing and the permissive modes, SELinux security policies are enforced only in enforcing mode. In permissive mode, the security policies are read but not applied. You can check the current SELinux mode with the getenforce command:
# getenforce
Enforcing
#
There are several ways to set the SELinux mode, but the easiest way is to use the setenforce command. Here are ways you can execute this command:
# setenforce 1
# setenforce Enforcing
# setenforce 0
# setenforce Permissive
The first two options set the mode to Enforcing and the last two to Permissive.
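Under the hood, getenforce simply reports the kernel's state. The sketch below reads that state from the selinuxfs interface directly, falling back to "Disabled" when SELinux isn't active, so it's safe to run on any host:

```shell
# Report the SELinux mode by reading the selinuxfs interface directly.
# /sys/fs/selinux/enforce holds 1 (Enforcing) or 0 (Permissive); if the
# file is absent, SELinux is disabled or not built into the kernel.
if [ -r /sys/fs/selinux/enforce ]; then
    case "$(cat /sys/fs/selinux/enforce)" in
        1) echo "Enforcing"  ;;
        0) echo "Permissive" ;;
    esac
else
    echo "Disabled"
fi
```

To make a mode change survive a reboot, set SELINUX=enforcing (or permissive) in /etc/selinux/config rather than relying on setenforce, which only changes the running state.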
RHEL 7 provides Secure Virtualization (sVirt), which integrates SELinux and virtualization, by applying MAC (Mandatory Access Controls) when using hypervisors and VMs. sVirt works very well with KVM, since both are part of the Linux kernel.
By implementing the MAC architecture in the Linux kernel, SELinux limits access to all resources on the host. To determine which users and processes can access a resource, each resource is labeled with an SELinux context of the form user:role:type:level, such as the following:

system_u:object_r:svirt_image_t:s0

In this example, here's what the various fields of the SELinux context stand for:

system_u: the SELinux user
object_r: the role
svirt_image_t: the type (here, the type sVirt assigns to a VM's image file)
s0: the MLS/MCS sensitivity level
The goal of sVirt is to protect the host server from a malicious instance, as well as to protect the virtual instances from one another. sVirt does this by configuring each VM created under KVM virtualization to run with a different SELinux label. By using a unique category set for each VM, it creates a virtual fence around each of them.
It’s common for Linux distributions to ship with Booleans that let you enable or disable an entire set of allowances with a single command, such as the following:
virt_use_nfs: controls the ability of the instances to use NFS mounted file systems.
virt_use_usb: controls the ability of the instances to use USB devices.
virt_use_xserver: controls the ability of the instances to interact with the X Window System.
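On a host with the SELinux userspace tools installed, you can inspect these Booleans with getsebool; the sketch below guards against the tool being absent or SELinux being disabled, so it runs anywhere:

```shell
# List the virtualization-related SELinux Booleans, if available.
if command -v getsebool >/dev/null 2>&1; then
    getsebool -a 2>/dev/null | grep '^virt_' || echo "SELinux is not enabled"
else
    echo "getsebool not installed"
fi

# To flip one persistently on a real system (requires root):
#   setsebool -P virt_use_nfs on
```

The -P flag writes the change into the policy so that it survives reboots; without it, setsebool changes only the running state.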
Virtualization (such as KVM virtualization) and containers may seem similar, but there are significant differences between the two technologies. The most basic of the differences is that virtualization requires dedicated Linux kernels to operate, whereas Linux containers share the same host system kernel.
Your choice between virtualization and containerization depends on your specific needs, based on the features and benefits offered by the two approaches. Following is a quick summary of the benefits/drawbacks for the two technologies.
Assuming you’re using KVM virtualization, you can run operating systems of different types, including both Linux and Windows, should you need it. Since you run separate kernel instances, you can assure separation among applications, which ensures that issues with one kernel don’t affect the other kernels running on the same host. In addition security is enhanced due to the separation of the kernels. On top of this, you can run multiple versions of an application on the host and the VM, besides being able to perform virtual migrations as I explained earlier.
On the minus side, remember that VMs need more resources, and you can run fewer VMs on a host than the number of containers you can run.
Containers help you isolate applications, and maintaining containerized applications is a lot easier than maintaining them on virtual machines. For example, when you upgrade an application on the host, all containers that run instances of that application benefit from that change. You can run a very large number of containers on a host machine due to their light footprint – theoretically, up to 6000 containers on a host, whereas you can run only a few VMs.
Containers offer the following additional benefits:
Flexibility: since an application’s runtime requirements are included with the application in the container, you can run containers in multiple environments.
Security: since containers typically have their own network interfaces and file system that are isolated from other containers, you can isolate and secure applications running in a container from the rest of the processes running on the host server
Performance: typically containers run much faster than applications that carry the heavy overhead of a dedicated VM
Sizing: unlike VMs, containers don't include an entire operating system, so they're very compact, which makes it quite easy to share them.
Resource allocation: LXC helps you easily manage resource allocations in real time.
Versatility: You can run different Linux distributions on the same host kernel using different containers for each Linux distribution.
The key difference between KVM virtual machines and Linux containers is that KVM VMs require a separate kernel of their own, while containers share the host operating system's kernel.
You can host more containers than VMs on given hardware, since containers have a light footprint and VMs are resource hungry.
KVM Virtualization lets you:
Boot different operating systems, including non-Linux systems.
Separate kernels mean that terminating a kernel doesn’t disable the whole system.
Run multiple versions of an application on the same host since the guest VM is isolated from changes in the host.
Perform live migrations of the VMs
Linux Containers: Are designed to support the isolation of applications. Since system-wide changes are visible inside all containers, any change such as an application upgrade automatically applies to all containers that run instances of the application. The lightweight nature of containers means that you can run a very large number of them on a host, with the maximum running into 6000 containers or more on some systems.
Unlike a fully virtualized system, LXC won't let you run other operating systems. However, you can install a virtualized (full or para) system on the same kernel as the LXC host system and run both the virtualized guests and the LXC guests simultaneously. Virtualization management APIs such as libvirt and Ganeti are helpful if you wish to implement such a hybrid system.
As organizations move beyond monolithic applications to microservices, new application workloads involve a connected matrix of services put together to serve specific business needs, yet easily rearrangeable into a different configuration. Containers are a key part of this new application architecture. For developers who create applications, containers offer these benefits:
Better quality releases
Easier and faster scalability of the applications
Isolation for applications
Shorter development and test cycles and fewer deployment errors
From the point of view of IT operations teams, containers provide:
Better quality releases
Efficient replacement of full virtualization
Easier management of applications

When a container is instantiated, its processes execute within a new userspace created when you mount the container image. The kernel ensures that the processes in the container are limited to operating within their own namespaces, such as the mount namespace and the PID namespace. In effect, it's the namespaces that are being containerized.
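You can see the namespaces the kernel has assigned to any process, containerized or not, under /proc. Each entry is a handle to one namespace, and two processes that share an entry share that namespace:

```shell
# Every process exposes its namespace memberships as symlinks.
# A containerized process shows different mnt/pid/net targets
# than a process running directly on the host.
ls -l /proc/self/ns

# The cgroup membership of the current process:
cat /proc/self/cgroup
```

Comparing /proc/1/ns with /proc/self/ns inside a container is a quick way to verify that the container really is running in its own namespaces.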
You can take the same containerized image and run it on a laptop or servers in your datacenter, or on virtual machines in the cloud.
A virtual machine packages virtual hardware, a kernel and a user space. A container packages just the userspace – there’s no kernel or virtual hardware in a container.
Linux containers are starting to be used widely, with some large cloud service providers already using them at scale. Following are some concerns that have led to a slower-than-expected adoption of Linux containers.
Security: security issues are a concern for enterprise adoption, because a kernel exploit at the host OS level puts all containers living on that host at risk. Vendors are therefore fine-tuning security techniques such as mandatory access control (MAC) to tighten things up. SELinux already offers mandatory access controls, and a separate project named libseccomp lets you filter the system calls available to a container, which prevents a hacked container from compromising the kernel supporting it.
Management and orchestration: vendors are working on frameworks for managing container images and orchestrating their lifecycle, and new tools are being created to support containers. Docker is a great framework that makes it very easy to create, manage and delete containers. Docker can also limit the resources containers consume, as well as provide metrics on how containers are using those resources. New tools for building and testing containers are in the wings as well. Linux container adoption will accelerate once vendors agree on a standard for inter-container communication, using solutions such as virtual switching, hardware-enhanced switching and routing, and so on.
While SELinux, cgroups and namespaces make containerization possible, a key missing piece of Linux containers is the ability to manage them – Docker solves this problem very nicely, as explained in the next chapter, which is all about Docker containers.
LXC is a userspace interface to the Linux kernel's containment features. It enables you to easily create and manage both system and application containers, and it helps you easily automate container deployment. LXC seeks to create a Linux environment that's close to a standard Linux installation, but without using a separate kernel. LXC containers are usually regarded as a midway solution between a chroot and a full-fledged virtual machine.
LXC is free software, with most of the code released under the terms of a GNU license. While LXC is quite useful, it does have some drawbacks, such as the following:
Editing of configuration files for controlling resources
It (LXC) may be implemented differently across distributions, or even among different releases of the same distribution
Docker offers a much more efficient and powerful way to create and manage Linux containers. Docker is an application that enables almost effortless management of Linux containers through a standard format. Docker isn't a totally new technology – it builds on the concepts you've seen earlier in this chapter, such as namespaces, cgroups and LXC, to go beyond what's possible with userspace tools such as LXC alone. Besides helping you efficiently manage containers, Docker's image-based approach to containerization makes containers portable, thus making it easy to share them across hosts.
With this useful background of Linux containers, let’s turn to a discussion of Docker containers in the next chapter.