Chapter 1. Unikernels: A New Technology to Combat Current Problems

At the time of this writing, unikernels are the new kid on the cloud block. Unikernels promise small, secure, fast workloads, and people are beginning to see that this new technology could help launch a new phase in cloud computing.

To put it simply, unikernels apply the established techniques of embedded programming to the datacenter. Currently, we deploy applications using beefy general-purpose operating systems that consume substantial resources and provide a sizable attack surface. Unikernels eliminate nearly all the bulk, drastically reducing both the resource footprint and the attack surface. This could change the face of the cloud forever, as you will soon see.

What Are Unikernels?

For a functional definition of a unikernel, let’s turn to the burgeoning hub of the unikernel community, Unikernel.org, which defines it as follows:

Unikernels are specialised, single-address-space machine images constructed by using library operating systems.

In other words, unikernels are small, fast, secure virtual machines that lack a conventional, general-purpose operating system.

I could go on to focus on the architecture of unikernels, but that would leave the key question unanswered: why? Why are unikernels really needed? Why can’t we simply live with our traditional workloads intact? The status quo for workload construction has remained the same for years; why change it now?

Let’s take a good, hard look at the current problem. Once we have done that, the advantages of unikernels should become crystal clear.

The Problem: Our Fat, Insecure Clouds

When cloud computing burst onto the scene, all sorts of promises were made about a grand future. It was said that our compute farms would magically allocate resources to meet the needs of applications. Resources would be automatically optimized to do the maximum work possible with the assets available. And compute clouds would leverage assets both in the datacenter and on the Internet, transparently to the end user.

Given these goals, it is no surprise that the first decade of the cloud era focused primarily on how to do these “cloudy” things. Emphasis was placed on developing excellent cloud orchestration engines that could move applications with agility throughout the cloud. That was an entirely appropriate focus, as the datacenter in the time before the cloud was both immobile and slow to change. Many system administrators could walk blindfolded through the aisles of their equipment racks and point out what each machine did for which department, stating exactly what software was installed on each server. The placement of workloads on hardware was frequently laborious and static; changing those workloads was a slow and arduous task, requiring extensive verification and testing before even the smallest changes were made on production systems.

The advent of cloud orchestration software (OpenStack, CloudStack, OpenNebula, etc.) altered all that—and many of us were very grateful. The ability of these orchestration systems to adapt and change with business needs turned the IT world on its head. A new world ensued, and the promise of the cloud seemed to be fulfilled.

Security Is a Growing Problem

However, as the cloud era dawned, it became evident that a good orchestration engine alone is simply not enough to make a truly effective cloud. A quick review of industry headlines over the past few years yields report after report of security breaches in some of the most prominent organizations. Major retailers, credit card companies, even federal governments have reported successful attacks on their infrastructure, including possible loss of sensitive data. For example, in May 2016, the Wall Street Journal ran a story about banks in three different countries that had recently been hacked to the tune of $90 million in losses. Any of the published visualizations of major attacks over the past decade will take your breath away. Even the US Pentagon was reportedly hacked in the summer of 2011. It is no longer unusual to receive a letter in the mail stating that your credit card is being reissued because its data was compromised by malicious hackers.

I began working with clouds before the term “cloud” was part of the IT vernacular. People have been balking at the notion of security in the cloud from the very beginning. It was the 800-pound gorilla in the room, while the room was still under construction!

People have tried to blame the cloud for data insecurity since day one. But one of the dirty little secrets of our industry is that our data was never as safe as we pretended it was. Historically, many organizations have simply looked the other way when data security was questioned, electing instead to wave their hands and exclaim, “We have an excellent firewall! We’re safe!” Of course, anyone who thinks critically for even a moment can see the fallacy of that concept. If firewalls were enough, there would be no need for antivirus programs or email scanners—both of which are staples of the PC era.

Smarter organizations have adopted a defense-in-depth concept, in which the firewall becomes one of several rings of security surrounding the workload. This is definitely an improvement, but if nothing is done to properly secure the workload at the center of those rings, the approach is still critically flawed.

In truth, to hide a known weak system behind a firewall or even multiple security rings is to rely on security by obscurity. You are betting that the security fabric will keep the security flaws away from prying eyes well enough that no one will discover that data can be compromised with some clever hacking. It’s a flawed theory that has always been hanging by a thread.

Well, in the cloud, security by obscurity is dead! In a world where a virtual machine can be behind an internal firewall one moment and out in an external cloud the next, you cannot rely on a lack of prying eyes to protect your data. If the workload in question has never been properly secured, you are tempting fate. We need to put away the dreams of firewall fairy dust and deal with the cold, hard fact that your data is at risk if it is not bolted down tight!

The Cloud Is Not Insecure; It Reveals That Our Workloads Were Always Insecure

The problem is not that the cloud introduces new levels of insecurity; it’s that the data was never really secure in the first place. The cloud just made the problem visible—and, in doing so, escalated its priority so it is now critical.

The best solution is not to construct a new type of firewall in the cloud to mask the deficiencies of the workloads, but to change the workloads themselves. We need a new type of workload—one that raises the bar on security by design.

Today’s Security Is Tedious and Complicated, Leaving Many Points of Access

Think about the nature of security in the traditional software stack:

  1. First, we lay down a software base of a complex, multipurpose, multiuser operating system.

  2. Next, we add hundreds—or even thousands—of utilities that do everything from displaying a file’s contents to emulating a hand-held calculator.

  3. Then we layer on some number of complex applications that will provide services to our computing network.

  4. Finally, someone comes to an administrator or security specialist and says, “Make sure this machine is secure before we deploy it.”

Under those conditions, true security is unobtainable. If you applied every security patch available to each application, used the latest version of each utility, and ran a hardened and tested operating system kernel, you would only have started the process of making the system secure. If you then added a robust and complex security system like SELinux to prevent many common exploits, you would have moved the security ball forward again. Next comes testing—lots and lots of testing needs to be performed to make sure that everything works correctly and that typical attack vectors are truly closed. And then comes formal analysis and modeling to make sure everything looks good.

But what about the atypical attack vectors? In 2015, the VENOM exploit was documented. It arose from a bug in QEMU’s virtual floppy disk controller, and the bug was present even if you had no intention of using a virtual floppy drive on your virtual machines. What made it worse was that both the Xen Project and KVM open source hypervisors rely on QEMU, so all of these virtual machines—literally millions of VMs worldwide—were potentially at risk. This is such an obscure attack vector that even the most thorough testing regimen would likely have overlooked it, and when you are including thousands of programs in your software stack, the number of obscure attack vectors could be huge.

But you aren’t done securing your workload yet. What about new bugs that appear in the kernel, the utilities, and the applications? All of these need to be kept up to date with the latest security patches. But does that make you secure? What about the bugs that haven’t been found yet? How do you stop each of these? A system like SELinux helps significantly, but it isn’t a panacea. And who has certified that your SELinux configuration is optimal? In practice, most SELinux configurations I have seen are deliberately far from optimal, because the fear that an aggressive configuration will accidentally keep a legitimate process from succeeding is quite real in many people’s minds. As a result, many installations are put into production with less-than-optimal security tooling.

The security landscape today is based on a fill-in-defects concept. We load up thousands of pieces of software and try to plug the hundreds of security holes we’ve accumulated. In most servers that go into production, the owner cannot even list every piece and version of software in place on the machine. So how can we possibly ensure that every potential security hole is accounted for and filled? The answer is simple: we can’t! All we can do is our best: correct everything we know about, and be diligent in identifying and correcting new flaws as they become known. But for a large number of servers, each containing thousands of discrete components, the task of updating, testing, and deploying each new patch is both daunting and exhausting. It is small wonder that so many public websites are cracked, given today’s security methodology.
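To make the scale of that chore concrete, here is a minimal sketch of the routine on a single Linux server. The commands assume a Debian- or Ubuntu-style system with the stock package tools; RPM-based equivalents are noted in the comments, and none of the numbers involved come from this report.

    # Count the discrete software packages installed on this one server.
    # On a general-purpose server image the answer is typically in the
    # hundreds or thousands, and every one of them can carry flaws.
    dpkg -l | grep -c '^ii'        # on RPM-based systems: rpm -qa | wc -l

    # The patch treadmill: refresh metadata, preview the pending updates,
    # then apply them. Repeat on every server, every time a flaw is disclosed.
    apt-get update
    apt-get --just-print upgrade
    apt-get upgrade -y

Now multiply that by every server in the fleet, and by every new advisory.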

And Then There’s the Problem of Obesity

As if the problem of security in the cloud wasn’t enough bad news, there’s the problem of “fat” machine images that need lots of resources to perform their functions. We know that current software stacks have hundreds or thousands of pieces, frequently using gigabytes of both memory and disk space. They can take precious time to start up and shut down. Large and slow, these software stacks are virtual dinosaurs, relics from the stone age of computing.

The recipe for constructing software stacks has remained almost unchanged since the time before the IBM PC when minicomputers and mainframes were the unquestioned rulers of the computing landscape. For more than 35 years, we have employed software stacks devised in a time when hardware was slow, big, and expensive. Why? We routinely take “old” PCs that are thousands of times more powerful than those long-ago computing systems and throw them into landfills. If the hardware has changed so much, why hasn’t the software stack?

Using the old theory of software stack construction, we now have clouds filled with terabytes of unneeded disk space using gigabytes of memory to run the simplest of tasks. Because these are patterned after the systems of long ago, starting up all this software can be slow—much slower than the agile promise of clouds is supposed to deliver. So what’s the solution?

Slow, Fat, Insecure Workloads Need to Give Way to Fast, Small, Secure Workloads

We need a new type of workload in the cloud. One that doesn’t waste resources. One that starts and stops almost instantly. One that will reduce the attack surface of the machine so it is not so hard to make secure. A radical rethink is in order.

A Possible Solution Dawns: Dockerized Containers

Given this need, it is no surprise that when Dockerized containers made their debut, they instantly became wildly popular. Even though many people weren’t explicitly looking for a new type of workload, they still recognized that this technology could make life easier in the cloud.

Note

For those readers who might not be intimately aware of the power of Dockerized containers, let me just say that they represent a major advance in workload deployment. With a few short commands, Docker can construct and deploy a canned lightweight container. These container images have a much smaller footprint than full virtual machine images, while enjoying startup times as quick as a snap of the fingers.
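As a rough illustration of those “few short commands” (the base image, paths, and names below are invented for this sketch, not taken from any project in this report), a complete recipe for a small web-serving container can be as short as this Dockerfile:

    # Dockerfile: the entire recipe for a lightweight web-server container.
    # Start from a small base image that already provides the web server,
    # then add the application's static content.
    FROM nginx:alpine
    COPY site/ /usr/share/nginx/html/

Building the image and deploying a container from it then takes two commands:

    docker build -t demo-web .
    docker run -d -p 8080:80 demo-web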

There is little doubt that the combination of Docker and containers represents a massive step in the right direction. It definitely makes workloads smaller and faster compared to traditional VMs.

Containers necessarily share a common operating system kernel with their host system. They also have the capability to share the utilities and software present on the host. This stands in stark contrast to a standard virtual (or hardware) machine solution, where each individual machine image contains separate copies of each piece of software needed. Eliminating the need for additional copies of the kernel and utilities in each container on a given host means that the disk space consumed by the containers on that host will be much smaller than a similar group of traditional VMs.

Containers can also leverage the support processes of the host system, so a container normally runs only the application that is of interest to the owner. A full VM, by contrast, has a significant number of processes running, launched during startup to provide services within that VM. Because containers can rely on the host’s support processes, they consume less memory and CPU than a similar VM.

Also, since the kernel and support processes already exist on the host, startup of a container is generally quite quick. If you’ve ever watched a Linux machine boot (for example), you’ve probably noticed that the lion’s share of boot time is spent starting the kernel and support processes. Using the host’s kernel and existing processes makes container boot time extremely quick—basically that of the application’s startup.
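If you have a Linux host with Docker installed, both effects are easy to observe for yourself; the tiny alpine image used below is simply a convenient stand-in for any containerized application.

    # The container reports the host's kernel version, because there is no
    # second kernel booting inside the container.
    uname -r
    docker run --rm alpine uname -r

    # Startup cost is essentially just the application's startup. `true` does
    # nothing, so this measures little more than the container machinery itself.
    time docker run --rm alpine true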

With these advances in size and speed, it’s no wonder that so many people have embraced Dockerized containers as the future of the cloud. But the 800-pound gorilla is still in the room.

Containers Are Smaller and Faster, but Security Is Still an Issue

All these advances are tremendous, but the most pressing issue has yet to be addressed: security. With the number of significant data breaches growing weekly, increasing security is a requirement across the industry. Unfortunately, containers do not raise the security bar nearly enough. In fact, unless administrators work to secure a container prior to deployment, they may find themselves in a more vulnerable position than when they were still using a virtual machine to deploy the service.

Now, the folks promoting Dockerized containers are well aware of that shortfall and are expending a large amount of effort to fix the issue—and that’s terrific. However, the jury is still out on the results. We should be very mindful of the complexity of the lockdown technology. Remember that Dockerized containers became the industry darling precisely because of their ease of use. A security add-on that requires some thought—even a fairly modest amount—may never be enabled in production due to “lack of time.”

Note

I remember when SELinux started to be installed by default on certain Linux distributions. Some people believed this was the beginning of the end of insecure systems. It certainly seemed logical to think so—unless you observed what happened when people actually deployed those systems. I shudder to think how many times I’ve heard, “We need to get this server up now, so we’ll shut off SELinux and configure it later.” Promising to “configure SELinux when there’s time” carries about as much weight as a politician’s promise to secure world peace. Many great intentions are never realized because of a perceived “lack of time.”
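For readers who have never watched that conversation play out, “shutting it off” really is this easy on a typical RPM-based distribution that ships SELinux, which is exactly why it happens under deadline pressure:

    getenforce        # reports Enforcing, Permissive, or Disabled
    setenforce 0      # drops to Permissive mode until the next reboot
    # Setting SELINUX=disabled in /etc/selinux/config makes the change permanent.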

Unless the security solution for containers is as simple as using Docker itself, it stands an excellent chance of dying from neglect. The solution needs to be easy and straightforward. If not, it may present the promise of security without actually delivering it in practice. Time will tell if container security will rise to the needed heights.

It Isn’t Good Enough to Get Back to Yesterday’s Security Levels; We Need to Set a Higher Bar

But the security issue doesn’t stop with ease of use. As we have already discussed, we need to raise the level of security in the cloud. If the container security story doesn’t raise the security level of workloads by default, we will still fall short of the needed goal.

We need a new cloud workload that provides a higher level of security without expending additional effort. We must stop the “come from behind” mentality that makes securing a system a critical afterthought. Instead, we need a new level of security “baked in” to the new technology—one that closes many of the existing attack vectors.

A Better Solution: Unikernels

Thankfully, there exists a new workload theory that provides the small footprint, fast startup, and improved security we need in the next-generation cloud. This technology is called unikernels. Unikernels represent a radically different theory of the enterprise software stack—one that promotes exactly the qualities needed to improve workloads in the cloud.

Smaller

First, unikernels are small—very small; many come in at less than a megabyte in size. By employing a truly minimalist approach to software stack creation, unikernels produce VMs so tiny that the smallest VM allocations offered by external cloud providers are huge by comparison. A unikernel contains only the functions needed to make the application work, and nothing more. We will see examples of these in the subsection “Let’s Look at the Results”.

Faster

Next, unikernels are very quick to start. Because they are so tiny, devoid of the baggage found in a traditional VM stack, unikernels start up and shut down amazingly quickly—often measured in milliseconds. The subsection “Let’s Look at the Results” will discuss a few examples. In the “just in time” world of the cloud, a service that can be created when it is needed, and terminated when the job is done, opens new doors to cloud theory itself.

And the 800-Pound Gorilla: More Secure

And finally, unikernels substantially improve security. The attack surface of a unikernel machine image is quite small, lacking the utilities that are often exploited by malicious hackers. This security is built into the unikernel itself; it doesn’t need to be added after the fact. We will explore this in “Embedded Concepts in a Datacenter Environment”. While unikernels don’t achieve perfect security by default, they do raise the bar significantly without requiring additional labor.
