Preface

If you are a CloudStack user, you should read this book! If you are a CloudStack developer, you should read this book! If you are a DevOps-minded person, you should read this book! If you are an application developer, you should read this book! This might sound like a joke, but this is really the intent. This book covers the Apache CloudStack ecosystem, but it also introduces tools that are used in different setups. For example, we’ll take a look at tools such as Chef, Ansible, and Vagrant, as well as applications (e.g., Hadoop) and storage solutions (e.g., RiakCS). This is much more than just CloudStack.

This is not a standard cookbook with multiple recipes on a single topic. It covers a variety of tools and provides introductory material for each. It is meant to be used as a reference that you can open at any time for a quick tutorial on how to use a specific tool or application so that you can make effective use of it. Used in combination with CloudStack, these tools are becoming core technologies used by developers, system administrators, and architects alike. They build on the foundation of a solid cloud and empower IT professionals to do things better and faster.

Why I Wrote This Book

I have been working with virtualization and what became known as clouds since around 2002. If we want to build a cloud, we now have several open source solutions, which Marten Mickos, CEO of Eucalyptus, has called the four sisters: CloudStack, Eucalyptus, OpenNebula, and OpenStack. Successful private and public clouds are currently operational all over the world using these solutions, so it appears that building a cloud is now a solved problem. The capabilities of those clouds are certainly different and the scalability of each solution—as well as some specific networking or storage features—might be different, but they are operational and in production. This is why I believe that instead of an installation book, it is important to look at the software ecosystem of those cloud solutions and start thinking about using the cloud, integrating it in the development and operational processes so that we can provide higher level services using this foundation and start getting some return on investment.

Since I joined the Apache CloudStack community in July 2012, I have worked actively to test and, when necessary, develop CloudStack drivers for many of the tools that make up the arsenal of today’s developers and system administrators. Increasingly, I believe that users can also leverage these tools directly. I wanted to write this book so that I could share my experience with testing these tools and explain how they help answer the question “I have a cloud, now what?” Then we can get back to focusing on the problems at hand: reliable application hosting, distributed application deployments, data processing, and so on.

The cloud has matured, and this book will show you various tools and techniques to take full advantage of it so that you can stop worrying about the implementation details of your cloud and get back to working on your applications.

CloudStack Within the Cloud Computing Picture in 500 Words

Cloud computing can be a very nebulous term: for some it is an online application, for others it is a virtualization system. To set the record straight, the definition put forth by the National Institute of Standards and Technology (NIST) is helpful. In its 2011 report, NIST defined cloud computing as follows:

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort…

The NIST definition goes on to define the essential characteristics of clouds (i.e., on-demand, network access, multitenancy, elasticity, and metering). It continues by defining three service models: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). It also identifies four deployment models: private cloud, public cloud, hybrid cloud, and community cloud (note that community cloud is a less recognized model and is not commonly used today).

The SaaS to IaaS model can be mapped to the old OSI model. SaaS deals with application delivery, IaaS deals with infrastructure management, and PaaS is everything in between. That’s a very simplified view of things, but it’s not too far off. SaaS refers to online application hosting: users access the application interface over the Internet, and all the work that happens in the background to make the application available and scalable is entirely hidden from the end user (as it should be). Gmail (and most Google services, including Calendar and Docs) is a typical SaaS example. PaaS represents what we used to call middleware, and makes the link between the end-user application and the underlying infrastructure that it is running on. A PaaS solution is aimed at developers who do not want to worry about the infrastructure. PaaS is a fast-moving area these days, with solutions such as OpenShift, Cloud Foundry, and Cloudify receiving a lot of attention and being developed extremely fast. IaaS is the infrastructure layer that orchestrates the work typically done by system administrators to host the applications, including server provisioning, network management, and storage allocation.

Apache CloudStack is an infrastructure as a service (IaaS) software solution. It aims to manage large sets of virtual machine instances in a highly available, highly scalable way. It is used to build public or private clouds that offer on-demand, elastic, multitenant compute, storage, and network services. As mentioned earlier, it is known as one of the four sisters of open source cloud computing that allow you to build an Amazon EC2 clone.

CloudStack’s development was begun by a Silicon Valley startup called VMOps in 2008. The company was renamed Cloud.com in 2010, and in 2011, Citrix Systems acquired Cloud.com. In April 2012, Citrix donated CloudStack to the Apache Software Foundation (ASF). CloudStack then entered the Apache Incubator and became a trademark of the ASF, graduating to become a top-level ASF project in March 2013, joining other open source projects like HTTPD and Hadoop.

How This Book Is Organized

To get you up to speed on the Apache CloudStack ecosystem, the book is organized in three main parts with two chapters each, followed by a short concluding part. Part I discusses installation steps, both from source and from binaries:

  • Chapter 1, Installing from Source covers some basic installation steps for developers. The CloudStack documentation provides complete installation instructions, so we will not cover these details here. Instead, this chapter is meant to introduce CloudStack and some features that can help ecosystem development (e.g., the simulator and DevCloud, the CloudStack sandbox).
  • Chapter 2, Installing from Packages is a step-by-step installation guide for Ubuntu 14.04 using KVM. This guide can be followed on a local machine using VMware Fusion (to do nested virtualization with KVM) or on physical hardware. It is intended for users who do not want to compile from source.

Part II discusses API clients and wrappers:

  • Chapter 3, API Clients explains how to sign an API request and then goes through a few clients, including CloudMonkey (the official CloudStack command-line interface), Apache Libcloud (a Python module that abstracts the differences between cloud providers’ APIs), Apache jclouds (a Java library with a goal similar to Libcloud’s), and CloStack (a Clojure-based client specific to CloudStack). This chapter should give everyone a taste of a client in their favorite language. It will be interesting to folks who want to use the CloudStack API and write their own applications on top of it.
  • Chapter 4, API Interfaces presents three applications that provide a different API in front of the CloudStack API. They are sometimes called API bridges or wrappers. These applications run as servers on the user’s machine or within the cloud provider infrastructure, and expose a different API. For example, EC2Stack exposes an EC2-compatible interface, gstack exposes a GCE-compatible interface, and rOCCI exposes a standardized interface. In addition, this chapter presents Boto and Eutester, two Python modules written by the Eucalyptus team. Boto is a client to Amazon Web Services (AWS) and Eutester is a testing framework. CloudStack users will be able to use these modules in combination with EC2Stack.
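As a rough illustration of the request signing that Chapter 3 covers, the sketch below follows the scheme as it is commonly described for the CloudStack API: URL-encode each value, sort the parameters by name, lowercase the resulting query string, compute an HMAC-SHA1 digest of it with the secret key, and base64-encode the result. The endpoint, API key, and secret key are placeholders, and the helper names encode_params, sign_request, and build_url are invented for this example. Treat it as a sketch under those assumptions, not as the chapter's own code.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def encode_params(params):
    """Return a query string with parameters sorted by name and values URL-encoded."""
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    # quote() encodes spaces as %20 rather than '+'.
    return "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)


def sign_request(params, secret_key):
    """Return the base64-encoded HMAC-SHA1 signature of the lowercased query string."""
    to_sign = encode_params(params).lower()
    digest = hmac.new(
        secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def build_url(endpoint, params, secret_key):
    """Append the URL-encoded signature to the encoded query string."""
    signature = quote(sign_request(params, secret_key), safe="")
    return f"{endpoint}?{encode_params(params)}&signature={signature}"


if __name__ == "__main__":
    # Placeholder endpoint and credentials (assumptions, not real values).
    endpoint = "http://localhost:8080/client/api"
    params = {"command": "listUsers", "response": "json", "apikey": "MY_API_KEY"}
    print(build_url(endpoint, params, "MY_SECRET_KEY"))
```

The resulting URL could then be fetched with any HTTP client. Clients such as CloudMonkey and Libcloud take care of this step for you, which is one reason to prefer them over hand-built requests.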

Part III discusses configuration management and some advanced recipes:

  • Chapter 5, Configuration Management starts with an introduction to Veewee and Packer. Moving on from there, it presents several recipes about Vagrant, a software development tool that helps you test configurations locally and then deploy them in the cloud in a repeatable manner. With some knowledge of Vagrant in hand, the rest of the chapter introduces two configuration management solutions, Ansible and Chef. These solutions have CloudStack plug-ins that help deploy applications in the cloud. This chapter will be interesting to the DevOps community.
  • Chapter 6, Advanced Recipes goes into some more advanced topics. We look at two important aspects of the cloud infrastructure itself: monitoring and storage. We introduce RiakCS and show how it can be used as an image catalog. We also show how to use Fluent for log aggregation in combination with Elasticsearch and MongoDB. Finally, we introduce Apache Whirr, an application orchestrator built on top of jclouds that can be used to deploy and run distributed systems like Hadoop.

Finally, Part IV summarizes the book and provides some tips for further reading and investigation.

Technology You Need to Understand

This book is of an intermediate level and requires a minimum understanding of a few development and system administration concepts. Before diving into the book, you might want to review:

bash (Unix shell)
This is the default Unix shell on Linux and OS X. Familiarity with common shell tasks, such as editing files, setting file permissions, moving files around the filesystem, managing user privileges, and some basic shell programming, will be very beneficial. If you don’t know the Linux shell in general, consult books such as Cameron Newham’s Learning the Bash Shell or Carl Albing, JP Vossen, and Cameron Newham’s bash Cookbook, both from O’Reilly.
Package management
The tools we will present in this book often have multiple dependencies that need to be met by installing some packages. Knowledge of the package management on your machine is therefore required. It could be apt on Ubuntu/Debian systems, yum on CentOS/RHEL systems, port or brew on OS X. Whatever it is, make sure that you know how to install, upgrade, and remove packages.
Git
Git has established itself as the standard for distributed version control. If you are already familiar with CVS and SVN, but have not yet used Git, you should. Version Control with Git by Jon Loeliger and Matthew McCullough (O’Reilly) is a good start. Together with Git, the GitHub website is a great resource to get started with a hosted repository of your own. To learn GitHub, try http://training.github.com and the associated interactive tutorial.
Python
In addition to programming with C/C++ or Java, I always encourage students to pick up a scripting language of their choice. Perl used to rule the world, while these days, Ruby and Go seem to be prevalent. I personally use Python. Most examples in this book use Python, but there are a few examples with Ruby, and one even uses Clojure. O’Reilly offers an extensive collection of books on Python, including Introducing Python by Bill Lubanovic, Programming Python by Mark Lutz, and Python Cookbook by David Beazley and Brian K. Jones.

Those are your weapons: your shell, your package manager, your GitHub account, and some Python. If you don’t know these tools (and especially Python), you need not worry. There are recipes for Rubyists and Clojure programmers. You will be able to pick things up as you go along.

Online Content

If you want to take a self-paced training on a few of the tools described in this book, head over to http://codac.co, an online tutorial I have presented several times. It makes use of exoscale, a CloudStack-based public cloud. You can register for free on exoscale and you will get free credits that should allow you to go through the tutorial.

Although a lot of the content in this book has been tested on exoscale, there are other public CloudStack clouds that you can use to test these tools and even go to production. You might consider getting an account with any or all of these:

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip, suggestion, or general note.

Warning

This element indicates a warning or caution.

Safari® Books Online

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of plans and pricing for enterprise, government, education, and individuals.

Members have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://shop.oreilly.com/product/0636920034377.do.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

Perhaps very strangely, I would like to thank the entire Amazon Web Services team for doing an amazing job providing cloud services that are revolutionizing the IT landscape. Amazon was the first to deliver on the vision of computing as a utility, and it has been a huge driver and innovator in the way we interact with compute resources. I would also like to thank the entire Apache Software Foundation CloudStack community, who works extremely hard to develop and release CloudStack—without a healthy community, there is no ecosystem (and vice versa). A huge thank you goes to Mike Tutkowski, Jeff Moody, and Pierre-Yves Ritschard, who took the time to review the book and gave me some very valuable feedback. Finally, I would like to thank Mark Hinkle, who gave me the time to write this book, and Brian Anderson, who took calls from me and brainstormed with me as we tried to figure out the best format for this book.
