Chapter 8. Buildpacks

Buildpacks provide the magic and flexibility that make running your apps on Heroku so easy. When you push your code, the buildpack is the component that is responsible for setting up your environment so that your application can run. The buildpack can install dependencies, customize software, manipulate assets, and do anything else required to run your application. Heroku didn’t always have buildpacks; they’re a new component that came with the Cedar stack.

To better understand why the buildpack system is so useful, let’s take a look at a time before buildpacks. Let’s take a look at the original Aspen stack.

Before Buildpacks

When Heroku first launched, its platform ran a stack called Aspen. The Aspen stack only ran Ruby code—version 1.8.6 to be exact. It had a read-only filesystem (no writes allowed). It had almost every publicly available Ruby dependency, known as gems, pre-installed, and it only supported Ruby on Rails applications.

Developers quickly fell in love with the Heroku workflow provided by Aspen and wanted to start using the platform for other things. The Ruby community saw Rack-based applications grow in popularity, especially Sinatra, and there was also an explosion in the number of community-built libraries being released. If you were a Ruby developer during this time you could be sure to find an acts_as library for whatever you wanted. While that was good for the community, keeping a copy of every gem on Aspen wasn’t maintainable or sane, especially when different gems started requiring specific versions of other gems to work correctly. Heroku needed a way to deploy different types of Ruby applications and a different way to manage external library dependencies.

Recognizing the need for a more flexible system, Heroku released their next stack, called Bamboo. It had very few gems installed by default and instead favored declaring dependencies in a .gems file that you placed in the root of your code repository (this was before Ruby’s now-common dependency management system, called bundler, or the Gemfile, used to declare dependencies, existed). The Bamboo stack had just about everything a Ruby developer could want, but it didn’t easily allow for custom binaries to be used, and it certainly didn’t allow non-Ruby developers to take advantage of the Heroku infrastructure and workflow. The need for flexibility and extensibility drove Heroku to release their third and most recent stack, called Cedar. This stack was the first to support the buildpack system.

Introducing the Buildpack

With the Cedar stack, the knowledge gleaned preparing innumerable Ruby apps to run on Aspen and Bamboo was abstracted out into a separate system called a buildpack. This system was kept separate from the rest of the platform so that it could be quickly iterated as the needs of individual languages grew. Buildpacks act as a clean interface between the runtimes, which is the system that actually run the apps, and your application code. Among others, Heroku now supports buildpacks for Ruby, Java, Python, Grails, Clojure, and NodeJS. The buildpack system is open source, so anyone can fork and customize an existing buildpack. Developers might want to do this to add extra functionality such as a native CouchDB driver, or to install a compiled library like wkhtmltopdf. With a forked buildpack, you can do just about anything you want on a Heroku instance.

Even if you’re not interested in building and maintaining a buildpack of your own, understanding how they work and how they are architected can be vital in understanding the deploy process on Heroku. Once you push your code up to Heroku, the system will either grab the buildpack you have listed under the config var BUILDPACK_URL or it will cycle through all of the officially supported buildpacks to find one that it detects can correctly build your code. Once the detect phase returns, the system then calls a compile method to get your code production ready, followed by a release method to finalize the deployment.

Let’s take a look at each of these steps in turn.

Detect

When Heroku calls the detect command on a buildpack, it runs against the source code you push. The detect method is where a Ruby buildpack could check for existence of a Gemfile or a config.ru to check that it can actually run a given project. A Python buildpack might look for .py and a Java buildpack might look for the existence of .mvn files. If the buildpack detects that it can build the project, it returns a 0 to Heroku; this is the UNIX way of communicating “everything ran as expected.” If the buildpack does not find the files it needs, it returns a nonzero status, usually 1. If a buildpack returns a nonzero result, Heroku cancels the deploy process.

Once a buildpack detect phase returns a 0 (which tells the system that it can be run), the compile step will be called.

Compile

The compile step is where the majority of the build process takes place. Once the application type has been detected, different setup scripts can be run to configure the system, such as installing binaries or processing assets. One example of this is running rake assets:precompile in a recent Rails project, which will turn SASS-based styles into universal CSS, and CoffeeScript files into universal JavaScript.

Note

In the compile stage, the configuration environment variables are not available. A well-architected app should compile the same regardless of configuration.

A cache directory is provided during the compile stage, and anything put in here will be available between deploys. The cache directory can be used to speed up future deploys (e.g., by storing downloaded items such as external dependencies, so they won’t need to be downloaded again, which can be time consuming or prone to occasional failure).

Binary files that have been prebuilt against the Heroku runtime can be downloaded in this stage. That is how the Java buildpack supports multiple versions of Java, and how the Ruby buildpack supports multiple versions of Ruby. Any code that needs to be compiled and run as a binary should be done in advance and made available on a publicly available source such as Amazon’s S3 service.

Once the compilation phase is complete, the release phase will be called. This pairs the read-built application with the configuration required to run it in production.

Release

The release stage is when the compiled app gets paired with environment variables and executed. No changes to disk should be made here, only changes to environment variables. Heroku expects the return in YAML format with three keys: addons if there are any default add-ons; config_vars, which supplies a default set of configuration environment variables; and default_process_types, which will tell Heroku what command to run by default (i.e., web).

One of the most important values a buildpack may need to set is the PATH config var. The values passed in these three keys are all considered defaults (i.e., they will not overwrite any existing values in the application). Here is an example of YAML output a buildpack might output:

---
addons:
config_vars:
  PATH: $PATH:/app/vendor/mruby_bin
default_process_types:

Note

This YAML output will only set default environment variables; if you need to overwrite them, you need to use a profile script.

Once the release phase is through, your application is deployed and ready to run any processes declared in the Procfile.

Profile.d Scripts

The release phase of the build process allows you to set default environment variables, referred to by Heroku as config, on your application. While this functionality is useful it does not allow you to overwrite an existing variable in the build process. To accomplish this, you can use a profile.d directory, which can contain one or more scripts that can change environment variables.

For instance, if you had a custom directory inside of your application named foo/ that contained a binary that you wished to be executed in your application, you could prepend it to the front of your path by creating a foo.sh file in ./profile.d/ so it would reside in ./profile.d/foo.sh and could contain a path export like this:

export PATH="$HOME/foo:$PATH"

If you’ve written this file correctly in the compile stage of your build process, then after you deploy, your foo will appear in your PATH.

You can get more information on these three steps and more through the buildpack documentation.

Now you can detect, compile, release, and even configure your environment with profile.d scripts. While most applications only need functionality provided in the default buildpacks, you can extend them without needing to fork and maintain your own custom buildpack. Instead, you can use multiple buildpacks with your application.

Leveraging Multiple Buildpacks

Buildpacks give you the ability to do just about anything you want on top of Heroku’s platform. Unfortunately, using a custom buildpack means that you’re giving up having someone else worry about your problems and you’re taking on the responsibility of making sure your application compiles and deploys correctly instead of having Heroku take care of it for you. Is there a way to use Heroku’s maintained buildpack, but to also add custom components? Of course there is: you can use a custom buildpack called Multi, to run multiple buildpacks in a row. Let’s take a look at one way you might use it.

In a traditional deployment setup, you would have to manually install binaries (see Binary Crash Course) like Ruby or Java just to be able to deploy. Luckily, Heroku’s default buildpacks will take care of most components our systems need, but it’s not unreasonable to imagine a scenario that would require a custom binary such as whtmltopdf, a command-line tool for converting HTML into PDFs. In these scenarios, how do we get the code we need on our Heroku applications?

You’ll need to compile the program you want so it can run on Heroku. You can find more information on how to do this in Heroku’s developer center. At the time of writing, the best way to compile binaries for Heroku systems is to use the Vulcan or Anvil library.

Once you’ve got the binary, you could fork a buildpack and add some custom code that installs the binary for you. Instead, we recommend creating a lightweight buildpack that only installs that binary. Once you’ve got this simple buildpack, you can leverage the existing Heroku-maintained buildpacks along with another community-maintained buildpack called “heroku-buildpack-multi.” This “Multi Buildpack” is a meta buildpack that runs an arbitrary number of buildpacks. To use it, first set the BUILDPACK_URL in your application:

$ heroku config:add BUILDPACK_URL=https://github.com/ddollar
                /heroku-buildpack-multi.git

Note

Instead of deploying using someone else’s buildpack from GitHub, you should fork it and deploy using your copy. This prevents them from making breaking changes or deleting the code. A custom buildpack needs to be fetched on every deploy, so someone deleting his repository on GitHub could mean that you can’t deploy.

Once you’ve got the BUILDPACK_URL config set properly, make a new file called .buildpacks and add the URLs to your custom buildpack and the Heroku-maintained buildpack.

You can see the documentation on Multi Buildpacks for examples and more options.

Quick and Dirty Binaries in Your App

If making your own mini buildpack seems like too much work, you can compile a binary and include it in your application’s repository. You can then manually change the PATH config variable to include that directory, and you’re good to go. While this process is simpler, it has a few drawbacks. It increases the size of your repository, which means it is slower to move around on a network. It hardcodes the dependency into your codebase, which can litter your primary application source control with nonrelated binary commits. It requires manually setting a PATH on new apps, which must be done for each new app, and it makes the binary not easily reusable for multiple apps. With these limitations in mind, it’s up to you to pick the most appropriate solution.

We recommend using the multi buildpack approach when possible.

The Buildpack Recap

The buildpack is a low-level primitive on the Heroku platform, developed over years of running applications and three separate platform stack releases. It gives you the ability to have fine-grained control over how your application is compiled to be run. If you need to execute something that Heroku doesn’t support yet, a custom buildpack is a great place to start looking. Remember, though, that an application’s config variables are not available to the buildpack at compile time. It’s easy to forget, so don’t say you weren’t warned.

Most applications won’t ever need to use a custom buildpack, but understanding how the system works and having the background to utilize them if you need to is invaluable.

Get Heroku: Up and Running now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.