A guest post by Paul Lathrop, a software engineer for Krux Digital, Inc. In his prior life as an operations engineer, Paul specialized in building and/or breaking complex systems using Puppet for several companies including Digg, SimpleGeo, and Krux.

If configuration management and web operations go together like peanut butter and jelly (beans and toast, perhaps?), then Puppet and Hiera go together like peanut butter and chocolate. If you don’t already know what Hiera is, head on over to the Puppet Labs site to learn why you want to use Hiera. If that doesn’t convince you, you’ll want to read about automatic parameter look up, the “killer app” of Puppet – Hiera integration.

Now that you know what Hiera is, and why you might want to use it, I’m here to help you with the hard parts of using Hiera: figuring out how to organize your data, version it, deploy it, and use it effectively in your Puppet manifests. The Puppet Labs documentation covers installation and configuration of Hiera pretty well, so I’m not going to cover that here. Instead, I’m going to show you how to take advantage of Hiera, and share some of the best practices I’ve stumbled across while using Hiera in production.

Hiera (true to the name) uses an ordered hierarchy to look up data. This means you can start with a common base of configuration data and override it as appropriate. Like Puppet, Hiera is designed to be a piece of infrastructure plumbing that is flexible enough to configure for your environment. It’s fairly easy to plug Hiera into a variety of data sources, but I’ve gotten a lot of mileage out of the basic YAML back-end, which uses YAML files to organize Hiera data. There’s also a JSON back-end built in, if you prefer to store your configuration data as JSON.

Like Puppet manifests, your Hiera data files should be placed under version control. The advantages of version control are even clearer when applied to your configuration data. Imagine having package versions in Hiera and being able to track commit messages for each version update, or being able to use git bisect to track down the change that caused an outage in production. Versioning your configuration data is very powerful.

I would not recommend placing your Hiera data in the same repository as your Puppet manifests. To get the most use out of Hiera, you’ll want to parameterize a lot of your Puppet manifests; very quickly your manifests will be unable to compile if your Hiera data is unavailable. Early on, I realized that this dependency means you don’t want to use Puppet to deploy your Hiera data, or you’ll have a pretty hairy bootstrapping problem. This isn’t as critical if you use a puppetmaster, but I prefer masterless Puppet for a variety of reasons, primarily related to the difficulty of scaling the puppetmaster.

In production, I store Hiera data under /etc/hiera – a checkout of the git repository where I version control the data. The primary Hiera configuration is /etc/hiera/hiera.yaml (symlinked into /etc/puppet so that Puppet knows where to find the Hiera data), and it is very simple:
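A minimal sketch of such a configuration, assuming Hiera 1.x syntax, the YAML back-end, and a three-level hierarchy (node FQDN, then server environment, then common). Here server_environment is assumed to be a custom fact or top-level variable, not something Puppet provides out of the box:

```yaml
---
# /etc/hiera/hiera.yaml
:backends:
  - yaml

:yaml:
  # Directory holding the YAML data files (the git checkout)
  :datadir: /etc/hiera

# Searched top to bottom; the first level that defines a key wins
:hierarchy:
  - "%{::fqdn}"                # per-node overrides, e.g. web01.example.com.yaml
  - "%{::server_environment}"  # e.g. production.yaml, development.yaml
  - common                     # common.yaml, the base of the hierarchy
```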

Hiera configuration can make use of Puppet variables (with some caveats). Inside hiera.yaml, Puppet variables are referenced with the %{varname} notation (not Puppet’s own ${varname}), with colons identifying the Puppet scope of the variable (:: is the top-level scope). This YAML configures Hiera to look for data files in /etc/hiera, and defines a hierarchy for look ups. Data defined for the specific node should be placed in a YAML file named after the node’s FQDN and overrides data defined for a specific server environment (assuming a custom fact or top-level variable named server_environment). Server environment data overrides the common data (in the infrastructures I manage, I think of the server environment as distinct from Puppet’s notion of environments). I’ve tried several hierarchies of varying complexity and I keep coming back to this one as the right balance of simplicity and flexibility – I highly recommend it as a good starting point.

Since /etc/hiera is a git checkout, I use git for deployment. The checkout is pointed at an environment-specific branch, so production servers use the production branch, while development servers have a checkout of a branch named development. (Again, remember that these branches correspond with “server environment”, not Puppet environment.) When I’m ready to deploy a change, I merge it to the appropriate branch and use my parallel ssh tool to do a git pull in the checkout directory. This deployment strategy works well in smaller environments, but you can easily deploy Hiera data using your package manager, etc. It’s also worth noting that Hiera has only one configuration; if you make use of Puppet environments, you’ll want to configure your datadir to include the Puppet environment (%{::environment}) like this:
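As a rough sketch, assuming Hiera 1.x syntax and one data directory per Puppet environment under /etc/hiera (the exact directory layout is an assumption for illustration):

```yaml
:yaml:
  # %{::environment} is interpolated per request, so a node in the
  # "testing" Puppet environment would read data from /etc/hiera/testing
  :datadir: "/etc/hiera/%{::environment}"
```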

With the hierarchy outlined above, you must at least create the common.yaml file, which serves as the base of your hierarchy. In addition, when setting this up for a new infrastructure, I immediately create environment-specific files for each “server environment”:

  • production.yaml
  • staging.yaml
  • development.yaml

Populate your common.yaml with the kinds of things you might be tempted to store as a top-scope variable in Puppet’s site.pp. For example, I store the API keys for a couple services I use, the domain name for the environment I’m managing, and the list of packages that belong on every server. I usually choose a prefix based on the company name, and name all the sort of generic variables with that prefix. For a company named Widgets, Inc. I might have the following common.yaml:
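A sketch of what such a file might contain. Every key name and value here is hypothetical, chosen only to illustrate the widgets_ prefix convention and a class parameter key:

```yaml
---
# common.yaml: base of the hierarchy (all names and values are placeholders)
widgets_domain: widgets.example.com
widgets_monitoring_api_key: "REPLACE_ME"

# Packages that belong on every server
widgets_packages:
  - git
  - ntp
  - vim

# Keys of the form class::parameter are found by automatic parameter
# lookup when the class named "postfix" is declared
postfix::relayhost: smtp.widgets.example.com
```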

Then place overriding data in your environment-specific YAML files. For example, you might want postfix to use different settings in development:
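For instance, a development override could look something like this, assuming a hypothetical postfix class with a relayhost parameter:

```yaml
---
# development.yaml: consulted before common.yaml when
# server_environment is "development"
postfix::relayhost: localhost
```

Only the keys that differ need to appear here; anything not defined at this level falls through to common.yaml.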

Use the FQDN-named YAML files to override settings on specific servers. I usually use this for testing the new version of a package on a single production server before rolling it out to the entire fleet.
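A per-node file for that kind of canary test might look like the following sketch. The host name, the nginx::version key, and the version string are all made up for illustration:

```yaml
---
# web01.widgets.example.com.yaml: applies to this one node only and
# takes precedence over production.yaml and common.yaml
nginx::version: "1.4.1-1"
```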

To make use of all this data in Puppet, either define a parameterized class (taking advantage of automatic parameter look up):
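A minimal sketch of a parameterized class, assuming Puppet 3 or later. The class, its relayhost parameter, and the template path are hypothetical:

```puppet
# When this class is declared without an explicit value for $relayhost,
# Puppet asks Hiera for the key "postfix::relayhost" and only falls
# back to the default below if no level of the hierarchy defines it.
class postfix (
  $relayhost = 'localhost'
) {
  package { 'postfix':
    ensure => installed,
  }

  # Hypothetical template that would reference $relayhost
  file { '/etc/postfix/main.cf':
    content => template('postfix/main.cf.erb'),
    require => Package['postfix'],
  }
}
```

Declaring the class with a plain `include postfix` is enough to trigger the lookup; no Hiera call appears in the manifest itself.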

Or use the hiera function:
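A short sketch of explicit lookups, using hypothetical key names. The hiera function shown here is the one that ships with Puppet 3; later Puppet releases favor the lookup function instead:

```puppet
# With a fallback value: 'localhost' is used if no level defines the key
$relayhost = hiera('widgets_relayhost', 'localhost')

# Without a fallback: catalog compilation fails if the key is missing
$domain = hiera('widgets_domain')

file { '/etc/mailname':
  content => "${domain}\n",
}
```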

One last tip: The hiera function’s second argument is a default value to use if Hiera can’t find the data you’ve asked for. I find it is better to provide a default only in cases where it is okay for that default to be used. It’s better to fail your catalog compilation and know something is wrong than to find yourself using a harmful default. In the examples above, I used a default because it makes sense for those settings in my environment – you should use your judgment on what works best in your environment.


Hiera is a powerful and flexible addition to your configuration management toolbox. The techniques I’ve outlined in this article have helped me leverage Hiera in real production environments. I’ve really only scratched the surface of what Hiera can do. For more information on Hiera integration with Puppet, check out the Using Hiera With Puppet documentation at Puppet Labs.

Here are some Puppet and Hiera resources from Safari Books Online.

Pro Puppet is an in-depth guide to installing, using, and developing the popular configuration management tool Puppet. The book is a comprehensive follow-up to the previous title Pulling Strings with Puppet. Puppet provides a way to automate everything from user management to server configuration. You’ll learn how to create Puppet recipes, extend Puppet, and use Facter to gather configuration data from your servers.
Instant Puppet 3 Starter provides you with all the information that you need, from startup to complete confidence in its use. This book will explore and teach the core components of Puppet, consisting of setting up a working client and server and building your first custom module.
Puppet 3 Beginner’s Guide gets you up and running with Puppet straight away, with complete real world examples. Each chapter builds your skills, adding new Puppet features, always with a practical focus. You’ll learn everything you need to manage your whole infrastructure with Puppet.

About the author

Paul Lathrop is a software engineer for Krux Digital, Inc., building infrastructure and back-end services while working to bring down the walls between software and operations engineers. In his prior life as an operations engineer, Paul specialized in building and/or breaking complex systems using Puppet for several companies including Digg, SimpleGeo, and Krux.
