Filed under design, infrastructure.

Where We Were

Last June I wrote about our Continuous Integration setup. We’ve continued using this system and have made some enhancements in the past few months that make it even better.

In our first CI system, we wanted everything in version control, everything tested, and nothing knifed manually into production. When we first designed this system, we were only thinking about our cookbooks. To get our version 1.0 off the ground, we stuck with just cookbooks and did not apply these rules to nodes, data bags, and roles.

Same Goals, Better Implementation

  • Berkshelf
  • Test Kitchen
  • LXCs
  • Chef-zero
  • Jenkins deploys all

Our old system used real VMs (that’s an oxymoron) to run tests. These VMs would accumulate cruft with each run, so we designed a system that provides fresh nodes to test on. Jeremiah Gray did a great writeup about our new testing framework. We created LXCs (Linux Containers) provisioned with chef-zero (which allowed test data bags) to run our Test Kitchen testing framework. We fully embraced Berkshelf to handle all our cookbook dependencies. We figured most of this out on our own, but it looks similar to a chapter in Test-Driven Infrastructure with Chef.

Doing it right, no cheating

While we were dreaming of the next feature to add to our CI system, the ugly truth that we had taken some shortcuts kept gnawing at us. It was time to finish out our rules: everything in git, everything tested, and Jenkins deploys everything to our entire infrastructure.


Nodes

With each push to our nodes repository, a script checks JSON syntax and cross-checks the nodes in Chef against the nodes in git. If a node exists in one (Chef or git) but not the other, we receive an alert. Upon success, Jenkins knifes in all nodes to our fleet. This works for us now but may not scale in the future. What I like about this process is that nodes are reset across the fleet frequently, so rogue node edits won’t linger for long.
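
The two checks in this step, a JSON syntax check and a comparison of node names between git and Chef, might look roughly like the sketch below. This is not our actual script. The function names, the one-file-per-node layout (`<node name>.json`), and the idea of passing in the Chef node names as a plain list (for example, parsed from the output of `knife node list`) are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_nodes(repo_dir):
    """Parse every *.json file in repo_dir (assumed layout: one file per node).

    Returns (nodes, errors): nodes maps the file stem to the parsed JSON,
    errors lists one message per file that failed the syntax check.
    """
    nodes, errors = {}, []
    for path in sorted(Path(repo_dir).glob("*.json")):
        try:
            nodes[path.stem] = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: {exc}")
    return nodes, errors


def crosscheck(git_names, chef_names):
    """Report node names that exist on only one side (git or Chef)."""
    git_names, chef_names = set(git_names), set(chef_names)
    return {
        "only_in_git": sorted(git_names - chef_names),
        "only_in_chef": sorted(chef_names - git_names),
    }
```

A CI job built on this would fail the build if `errors` is non-empty or if either list returned by `crosscheck` has entries, and only then hand off to the deploy step.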

Data Bags

The data bag system works in a similar manner. First a syntax check looks at the JSON, then data bags in Chef and git are compared. The script is written so you can have data bags in git that are not yet in your Chef system, but once deployed to the fleet, they will be updated en masse like nodes.
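
The asymmetry described above (git may be ahead of Chef, but not the other way around) can be sketched as a small planning function. This is an illustration under assumptions, not our script: the name `plan_data_bag_sync`, the `(bag, item_id)` keys, and the choice to separate "create" from "update" are hypothetical, and treating items that exist only in Chef as something to flag is an assumption about how the comparison is used.

```python
def plan_data_bag_sync(git_items, chef_items):
    """Compare data bag items in git against those in Chef.

    Both arguments map (bag_name, item_id) -> item contents (a dict).
    Items found only in git are allowed and scheduled for creation;
    items found only in Chef are reported so someone can investigate.
    """
    plan = {"create": [], "update": [], "unchanged": [], "missing_from_git": []}
    for key, item in git_items.items():
        if key not in chef_items:
            plan["create"].append(key)       # new in git, not yet in Chef
        elif chef_items[key] != item:
            plan["update"].append(key)       # git is the source of truth
        else:
            plan["unchanged"].append(key)
    plan["missing_from_git"] = [k for k in chef_items if k not in git_items]
    for keys in plan.values():
        keys.sort()
    return plan
```

For example, with a `("keys", "deploy")` item present only in git, the plan lists it under `create` rather than raising an error, which matches the behavior described for data bags that are not yet in Chef.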


Roles

Roles were the simplest case of all. They too get a syntax check. After a successful push to the master roles repository, Jenkins pushes all roles in git into production.

Community Cookbooks

We wrote a cookbook that acts as a framework to test our community cookbooks. Some community cookbooks come with tests and others do not. As we add more community cookbooks to our infrastructure, we have a place for their tests.

Monitoring Host Group

We are supporting both our legacy and current CI systems, so breaking the components down into groups was a worthwhile exercise. All of the servers in our CI fleet were already monitored, but putting them into their own host group helped compartmentalize how all the moving parts work.

Always in Transition

While developing this system and using it in production, we also wanted to move all nodes from our Chef 10 production servers to our new Chef 11 server. Most of our nodes are migrated, and once the rest are done, we can turn down a lot of the VMs that support the Chef 10 and CI 1.0 infrastructure.

What’s next

When we originally planned our next phase, we thought we would implement Gerrit (code review) and Zuul (project gating). All of the above upgrades were required before we could even think about making this transition. We are still working on some of the details I have highlighted in this post, but Gerrit+Zuul are on our roadmap.

Tags: automation, chef, continuous integration, Gerrit, IT, vms, zuul

