A few Puppet best practices

Puppet, the popular configuration management tool, can get tricky at times. After a few months of using Puppet to manage our servers at work, a few practices have emerged that I can recommend. I wanted to share some of them with the rest of the world here so that beginners get a head start, and to provide a starting point for discussion with more seasoned Puppet users.

1. Version control as much as possible

This one may seem obvious to anyone who has used version control, but it isn’t obvious for everybody. Many sysadmins who start to use Puppet have had limited exposure to version control, which they often consider a tool reserved for developers.

Using version control will open up a lot of additional possibilities with Puppet, such as better tracking of changes, testing your Puppet manifests in an isolated environment, promoting your configuration from environment to environment, etc. Version control even provides a free backup for your configuration code.

You will see gains from using any version control system (VCS), but modern distributed VCSs (such as Git, Mercurial or Bazaar) prove particularly useful here because they make it easy to manage multiple branches of code.

Using a code collaboration tool such as GitHub, Bitbucket or GitLab (self-hosted and open source, highly recommended) will also allow you and your team to review each other’s changes before they are applied. I won’t try to convince anyone of the virtues of code reviews here, but let’s just say you’ll end up with much better, more maintainable Puppet code if you consistently review your changes with your peers.

Put all of your Puppet files (manifests, templates, files, hieradata files) under version control, then check out a working copy on your Puppetmaster. When you’re ready to “deploy” changes to your Puppetmaster, just sync the working copy on the server with the code in the version control repository.
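For example, a deployment can be as simple as pulling the latest commits into the Puppetmaster’s working copy. A minimal shell sketch, assuming a Git repository and a working copy under /etc/puppet (both placeholders, adjust to your own setup):

# one-time: check out the configuration repository on the Puppetmaster
git clone git@git.example.com:ops/puppet.git /etc/puppet

# on every deployment: sync the working copy with the repository
cd /etc/puppet
git pull --ff-only origin master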

2. Use environments

Puppet has this concept of Environments which proves to be very useful for applying your configuration changes on less critical servers first, then promoting those changes to production when tested and ready.

We use two Puppet environments: staging and production. At initial provisioning of a server, we assign the _staging_ environment to all pre-production boxes (DEV and QA in our case). We assign the production environment only to, you guessed it, production servers. Each environment is tied to a specific branch in our Git repository: the “master” branch is production and the “staging” branch is staging.
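Assigning an environment to an agent can be as simple as one line in its puppet.conf. A small sketch (how you assign environments may differ if you use an ENC):

# /etc/puppet/puppet.conf on a pre-production box
[agent]
  environment = staging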

We do most changes on the “staging” branch and apply them on pre-production boxes; once we know they’re stable, we promote the changes by merging them into the “master” branch and applying them to production servers.
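The promotion itself is just a branch merge followed by agent runs. A minimal sketch using the branch names above (repository layout and host names are illustrative):

# promote the tested changes
git checkout master
git merge staging
git push origin master

# then apply on a production box (ideally as a dry run first, see section 3)
puppet agent --test --environment production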

It’s not always possible to follow this flow (not all servers have pre-production replicas), but when it is, we do. It’s good for the peace of mind.

3. Use dry-runs

Even with the best precautions, things can get messy when you actually run the Puppet agent to apply your configuration updates on your servers. To reduce the risk of problems, I highly suggest first running the Puppet agent in “dry run” mode using the following options:

puppet agent [...] --verbose --noop --test

Using those options causes the Puppet agent to only show what it would do, without actually changing anything. You get to see the diffs for all files that would be modified and validate that things will go as you expect.

4. Use librarian-puppet

Managing module dependencies can be a source of headaches, especially when many people are working on Puppet code and they each need to test it on their own computer. Librarian-puppet provides some sanity to the process by automatically managing your module dependencies. You express your dependencies in a file (the “Puppetfile”) and the tool will install, update or remove modules automatically when you run it, always matching what’s specified in the Puppetfile. It’ll even resolve and install the modules’ own dependencies (what we would call transitive dependencies) and detect compatibility issues.
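A Puppetfile can be as small as the following sketch; the module names and version numbers are only examples:

# Puppetfile
forge "https://forgeapi.puppetlabs.com"

mod "puppetlabs/stdlib", "4.6.0"
mod "puppetlabs/apache", "1.4.0"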

Using librarian-puppet on the Puppetmaster also allows for easier deployments: no need to install and manage your modules manually. With librarian-puppet, a deployment usually boils down to two simple steps (sketched in shell form below):

  1. Sync your main sources with your code repository (ex: git pull)
  2. Run librarian-puppet to synchronize your installed Puppet modules
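In shell form, that could look like this (the /etc/puppet path is an assumption):

cd /etc/puppet
git pull --ff-only origin master   # step 1: sync the main sources
librarian-puppet install           # step 2: sync modules from the Puppetfile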

Tip: Don’t use Git dependencies with no version specifier

Librarian-puppet allows you to declare dependencies on modules that come directly from a Git repository this way:

  mod "stdlib",
    :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git"

Be careful using this with open-source modules that you don’t control, as this tells librarian-puppet you want to use the latest, bleeding edge version of the module. If the module’s author decides to change something in an incompatible manner, you’ll probably get to spend some quality time with Puppet’s sometimes cryptic error messages.

Instead, always use references in your Puppetfile:

mod "stdlib",
  :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git",
  :ref => "v1.0.2"

This will at least shield your Puppet code from inadvertently breaking because of backward-incompatible changes by the author. If the module’s author doesn’t use tags for releases, at the very least pin yourself to a particular revision:

mod "stdlib",
  :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git",
  :ref => "84f757c8a0557ab62cec44d03af10add1206023a"

5. Keep sensitive data safe

Some data needs to be kept secure. Examples of sensitive data you may need to put in your Puppet code include passwords, private keys, SSL certificates and so on. Don’t put such data in version control unless you’re fully aware of the risks of doing so.

Puppet has a nice tool for separating your data from your actual manifests (code): Hiera. It allows you to store data about your servers and infrastructure in YAML or JSON files. In practice, you’ll find that most data in Hiera files is not confidential in nature… so should we refrain from using version control for Hiera files just because of a few sensitive elements? Certainly not!

The trick is to use Hiera’s ability to combine multiple data sources (backends). Split your hieradata into two types: YAML files for your “main” hieradata and JSON files for your “secure” data. The JSON files are not put under version control; they are stored securely in a single location: the Puppetmaster. This way, very few people can actually see the contents of the sensitive files.

Here’s how to configure Hiera to do this (hiera.yaml):

---
:hierarchy:
  - "%{hostname}"
  - "%{environment}"
  - common
  - credentials
:backends:
  - yaml
  - json
:yaml:
  :datadir: '/etc/puppet/hieradata'
# only credentials are stored in json hiera datastore
:json:
  :datadir: '/etc/puppet/secure/hieradata'
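With that in place, manifests look up secure values exactly like any other Hiera data; the backend split is invisible to the code. A minimal sketch (the key name and file contents are made up for illustration):

# /etc/puppet/secure/hieradata/credentials.json (never committed):
#   { "mysql_root_password": "s3cr3t" }

# In a manifest, the lookup doesn't care which backend the key comes from:
$mysql_root_password = hiera('mysql_root_password')

file { '/root/.my.cnf':
  ensure  => file,
  owner   => 'root',
  mode    => '0600',
  content => "[client]\npassword=${mysql_root_password}\n",
}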

6. Create abstractions for your high level classes

This will vary depending on preferences, and not everyone is going to agree, but I’ve found that wrapping uses of modules into wrapper classes makes Puppet code more maintainable over time. This is best explained with an example…

Suppose you want to set up a reverse proxy server using an existing Nginx module. Instead of directly assigning the ‘nginx’ class to your nodes and setting up all of the required pieces there, create a new class called, say, proxy_server, with the attributes you care about for your proxy server as class parameters. Assigning the proxy_server class to your nodes not only states your intent better, it also creates a nice little abstraction over what you consider a “proxy server”. Later on, if you decide to move away from Nginx (highly improbable, why would you sin as such? :)) or to use another Nginx module (more probable!), you’ll probably just need to change the contents of your proxy_server class instead of a bunch of tangled node definitions.
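Here’s a minimal sketch of such a wrapper class, assuming an Nginx module that exposes an nginx class and an nginx::resource::vhost defined type; the parameter names and hosts are illustrative, adapt them to the module you actually use:

# A "proxy server" is whatever this class says it is
class proxy_server (
  $backend_host,
  $backend_port = 8080,
  $server_name  = $::fqdn
) {
  # the choice of Nginx is an implementation detail hidden in here
  class { 'nginx': }

  nginx::resource::vhost { $server_name:
    proxy => "http://${backend_host}:${backend_port}",
  }
}

# Node definitions only state intent:
node 'proxy01.example.com' {
  class { 'proxy_server':
    backend_host => 'app01.example.com',
  }
}

If you later switch to another proxy or to a different Nginx module, only the body of proxy_server changes; the node definitions stay untouched.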

That’s it!

I hope you’ll find the above list useful! Please do not hesitate to share your own experience and best practices in comments.

Tags: puppet devops