Puppet Enterprise Migration from 3.8.4 to 2015.3.3

I recently completed a PE migration from 3.8.4 to 2015.3.3 (puppetserver 2.2.41 and puppet agent 4.3.2). This was a somewhat painful exercise, as we kept running into issues because we had gotten so far behind on upgrades. If you need to perform the same kind of upgrade, I hope this broad-stroke description of the upgrade steps will help you. Before we get to the upgrade, let’s cover some of the prerequisites.

Release Notes

Always read the release notes first. I am sure I will cover some of the notes below in the specific problems we ran into, but there’s a lot in there that we did NOT encounter.

Future Parser

When you get to PE 2015.3.3, you’ll be running Puppet 4. Make sure you have the future parser enabled on your master or agents, by following these instructions. You’ll likely run into at least one issue if you weren’t doing this before. For example, automatic string/array conversions may not work as you expect. Get your code up to par before moving forward.

You will also want to investigate using or testing with strict variables (I couldn’t find a simple link for this on puppet.com, though with the company’s name change it’s possible search engine results may show something by the time you read this). You can probably get away without this one, as long as there weren’t any bugs in any of the modules you use, but if you have to refactor for the future parser anyway, it may be good to combine these two efforts.
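
As a rough first pass, you can ask Puppet 3.x to parse your manifests with the future parser before changing any settings on the master. The sketch below assumes the puppet command is on your PATH and that overriding the parser setting on the command line suits your setup; validate_future and the PUPPET override variable are names invented for this example. A clean run only shows that the code parses, not that it behaves the same way.

```shell
# Validate every .pp file under a directory with the future parser.
# Set PUPPET=echo to preview the command without running Puppet.
validate_future() {
  find "${1:-.}" -type f -name '*.pp' \
    -exec "${PUPPET:-puppet}" parser validate --parser future {} +
}

# Example (the path is hypothetical):
#   validate_future /etc/puppetlabs/puppet/environments/production
```

find returns a non-zero status if any validation fails, so this can gate a CI job or a pre-commit hook.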

Classifier

I won’t get into details here, because they will invariably be implementation-specific anyway, but the classifier kept breaking the updates. The master was originally PE 3.7.2 and some significant changes occurred in PE 3.8, and we had made one or two of our own groups, so it wasn’t really a surprise that the installer found conflicts and failed. Check out the Preconfigured node groups documentation if you think you can fix this by yourself, but I suggest engaging support and they’ll help you with specifics.

If you run into classifier issues during the upgrade, you can probably fix them afterward without negative side effects. But maybe not, which brings us to…

Backups and Snapshots

When you start this upgrade, make sure you can get back to your PE 3.8.4 master if something goes wrong. Make sure you have a good backup of your master and important nodes. If you have the ability to take snapshots, use them in addition to your backups for ease of recovery. You may also want to take snapshots of your canary nodes to make it easy to try things and always get back to square one.
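
If you have no backup tooling at all, even a timestamped tarball of the configuration directory is better than nothing. This is a minimal sketch, not a complete PE backup: it assumes your configuration and SSL data live under /etc/puppetlabs, it does not cover the PuppetDB or console databases, and backup_dir and the destination path are hypothetical names for this example.

```shell
# Archive one directory into a timestamped tarball and print the archive path.
backup_dir() {
  src="$1"
  dest="$2"
  archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" &&
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
    printf '%s\n' "$archive"
}

# Example (run as root on the master; the destination is hypothetical):
#   backup_dir /etc/puppetlabs /var/backups/pe-3.8.4
```

Copy the resulting archive off the master before you start, and test that you can actually restore from it.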

Block connections to the Master

PE 3.8.4 agents that check in to a 2015.3.3 master will work, but the check-in may break some things that are no longer needed. For instance, we saw that pe-mcollective refused to start on an agent that had checked in to the 2015.3.3 master. That was not an issue by itself, but we had to back out the upgrade on the master. Once it was back on 3.8.4, it kept trying to restart that service on the nodes, so every node that had checked in showed up as a failed run in the Console. I suggest you block connections to the master during the upgrade process to avoid this. You can use a network- or host-level firewall to block port 8140 for non-canary nodes, you can revoke the certificates, or you can even revoke the CA. Each has pros and cons; use what works for you.
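
For the host-level firewall option, one possible approach on a master that uses iptables is sketched below. The function only prints the commands so you can review them first; block_master_rules and the addresses in the example are made up for illustration, and you will likely want to include the master’s own address in the allow list so it can still check in to itself.

```shell
# Print iptables commands that reject traffic to port 8140 except from the
# listed canary addresses. Nothing is applied by this function.
block_master_rules() {
  # Insert the REJECT first; each later -I lands above it at position 1.
  printf '%s\n' "iptables -I INPUT -p tcp --dport 8140 -j REJECT"
  for canary in "$@"; do
    printf '%s\n' "iptables -I INPUT -p tcp -s $canary --dport 8140 -j ACCEPT"
  done
}

# Example (documentation addresses, not real nodes):
#   block_master_rules 192.0.2.10 192.0.2.11 | sudo sh
```

These rules are not persistent, which is convenient here: a reboot or a firewall service restart clears them, but remember to remove them explicitly when the upgrade is done.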

Code changes

This part will vary quite a bit. Here are some tips that should cover the most common problems I saw and heard reported in IRC:

  • Make sure you have rspec-puppet tests with good coverage that pass before you begin. You absolutely must have working unit tests to refactor, and you’re likely going to need to do some refactoring. If you don’t have working tests, put down this article and go fix that before returning to this!
  • Watch out for string conversions! I mentioned the string conversion changes in the future parser above. One that bit us was using ‘undef’ (a quoted string) instead of undef. This worked great with rspec-puppet tests (validate_string(‘undef’) passes, because it is a string!) but the file resource’s source attribute did not like a path of ‘undef/…’. I’m not sure whether it was a bad test or a problem in rspec-puppet, so I’ll just repeat myself: Watch out for string conversions!
  • Hiera 3.0.6 has some bugs with scoping. These two hurt everyone:
    • Replace any instances of %{} with %{::}. Previously, %{} would resolve to “null”, but now it resolves to the scope, so you get something weird like <#Hiera:7329A802#> instead of nothing. Prepending the colons avoids this.
    • Your datadir value was probably %{environment} before. This works with Automatic Parameter Lookup but does not always work with the hiera functions. Replace it with %{::environment}.
  • If you’re leveraging the PE-bundled ruby instead of system ruby because your environment is limited (i.e. Enterprise Linux 6 and Ruby 1.8.7), don’t rely on this anymore! You’re going to have a bad time. Bite the bullet and move to EL7 and Ruby 2.0.0 or use rvm/rbenv. You should already be planning that, so this is just extra ammunition, right?
  • Watch out when upgrading or changing distributions at the same time. Yeah, I just advised you to do it, but that’s only because EL7 is less pain than “fixing” EL6. It’s easy to get bit by changes like interface names, deprecated commands, etc. We also added docker to our build node’s profile, with an eye toward using beaker and docker for acceptance testing, and then wondered why the docker interface’s IP showed up in DDNS instead of the expected IP. Make sure you don’t change more things than you can keep in your head at once, or troubleshooting becomes difficult.
  • The provider pe_puppetserver_gem no longer works. Use puppetserver_gem instead.
  • Review your modules’ requirements and supported Puppet versions. Unfortunately, you cannot always trust the forge pages – as I write this, pe_puppetserver_gem says it supports PE >=3.7.0. It should read “PE >=3.7.0 <2015.0.0” (PR8 submitted). If 2015.3.3 is explicitly listed, make sure that applies to the version you are using! If you’re two major versions behind, using ntp 3.x instead of 5.x, expect to be bitten. Balance upgrading everything with the possibility of everything exploding, and be sure to test, test, TEST everything before pushing to production.
  • The hiera eyaml gem will be removed during the installation. It’s installed in the puppetserver’s environment, and that whole thing is going to be replaced. If you have regular yaml configured as a backend and your master does not rely on any eyaml secrets, you’re probably okay to proceed. You’ll have to then make sure that the puppet master checks into itself, to re-install the gem (you are managing that with hunner/hiera or similar, right?) before any node that relies on eyaml secrets connects. If you only have the eyaml backend or the master relies on eyaml secrets, you’ll have to look at another solution – adding the yaml backend before continuing is probably the path of least resistance.
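
Two of the tips above lend themselves to a quick search of your code base. The sketch below uses plain grep to list empty %{} and bare %{environment} interpolations in Hiera files, and quoted ‘undef’ strings in manifests. The function names and the example paths are hypothetical, and the matches are only candidates to review by hand, not a list of confirmed bugs.

```shell
# List Hiera interpolations flagged above: empty %{} and bare %{environment}.
find_hiera_scope_issues() {
  grep -rnE '%\{(environment)?\}' "$@"
}

# List quoted 'undef' or "undef" strings in Puppet manifests (*.pp only).
find_quoted_undef() {
  grep -rnE --include='*.pp' "['\"]undef['\"]" "$@"
}

# Example (adjust the paths to your layout):
#   find_hiera_scope_issues /etc/puppetlabs/puppet/hiera.yaml hieradata/
#   find_quoted_undef modules/ manifests/
```

Both functions exit non-zero when nothing matches, which is standard grep behaviour and is the result you are hoping for.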

Upgrade the master and agents

Finally, we’re ready for the actual upgrade. This is the simple part. Puppet has helpfully provided an upgrade guide with specific PE 3.8.4 to 2015.3.3 notes (I should note they have a 3.8.4 to 2016.1.1 guide as well. We started the upgrade attempts before 2016.1.1 was released, though!). Download the installer on the master, unpack it, and run:

sudo ./puppet-enterprise-installer

A few minutes later, it’s all done. Next, upgrade the agents. If you have good automation or a small number of nodes, run:

curl -k https://<MASTER HOSTNAME>:8140/packages/current/install.bash | sudo bash

Otherwise, use puppetlabs/puppet_agent and classification to upgrade the agents on their next run. Reminder: ensure you allow connections from the agents to the master before doing this! If you revoked the CA or individual certs, you’ll have to straighten out the certs as well.
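
For the small-number-of-nodes case, the curl command above can be wrapped in a simple loop. This is only a sketch: it assumes key-based ssh access and passwordless sudo on each node, and upgrade_agents, the SSH override variable, and the node list file are names invented for this example.

```shell
# Run the agent install script on every node listed in a file (one per line).
# Set SSH=echo to preview the commands without connecting anywhere.
upgrade_agents() {
  master="$1"
  nodes_file="$2"
  while IFS= read -r node; do
    [ -n "$node" ] || continue   # skip blank lines
    # stdin comes from /dev/null so ssh cannot swallow the rest of the list
    "${SSH:-ssh}" "$node" \
      "curl -k https://$master:8140/packages/current/install.bash | sudo bash" \
      < /dev/null || echo "FAILED: $node" >&2
  done < "$nodes_file"
}

# Example (hostnames are placeholders):
#   upgrade_agents puppet.example.com canary-nodes.txt
```

Start with a file that lists only your canary nodes, and widen it once those runs come back clean.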

If you run into any issues at any time in this process, collect diagnostics, then use snapshots and/or backups to move your systems back to the starting point. Analyze what went wrong and remediate it before trying another upgrade.

Once you succeed in upgrading the master and agents, use the Console or syslog to ensure that all the nodes are working properly. Something may have slipped through your preparation and it’s best to catch and fix it early. Finally, once you mark it down as a success, don’t forget to delete snapshots and any temporary backups and update your relevant run|play books.
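
If you go the syslog route, a crude filter can surface likely failures quickly. The sketch below assumes agent messages are tagged puppet-agent and land in /var/log/messages, which may not match your distribution or syslog configuration. The pattern is a heuristic chosen for this example, and puppet_agent_failures is a made-up name.

```shell
# Print puppet-agent syslog lines that look like failures.
# The log path defaults to /var/log/messages; pass another file if needed.
puppet_agent_failures() {
  grep -E 'puppet-agent\[[0-9]+\]: .*(Could not|failed)' "${1:-/var/log/messages}"
}

# Example:
#   sudo sh -c 'grep -E "puppet-agent" /var/log/messages | tail'   # raw view
#   puppet_agent_failures /var/log/messages                        # filtered
```

An empty result is not proof of success, so cross-check a few nodes in the Console as well.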

Summary

Upgrading from PE 3.8.4 to PE 2015.3.3 was long overdue for us. Because of the delay, the preparation work kept stacking up, and that’s likely where most of your work will be as well. Once you get past that, upgrading the master and agents is relatively simple. Plus, with PE you have support, so use it if you run into any problems not described here. Good luck!
