Getting started with Jenkins and Puppet

If you’ve followed my blog, you’ve seen that I enjoy using Travis CI to run my puppet rspec tests on the controlrepo or against component modules. When you create a PR, Travis starts a build and adds links to the builds in the PR notes. When it’s complete, the PR is updated to let you know whether the build was successful. You can also configure your repo to prevent merges unless the build succeeds. The best part is that it “just works” – you never have to worry about upgrading Travis or patching it or really anything other than making sure you enabled it. It’s a pretty awesome system, especially since it is absolutely free for open source projects. I do love Travis!

But Travis isn’t always available and isn’t always the answer. Travis only runs against PRs or (typically) merges into the default branch. It won’t run on a schedule – and scheduled runs, even when your code never changed, can help catch breakage in any dynamic dependencies you might have (whether you should have any is a different topic!). It runs only on the limited set of OSes that Travis supports. You have to use GitHub and public repos to get the free tier – if your controlrepo is private, there’s a different Travis site to use and no free plan, though a limited number of trial builds gets you started. After that, it can be pretty pricey, starting at one concurrent build for $69/month. That’s great for a business, but most of us can’t afford $840 a year for the home network. It’s also, by itself, somewhat limited in how you can integrate it. It can be part of a pipeline, but it just receives a change and sends a status back to GitHub; it won’t notify another system itself. You have to build that yourself.

There are a ton of other Continuous Integration and Continuous Delivery systems out there, though. They can be cheaper, have better integrations, run on your own hardware/cloud, and don’t have to require GitHub. Among the myriad options available, I chose to look at Jenkins, with the rtyler/jenkins puppet module. I can run it on my own VM in my home lab (hardware/cloud), and it can integrate with GitHub private repos ($0) or Bitbucket (doesn’t require GitHub). CloudBees also discussed a new Puppet Enterprise Pipeline plugin at PuppetConf 2016, which is really appealing to me (better integrations). I can also have it run builds on a schedule, so if dependencies change out from underneath me, I’m alerted when it happens, not when I run my first test a month after the underlying change.

I’m very new to Jenkins and I’m going to do a lot of things wrong while testing it, but I’m going to start blogging about my experiences as I go, right or wrong, and then try to do a wrap-up article once I learn more and have a fully working solution. I’ve been inspired to do this by Julia Evans and her wonderful technical articles that are as much about the exploration of technology as the destination. That goes against my normal style of figuring it all out first, but I do love to see the exploration of others and hope you will, too! As we go on this journey, please feel free to educate or correct me in the comments and on Twitter, as always.

Create a Least-Privilege account to perform domain joins

I’ve been working to automate joining Linux machines to an Active Directory domain lately and I was surprised to find little documentation on creating an AD account just for domain joins. I was able to piece things together from umpteen documents and lots of trial and error. I’ve compiled what I found in the hopes that others do not have to struggle so much. This is just what I was able to find out – please let me know if you have a better way!


In a Windows Active Directory domain, computers must be joined to the domain before they can fully participate in it. Joining creates a computer account, which lets the machine do things like send user authentication requests and receive responses, update its IP and name in DNS, and otherwise participate in the AD domain. This is a pretty fundamental and vital requirement, so Microsoft has made it easy for users to perform domain joins – but with some limits.

If a user account is enabled, a member of Domain Users, a member of the local Administrators group, and the correct authentication information is used, the user can join any given computer to the domain. The attribute ms-ds-MachineAccountQuota is defined at the domain level with a default value of 10; because of this quota, any random user can join up to 10 computers to the domain, after which they will no longer be able to do so. Enabled members of Domain Administrators are exempt from both the local Administrators requirement and the quota, and can join unlimited computers so long as the correct authentication information is used.

This works great with Windows machines, but presents a slightly different problem when joining non-Windows machines to a domain. In these cases, there is likely no local Administrators group, so regular users can never satisfy all of the requirements to join a machine to the domain. Domain Administrators can, but that violates the principle of least privilege and is not the best option for production environments. We want a non-administrator account that can join as many computers to the domain as required. I found that two steps were needed to create this account.

Active Directory Delegation

In the Active Directory Users and Computers MMC snap-in, you can use the Delegate Control wizard to delegate the ability to create computer accounts to a user account. Unfortunately, I have not found scriptable commands that are an equivalent to this wizard, so we need to describe the GUI process.

  • Create an account, ex: domainjoin, in the appropriate hierarchy of your Active Directory (see the PowerShell sketch after this list). It is recommended that User cannot change password and Password never expires are selected so the account is always available. It does not need the ability to log in to a server or any elevated privileges.
  • Delegate the ability to manage computer objects to the user with the Active Directory Users and Computers snap-in (from JSI Tip 8144, with tweaks).
    • Open the Active Directory Users and Computers snap-in.
    • Right click the container under which you want the computers added (ex: Computers) and choose Delegate Control.
    • Click Next.
    • Click Add and supply your user account(s), e.g. domainjoin. Click Next when complete.
    • Select Create custom task to delegate and click Next.
    • Select Only the following objects in the folder and then check Computer objects and Create selected objects in this folder. Click Next.
    • Under Permissions, check Create All Child Objects and Write All Properties. Click Next.
    • Click Finish.
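
Creating the account itself can be scripted with the AD PowerShell module. A minimal sketch, assuming the RSAT ActiveDirectory module is available – the OU path is hypothetical, and the flags mirror the checkboxes mentioned above:

# Assumes the RSAT ActiveDirectory module; the OU path is an example only
Import-Module ActiveDirectory
New-ADUser -Name 'domainjoin' `
  -SamAccountName 'domainjoin' `
  -Path 'OU=Service Accounts,DC=example,DC=com' `
  -AccountPassword (Read-Host -AsSecureString 'Password') `
  -CannotChangePassword $true `
  -PasswordNeverExpires $true `
  -Enabled $true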

Increase the MachineAccountQuota value

The second step is to increase the quota from the default value of 10. This appears to be done domain-wide, so all users will get the new quota. I somehow doubt that will be a problem, but if it is, you will have to do further research on how to proceed. To increase the quota, we just need a single command entered in an Administrative PowerShell terminal.

Set-ADDomain example.com -Replace @{"ms-ds-MachineAccountQuota"="10000"}

I used 10,000 because we have fewer than 100 nodes ready to join the domain, which leaves plenty of headroom. You can increase the value if your scale is higher. I’m sure there’s a way to reset the quota, too, I just haven’t found it yet.
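
Since the quota is just an attribute on the domain object, you can read the current value back, and writing the default should reset it – a hedged sketch, as I haven’t verified the reset myself:

# Read the current quota from the domain object
Get-ADObject (Get-ADDomain example.com).DistinguishedName -Properties ms-DS-MachineAccountQuota
# Presumably, writing the default back resets it
Set-ADDomain example.com -Replace @{"ms-ds-MachineAccountQuota"="10"}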

Joining your node to the domain

You’re now ready to join your node to the domain with your new least-privilege account domainjoin. I have created a puppet module, domain_join, to meet my personal needs. I’d love to hear how you’re tackling this issue, especially if the solution is better than mine!

Parallelized Rspec Tests

Peter Souter showed me a recent PR for the puppet-approved jenkins module where he parallelized the rspec tests. When there are a large number of tests in separate files, running them in series can take a lot of time. Parallelizing the tests MAY offer a speed improvement; in Peter’s case, it reduced the time by almost 50%. With a small number of tests, or when an overwhelming percentage of the tests are in a single file, there may be no benefit or even a decrease in performance, so be sure to measure the effect before committing to it. It’s a pretty simple change, but let’s look at it in some detail anyway.

Gemfile

In your Gemfile, you need to add one line:

gem 'parallel_tests'
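
After a bundle install, the gem provides a parallel_rspec wrapper; a quick way to try it against an existing suite (assuming your specs live in the usual spec/ directory) is:

bundle exec parallel_rspec spec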

Continue reading

Running rspec-puppet tests with granularity

When working on your puppet code, you’re going to want to run your rspec tests on a regular basis. There are a few quirks to this process that we should cover quickly.

“Normal” Usage

Let’s start with a simple test of everything. You can do this with bundle exec rake spec (or the test target, which includes spec plus some other targets). That would look something like this (note: be is an alias for bundle exec):
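
As for granularity: you can also point rspec at a single spec file once the fixtures are in place. A hedged sketch, assuming puppetlabs_spec_helper’s rake tasks and a hypothetical spec path:

be rake spec_prep                      # set up fixtures without running tests
be rspec spec/classes/profile_spec.rb  # run just the one spec file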

Continue reading

Ruby net/https debugging and modern protocols

I ran into a fun problem recently with Zabbix and the zabbixapi gem. During puppet runs, each PuppetDB record for a Zabbix_host resource is pushed through the zabbixapi to create or update the host in the Zabbix system. When this happens, an interesting error crops up:

Error: /Stage[main]/Zabbix::Resources::Web/Zabbix_host[kickstart.example.com]: Could not evaluate: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A

If you google that error, you’ll find a lot of different errors and causes described across a host of systems. Puppet itself is one of those systems, but it’s not the only one. All of these systems have something in common: Ruby. What they rarely have is an actual resolution, though. Possible causes include time out of sync between nodes, errors with the certificates and stores on the client or server side, and of course a bunch of “it works now!” with no explanation of what changed. To confuse matters even more, the Zabbix web interface works just fine in the latest browsers, so the SSL issue seems restricted to zabbixapi.
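
One way to narrow an error like this down is to take the gem out of the picture and drive net/https by hand, pinning the protocol to test for a mismatch. A minimal sketch, with a hypothetical Zabbix endpoint:

require 'net/https'

uri  = URI('https://zabbix.example.com/api_jsonrpc.php') # hypothetical endpoint
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl     = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.ssl_version = :TLSv1_2 # pin a protocol; try older values to see what the server accepts
http.start { |conn| puts conn.get(uri.request_uri).code }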

To find the cause, we looked at recent changes. The apache SSLProtocols were changed recently, which shows up in a previous puppet run’s output:

Continue reading

Root Cause Analysis: It’s Still Valid

You’ve probably heard it before: Root Cause Analysis (RCA) doesn’t exist, there’s always something under the root cause. Or, there’s no root cause, only contributing factors. This isn’t exactly untrue, of course. Rarely in our entire life will we find some cause and effect so simple that we can reduce a problematic effect to a single cause. Such arguments against RCA may be grounded in truth but smooth over the subtleties and complexities of the actual process of analysis. They also focus on the singular, though nothing in the phrase “Root Cause Analysis” actually implies the singular. Let’s take a look at how RCA works and analyze it for ourselves.

Root Cause Analysis is the analysis of the underlying causes related to an outage. We should emphasize that “causes” is plural. The primary goal is to differentiate the symptoms from the causes. This is a transformative and iterative process. You start with a symptom, such as the common “the internet is down!” In a series of analytical steps, you narrow it down as many times as needed. That progression may look like:

  • “DNS resolutions failed”
  • “DNS server bind72 failed to restart after the configuration was updated”
  • “A DNS configuration was changed but not verified and it made its way into production”
  • “Some nodes had two resolvers, one of which was bind72 and the other was the name of a decommissioned DNS node.”

Each iteration gets us closer to a root cause. We may identify multiple root causes – in this case, lack of config validation and bad settings on some nodes. Not only are these root causes, they are actionable. Validation can be added to DNS configuration changes. Bad settings can be updated. Perhaps there’s even a cause underneath – WHY the nodes had bad settings – because RCA is an iterative process. We can also extrapolate upward to imagine what other problems could be prevented: DNS configurations surely aren’t the only configurations that need to be validated.

Multiple causes and findings don’t invalidate Root Cause Analysis; they only strengthen the case for it. If it makes the concept easier to share, we can even call it Root Causes Analysis, to help others understand that we’re not looking for a singular cause. Regardless of what we call it, I believe it is absolutely vital that we continue such analysis, and that we don’t throw away the practice because some people have focused on the singular. Be an advocate of proper RCA, of iterative analytical processes, and of identifying and addressing the multiple causes at hand.

Kickstart your CentOS Template, EL7 Edition

I wrote an article on kickstarting your CentOS template in early 2014 that focused on Enterprise Linux 6. Later that year, RHEL 7 was announced and CentOS 7 soon followed. It’s well past time to refresh the kickstart article. To keep this more of a “moving target”, I’ve created a GitHub repo to host the kickstart files at puppetinabox/centos-kickstart, so you can turn there for updates or submit your own PRs. I’m also toying with an existing puppet module, danzilio/kickstart, which generates kickstart files, and I plan to contribute some PRs to it to manage the kickstart service itself. In the meantime, I’ll show a small profile that will do the same thing, since it’s just apache and a few files.

Kickstart Configuration

The new EL7 file was based off the EL6 version. I simply changed the package list, as some packages were no longer available, and removed the VMware Tools section from the bottom, since open-vm-tools is now the preferred method of managing VMware tools. In the additional steps section, I changed the yum repo for puppet from Puppet 3 to Puppet Collections 1 for Puppet 4. I also removed the banner setup; that’s easy enough to add back in if you like.

Kickstart Service Management

The kickstart service itself is pretty simple. You can use puppetlabs-apache to install apache and then place your files in its default docroot of /var/www/html. Take the kickstart files and add them to dist/profile/files with any modifications you require. Then create a profile that includes apache plus the kickstart files. That would look something like this:
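
A rough sketch of the shape that profile might take – the class and file names here are hypothetical, not the post’s exact code:

# Hypothetical profile: apache plus one kickstart file from dist/profile/files
class profile::kickstart {
  include ::apache

  file { '/var/www/html/el7.cfg':
    ensure  => file,
    source  => 'puppet:///modules/profile/el7.cfg',
    require => Class['apache'],
  }
}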

Continue reading

Speeding up Travis CI with Containers

Another quick Travis CI post. Travis CI now supports containers, which means potentially faster builds. There are two lines you can add to your .travis.yml:

sudo: false
cache: bundler

The first enables the container infrastructure. The second caches dependencies for a Ruby project. Travis CI has articles on the new architecture and caching content and dependencies that provide more detail if you need it. I didn’t see as much of a speedup with caching, but enabling containers definitely made it faster.

Allowing expected failures with Travis CI

Recently, I started setting up my puppet modules with Travis CI. One thing that bothered me was that once I added a new version to the matrix, it had to pass with that version or the PR would get a red mark. Today, I found out how to ignore those new versions with the allow_failures key in the matrix hash of .travis.yml. For instance, if I am currently testing up to Puppet 3.7.0 and I want to start testing with 3.7.0 future parser and 4.0.0, I add the new versions to the env hash and to the matrix’s allow_failures key. The new lines are the last two env entries and the allow_failures block:

---
sudo: false
language: ruby
branches:
  only:
    - production
bundler_args: --without development system_tests
script: "cd dist/profile && bundle exec rake test"
notifications:
  email: false
rvm:
  - 1.9.3
  - 2.0.0
  - 2.1.0
env:
  - PUPPET_GEM_VERSION="~> 3.3.0"
  - PUPPET_GEM_VERSION="~> 3.4.0"
  - PUPPET_GEM_VERSION="~> 3.5.0" STRICT_VARIABLES=yes
  - PUPPET_GEM_VERSION="~> 3.6.0" STRICT_VARIABLES=yes
  - PUPPET_GEM_VERSION="~> 3.7.0" STRICT_VARIABLES=yes
  - PUPPET_GEM_VERSION="~> 3.7.0" STRICT_VARIABLES=yes FUTURE_PARSER=yes
  - PUPPET_GEM_VERSION="~> 4.0.0" STRICT_VARIABLES=yes
matrix:
  exclude:
  # Ruby 2.1.0
  - rvm: 2.1.0
    env: PUPPET_GEM_VERSION="~> 3.2.0"
  - rvm: 2.1.0
    env: PUPPET_GEM_VERSION="~> 3.3.0"
  - rvm: 2.1.0
    env: PUPPET_GEM_VERSION="~> 3.4.0"
  allow_failures:
    - env: PUPPET_GEM_VERSION="~> 4.0.0" STRICT_VARIABLES=yes
    - env: PUPPET_GEM_VERSION="~> 3.7.0" STRICT_VARIABLES=yes FUTURE_PARSER=yes

Now, when Travis CI runs, it will add the new versions to the matrix and mark the designated versions separately as Allowed Failures. You can see this in action with puppetinabox/controlrepo build #22, which contains expected failures and so is considered a passing build. I can now develop against my supported versions and test against new versions without affecting my test results. This has solved a pretty big frustration for me. Hopefully, it helps you as well!

Getting a fresh start in a Git repository

Sometimes, the software in a git repository you work with starts to act wonky, especially when running tests, and for no particular reason you can discern! I saw this recently when I was playing with bundler in some puppet module repos I was setting up to use Travis CI. My rspec tests started failing with nonsensical errors, such as claiming it couldn’t find a class that was very much present on disk. What’s worse, a fresh clone of the repo from GitHub into another directory worked flawlessly! What the hell? Thankfully, a little kvetching on Twitter led to some help from David Schmitt:


David dropped some great wisdom here (thanks!). Let’s unpack it and understand it.
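
If you want to experiment in the meantime: stale files that git ignores (bundler caches, test fixtures, old lock files) are a frequent culprit in exactly this situation, and git clean can sweep them away. A hedged sketch, not necessarily the exact advice from the tweet:

git clean -ndx   # dry run: list untracked AND ignored files that would be removed
git clean -fdx   # remove them for a truly fresh working tree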

Continue reading