Installing Jenkins and RVM

Update: Contrary to the module readme as of 12/1/2016, this WILL result in a VM running Jenkins 2, rather than version 1.

It’s time to get started installing Jenkins for use with Puppet. I’ll be installing this in my home lab on a vm called jenkins (so inventive, I know) as an all-in-one server for simplicity. I don’t know Jenkins real well so I don’t know what I should be doing for masters and build servers, but my home lab needs are small anyway. I gave it 2 cores and 1 GB of RAM, which I think is on the low end for Jenkins. My initial exploration led me to believe I’ll need to install Jenkins as well as RVM, since my base OS image is CentOS 7 (Ruby 2.0.0) and the current Puppet 4 AIO build is against Ruby 2.1. I’m using Puppet and the rtyler/jenkins module to set up Jenkins, since I don’t know anything about its innards. Down below, you’ll see that I installed RVM by hand, after which it occurred to me that there’s a perfectly cromulent maestrodev/rvm module that I’ve used in the past – something I may change out later, but I already wrote my manifest so I’m going to share it anyway!

I used Jenkins jobs to experiment a lot with this setup, such as enabling RVM on a job and seeing what errors showed up in the log, but I’m going to hold off on showing that magic until the next article. I will still explain where portions of the manifest came from, just without the actual errors.

Before we go any further, make sure you have the jenkins module and its dependencies added to your Puppetfile, .fixtures.yml, and wherever else you might need to track it.

Continue reading

Getting started with Jenkins and Puppet

If you’ve followed my blog, you’ve seen that I enjoy using Travis CI to run my puppet rspec tests on the controlrepo or against component modules. When you create a PR, Travis starts a build and adds some links to the PR notes to the builds. When it’s complete, the PR is updated to let you know whether the build was successful or not. You can also configure your repo to prevent merges unless the build succeeds. The best part is that it “just works” – you never have to worry about upgrading Travis or patching it or really anything other than making sure you enabled it. It’s a pretty awesome system, especially since it is absolutely free for open source projects. I do love Travis!

But, Travis isn’t always available and isn’t always the answer. Travis only runs against PRs or (typically) merges into the default branch. It won’t run on a schedule, even if your code never changed, which can help test any dynamic dependencies you might have (whether you should have any is a different topic!). It runs on a limited subset of OSes that Travis supports. You have to use Github and public repos to hit the free tier – if your controlrepo is private, there’s a different travis site to use and there is no free plan, though there is a limited number of trial builds to get you started. After that, it can be pretty pricey, starting at 1 concurrent builds for $69/month. This is great for a business, but most of us can’t afford $840 a year for the home network. It’s also, by itself, somewhat limited in how you can integrate it. It can be part of a pipeline, but it just receives a change and sends a status back to Github, it won’t notify another system itself. You have to build that yourself.

There are a ton of other Continuous Integration and Continuous Delivery systems out there, though. They can be cheaper, have better integrations, run on your own hardware/cloud, and don’t have to require Github. Among the myriad options available, I chose to look at Jenkins, with the rtyler/jenkins puppet module. I can run this on my own VM in my home lab (hardware/cloud), it can integrate with GitHub private repos ($0) or BitBucket (doesn’t require Github). CloudBees also discussed a new Puppet Enterprise Pipeline plugin at PuppetConf 2016, which is really appealing to me (better integrations). I can also have it run builds on a schedule, so if dependencies change out from underneath me, I’m alerted when that happens, not when I run my first test a month after the underlying change.

I’m very new to Jenkins and I’m going to do a lot of things wrong while testing it, but I’m going to start blogging about my experiences as I go, right or wrong, and then try and do a wrapup article once I learn more and have a fully working solution. I’ve been inspired to do this by Julia Evans and her wonderful technical articles that are more about the exploration of technology. That goes against my normal style of figuring it all out first, but I so love to see the exploration of others and hope you will, too! As we go on this journey, please feel free to educate or correct me in the comments and on twitter, as always.

Create a Least-Privilege account to perform domain joins

I’ve been working to automate joining linux machines to an Active Directory domain lately and I was surprised to find little documentation on creating an AD account just for the domain joins. I was able to piece things together by looking at umpteen documents and lots of trial and error. I’ve compiled what I found in the hopes that others do not have to struggle so much. This is just what I was able to find out – please let me know if you have a better way!

 

In a Windows Active Directory domain, it’s important to join computers to the domain. When the computer is joined, the computer account is created and lets it do things like send user authentication requests/receive responses, update it’s IP/name in DNS, and otherwise participate in the AD domain. This is a pretty fundamental and vital requirement, so Microsoft has made it easy for users to perform domain joins, but with some limits.

If a user is a enabled, a member of Domain Users, a member of the local Administrators group, and the correct authentication information is used, they are granted the ability to join any given computer to a domain. The key ms-ds-MachineAccountQuota is defined at the domain level with a default value of 10. Due to the quota, any random user can join 10 computers to the domain after which they will no longer be able to do so. Enabled members of Domain Administrators are exempt from both local Administrators membership and the quota and can join unlimited computers so long as the correct authentication information is used.

This works great with Windows machines, but presents a slightly different problem when joining non-Windows machines to a domain. In these cases, there is likely no local Administrators group, so regular users are never able to satisfy all of the requirements to join a machine to the domain. Domain Administrators can, but that violates the principle of least privilege and is not the best option for production environments. We want a non-administrator account who can join as many computers to the domain as is required. I found two steps were required to create this account.

Active Directory Delegation

In the Active Directory Users and Computers MMC snap-in, you can use the Delegate Control wizard to delegate the ability to create computer accounts to a user account. Unfortunately, I have not found scriptable commands that are an equivalent to this wizard, so we need to describe the GUI process.

  • Create an account, ex: domainjoin, in the appropriate hierarchy of your Active Directory. It is recommend that User cannot change password and Password never expires are selected so the account is always available. It will not have ability to log into a server or any elevated privilege.
  • Delegate the ability to manage computer objects to the user with the Active Directory Users and Computers snap in (from JSI Tip 8144 with tweaks).
    • Open the Active Directory Users and Computers snap-in.
    • Right click the container under which you want the computers added (ex: Computers) and choose Delegate Control.
    • Click Next.
    • Click Add and supply your user account(s), e.g domainjoin. Click Next when complete.
    • Select Create custom task to delegate and click Next.
    • Select Only the following objects in the folder and then check Computer objects and Create selected objects in this folder. Click Next.
    • Under Permissions, check Create All Child Objects and Write All Properties. Click Next.
    • Click Finish

Increase the MachineAccountQuota value

The second step is to increase the quota from the default value of 10. This appears to be done domain-wide, so all users will get the new quota. I somehow doubt that will be a problem, but if it is, you will have to do further research on how to proceed. To increase the quota, we just need a single command entered in an Administrative PowerShell terminal.

Set-ADDomain example.com -Replace @{"ms-ds-MachineAccountQuota"="10000"}

I used 10,000 because we have less than 100 nodes ready to join the domain. You can increase the value if your scale is a bit higher. I’m sure there’s a way to reset the quota, too, I just haven’t found it yet.

Joining your node to the domain

You’re now ready to join your node to the domain with your new least-privilege account domainjoin. I have created a puppet module, domain_join, to meet my personal needs. I’d love to hear how you’re tackling this issue, especially if the solution is better than mine!

Parallelized Rspec Tests

Peter Souter showed me a recent PR for the puppet-approved jenkins module where he parallelized the rspec tests. When there are a large number of tests in separate files, it can take a lot of time when run in series. Parallelizing the tests MAY offer a speed improvement; in Peter’s case, it reduced the time by almost 50%. With a small number of tests, or when an overwhelming percentage of the tests are in a single file, there may be no benefit or even a decrease in performance, so be sure to test out its effects before committing to it. It’s a pretty simple change, but let’s look at it in some detail anyway.

Gemfile

In your Gemfile, you need to add one line:

gem 'parallel_tests'

Continue reading

Running rspec-puppet tests with granularity

When working on your puppet code, you’re going to want to run rspec against your tests on a regular basis. There are a few quirks to this process that we should cover quickly.

“Normal” Usage

Let’s start with a simple test of everything. You can do this with bundle exec rake spec (or the test target, which includes spec plus some other targets). That would look something like this (note: be is an alias for bundle exec):

Continue reading

Ruby net/https debugging and modern protocols

I ran into a fun problem recently with Zabbix and the zabbixapi gem. During puppet runs, each puppetdb record for a Zabbix_host resource is pushed through the zabbixapi, to create or update the host in the Zabbix system. When this happened, an interesting error crops up:

Error: /Stage[main]/Zabbix::Resources::Web/Zabbix_host[kickstart.example.com]: Could not evaluate: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A

If you google for that, you’ll find a lot of different errors and causes described across a host of systems. Puppet itself is one of those systems, but it’s not the only one. All of the systems have something in common: Ruby. What they rarely have is actual resolution, though. Possible causes include time out of sync between nodes, errors with the certificates and stores on the client or server side, and of course a bunch of “it works now!” with no explanation what changed. To confuse matters even more, the Zabbix web interface works just fine in the latest browsers, so the SSL issue seems restricted to zabbixapi.

To find the cause, we looked at recent changes. The apache SSLProtocols were changed recently, which shows up in a previous puppet run’s output:

Continue reading

Root Cause Analysis: It’s Still Valid

You’ve probably heard it before: Root Cause Analysis (RCA) doesn’t exist, there’s always something under the root cause. Or, there’s no root cause, only contributing factors. This isn’t exactly untrue, of course. Rarely in our entire life will we find some cause and effect so simple that we can reduce a problematic effect to a single cause. Such arguments against RCA may be grounded in truth but smooth over the subtleties and complexities of the actual process of analysis. They also focus on the singular, though nothing in the phrase “Root Cause Analysis” actually implies the singular. Let’s take a look at how RCA works and analyze it for ourselves.

Root Cause Analysis is the analysis of the underlying causes related to an outage. We should emphasize that “causes” is plural. The primary goal is to differentiate the symptoms from the causes. This is a transformative and iterative process. You start with a symptom, such as the common “the internet is down!” In a series of analytical steps, you narrow it down as many times as needed. That progression may look like:

  • “DNS resolutions failed”
  • “DNS server bind72 failed to restart after the configuration was updated”
  • “A DNS configuration was changed but not verified and it made its way into production”
  • “Some nodes had two resolvers, one of which was bind72 and the other was the name of a decommissioned DNS node.”

Each iteration gets us closer to a root cause. We may identify multiple root causes – in this case, lack of config validation and bad settings on some nodes. Not only are these causes, root causes, but they are actionable.¬† Validation can be added to DNS configuration changes. Bad settings can be updated. Perhaps there’s even a cause underneath – WHY the nodes had bad settings – because RCA is an iterative process. We can also extrapolate upward to imagine what other problems could be prevented. DNS configurations surely aren’t the only configurations that need validated.

Multiple causes and findings doesn’t invalidate Root Cause Analysis, it only strengthens the case for it. If it makes it easier to share the concept, we can even call it Root Causes Analysis, to help others understand that we’re not looking for a singular cause. Regardless of what we call it, I believe it is absolutely vital that we continue such analysis, that we don’t throw away the practice because some people have focused on the singular. Be an advocate of proper RCA, of iterative analytical processes, and of identifying and addressing the multiple causes at hand.