Convert a Puppet module from Bundle-based testing to the Puppet Development Kit (PDK)

A few years ago, I set up my modules with a bundle-based test setup and modulesync and wrote a companion blog post. Since that was written, a lot of things have changed with puppet. One of those is the release last year of the Puppet Development Kit (PDK). The goal of the PDK is to simplify development of puppet code and modules by reducing or eliminating all of the headaches of creating a mature ruby/bundler/puppet-lint/etc. setup. There is also a brand new tool called pdksync that combines the PDK with the power of modulesync. I was somewhat involved in the initial efforts toward the PDK, through my work on puppet-lint, but I have not actually used the PDK “in anger” yet, in part because of my previously working modulesync setup. This seems like a great opportunity to switch to PDK and pdksync, starting with the PDK.

Why PDK?

Before we begin, we should look at why we want to use the Puppet Development Kit. My current setup is best described as fragile. Its effectiveness varies based on what version of Ruby I use and the version of the gems I happen to download via bundler on any given day. I use CentOS 7, which is stuck on Ruby 2.0. Most of the gems in my setup require at least Ruby 2.1 or 2.2, so I have to resort to RVM to provide me with Ruby 2.3.1. Someday, I’ll need to update that to Ruby 2.4 for a gem and my setup will break until I fix it.

I am also downloading a bunch of gems that are not pinned and updated versions can bring in subtle bugs or cascading failures not related to changes in my code. Sometimes the gem is directly related to my work, like a puppet-lint version with a bug that I can downgrade and pin. Other times, it’s a very indirect dependency of a dependency of a dependency to puppet-lint, and pinning it only creates more problems for everything that depends on it. Of course, bundler also relies on rubygems.org and internet/mirror access, which sometimes go down when you need them most.

While these are surmountable issues, they always come up while I’m trying to get something done in puppet, and the minutes or hours required to fix the problem prevent me from making the changes I need, when I need them.

The PDK resolves this by bundling its own version of Ruby and dependent gems. Puppet vets the setup, so I do not have to. Everything is on disk, so there’s no more required downloading of gems that can be pulled or unavailable because of network issues. This is a huge benefit to all users, whether they pay for Puppet Enterprise or use Puppet Opensource Edition for free. Less time spent worrying about dependency hell and more time getting straight to work. This is important, valuable work, but it’s not my expertise or an actual goal of my job, so I am very content to let someone else handle the setup so I can spend more time managing my systems with Puppet.

The PDK is an installable tool. Install once and you can use it with all your puppet modules and controlrepos. Upgrading the PDK is simple using your package manager. You can of course combine using the PDK on some modules and stick with the ruby/bundler setup on others. However, switching between the PDK and native bundler on the same module is more difficult (though not impossible) – our CI systems will use native bundler, after all – because some gem dependencies will no longer be pinned and we lose the guarantee of a gemset known to work together.

We will see below that you can still modify the setup on each module/controlrepo to some extent, but when using the PDK, the full range of customization bundler offered is unavailable to you. I think most people will not find this to be a problem, but you should definitely read up on the PDK to make sure you understand what you gain and lose before converting to using it. If you change your mind later, switching back from PDK to a bundler-based setup is possible, but it may involve some work to find a working setup of pinned gem versions.

Installing the PDK

The very first thing I need to do is install the PDK. The following is written using PDK v1.5.0.0. The PDK is relatively new and gets frequent updates, so this may become out of date rapidly. If you run into any issues, check the version, read the release notes, and adjust accordingly.

The docs describe how to install on various systems. I use EL7 so I will install the RPM. I also use Puppet Enterprise, not Puppet Opensource, so I have to add the Puppet repository first. rpm and yum can get me there, or I can use puppet apply:

# Manual
sudo rpm -Uvh https://yum.puppet.com/puppet5/puppet5-release-el-7.noarch.rpm
sudo yum install pdk -y
# puppet apply
cat > ~/pdk.pp << EOF
package { 'puppet5-release-el-7':
  ensure => present,
  provider => 'rpm',
  source => 'https://yum.puppet.com/puppet5/puppet5-release-el-7.noarch.rpm',
}
-> package { 'pdk':
  ensure => present,
}
EOF
sudo puppet apply ~/pdk.pp

I can now call the command pdk successfully. Be aware that it includes its own bundled ruby, so the first time you run it, it may take a little while to load and cache everything, which is expected.

[rnelson0@build03 domain_join:master]$ pdk --help
NAME
    pdk - Puppet Development Kit

USAGE
    pdk command [options]

DESCRIPTION
    The shortest path to better modules.

COMMANDS
    build        Builds a package from the module that can be published to the Puppet Forge.
    bundle       (Experimental) Command pass-through to bundler
    convert      Convert an existing module to be compatible with the PDK.
    help         show help
    module       Provide CLI-backwards compatibility to the puppet module tool.
    new          create a new module, etc.
    test         Run tests.
    update       Update a module that has been created by or converted for use by PDK.
    validate     Run static analysis tests.

OPTIONS
    -d --debug                    Enable debug output.
    -f --format=<format>          Specify desired output format. Valid
                                  formats are 'junit', 'text'. You may also
                                  specify a file to which the formatted
                                  output is sent, for example:
                                  '--format=junit:report.xml'. This option
                                  may be specified multiple times if each
                                  option specifies a distinct target file.
    -h --help                     Show help for this command.
       --version                  Show version of pdk.

If you sign up for the puppet-announce mailing list, you will be notified every time there’s a new PDK release. After reading the release notes for edge cases that may impact you, you can easily upgrade to the latest version with your distro’s equivalent of yum update pdk. That is a lot easier than updating ruby/bundler setups.
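
For example, on EL7 the whole check-and-upgrade cycle is just a couple of commands (a sketch; substitute your distro’s package manager as needed):

pdk --version
sudo yum update -y pdk
pdk --version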

Converting to the PDK

Next, my existing setup must be converted to PDK. I will walk through my efforts, but you can also review the PuppetConf 2017 video and slides about the PDK in addition to this Converting To PDK doc. I am working on my domain_join module first, starting from the release candidate for v0.5.2, if you want to recreate this effort. The module is part of my modulesync config and has 68 tests. It’s a reasonably mature module but not overly complex, perfect for testing without being too deep or shallow. I am also going to break the modulesync setup, which you can see here. Before beginning, I create a new branch on the module:

[rnelson0@build03 domain_join:master]$ git checkout -b pdk
Switched to a new branch 'pdk'

Because I use modulesync, this will not work out of the box for me, but there’s a very naive default pdk convert that can be used to update the config. It will inform you of the files that will be added/modified and prompt you to continue due to the potential for destruction. As noted, this concern is mitigated by using version control, and if you’ve read my blog before, you’re obviously using version control, right? If not, get that done first! (It’s beyond the scope of this article, but my git 101 article may help.) Here’s what the naive attempt looks like:

[rnelson0@build03 domain_join:pdk]$ pdk convert

------------Files to be added-----------
.pdkignore
.project
spec/default_facts.yml
.gitlab-ci.yml
appveyor.yml

----------Files to be modified----------
metadata.json
spec/spec_helper.rb
.gitignore
.travis.yml
Rakefile
.rubocop.yml
.rspec
Gemfile

----------------------------------------

You can find a report of differences in convert_report.txt.

pdk (INFO): Module conversion is a potentially destructive action. Ensure that you have committed your module to a version control system or have a backup, and review the changes above before continuing.
Do you want to continue and make these changes to your module? Yes

------------Convert completed-----------

5 files added, 8 files modified.

[rnelson0@build03 domain_join:pdk±]$

The git diff is really, really lengthy, but you can find it here. A lot of it is simply re-arranging of stanzas in existing files (.gitignore, .rspec, .travis.yml, metadata.json) and .rubocop.yml updates. The rest is mostly in three files: Rakefile, Gemfile and spec/spec_helper.rb. It also adds five files: .gitlab-ci.yml, .pdkignore, .project, appveyor.yml, and spec/default_facts.yml. Some info on the minor changes:

  • spec/default_facts.yml: If you had default facts in your spec/spec_helper.rb file, you should move them here (a sample is shown after this list). The rest is mostly “housekeeping” but it removes my hiera config, which I will explore in a moment.
  • .gitignore, .pdkignore: The former has been updated a bit, and the latter is the exact same thing.
  • .gitlab-ci.yml, .travis.yml, appveyor.yml: Puppet provides some good defaults for a number of external systems. I am sticking with Travis CI right now, but it’s great to have defaults for other services if I branch out. The latter looks targeted at testing Windows systems, too, something that’s often problematic. These are all optional but do not hurt by being present.
  • .project: Looks like some XML for use with the editor Eclipse.
  • .rubocop.yml: I really don’t like rubocop but it’s included. I plan to disable rubocop as quickly as possible. However, this addresses one of my pain points – every version of rubocop changes the name of Cops, and it fails to run if it finds an unknown Cop name in its config. Since Puppet vets the config, I do not have to deal with finding all the new Cop information every time rubocop updates. It’s not enough for me to love it, but it is significant pain reduction.
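
As a reference for that first item, here is roughly what spec/default_facts.yml contains – the PDK template ships something similar, and any default facts from your old spec/spec_helper.rb get merged in (the specific values below are illustrative):

---
ipaddress: "172.16.254.254"
is_pe: false
macaddress: "AA:AA:AA:AA:AA:AA"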

This leaves the big three files mentioned earlier, which are worthy of more detailed investigation.

Rakefile

This file is MUCH smaller now, I presume thanks to some pdk magic. The conversion removed a changelog task I created, so I need to get this back.

require 'github_changelog_generator/task'
GitHubChangelogGenerator::RakeTask.new :changelog do |config|
  version = (Blacksmith::Modulefile.new).version
  config.future_release = "v#{version}"
  config.header = "# Change log\n\nAll notable changes to this project will be documented in this file.\nEach new release typically also includes the latest modulesync defaults.\nThese should not impact the functionality of the module."
  config.exclude_labels = %w{duplicate question invalid wontfix modulesync}
end

I also have a number of puppet-lint checks I have disabled, like arrow_alignment, that I need to make sure are restored. After restoring my task and disabled checks, I will be okay with this new, slimmer default.

Gemfile

PDK includes its own version of ruby and bundler and is guaranteed to deliver a gemset with all the dependencies needed to work together. You can run pdk bundle exec gem list to see what it includes, if you are curious what those are. I will add the github_changelog_generator gem here soon, but otherwise as long as everything works, I have no need to poke at this file anymore.
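
For example, to check which version of a specific gem ships with the PDK (puppet-lint here is just an illustration; note the stderr redirect, since pdk bundle writes to STDERR):

pdk bundle exec gem list 2>&1 | grep puppet-lint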

spec/spec_helper.rb

Though the diff is fairly long for this file, there is nothing tricky here; it just wires in the new default facts and some other common practices. It DOES remove the hiera configuration. There is a more modern version of my hiera_config that we need to add back in:

RSpec.configure do |c|
  c.hiera_config = 'spec/fixtures/hiera/hiera.yaml'
end
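
For reference, the fixture that hiera_config points at is just an ordinary hiera 3 config plus data files under spec/fixtures; a minimal sketch (paths are illustrative) looks like this:

---
:backends:
  - yaml
:yaml:
  :datadir: spec/fixtures/hiera
:hierarchy:
  - common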

The naive conversion is not that bad for my setup, but it does leave me with three changes to make to keep functional parity: add the github_changelog_generator gem, the :changelog rake task, and re-enable hiera lookups.

Updating the PDK Setup

Now that I’ve identified the non-default changes needed, I can do some updates. The PDK can convert and update modules using a template system. The template it used is listed at the bottom of metadata.json. You can find the templates online, or clone that directory and examine the moduleroot contents (the moduleroot_init directory is also used when you run pdk new):

[rnelson0@build03 domain_join:pdk±]$ git diff metadata.json
diff --git a/metadata.json b/metadata.json
index 7730b90..381aff7 100644
--- a/metadata.json
+++ b/metadata.json
@@ -27,5 +27,8 @@
"name": "puppet",
"version_requirement": ">=4.0.0"
}
- ]
+ ],
+ "pdk-version": "1.5.0",
+ "template-url": "file:///opt/puppetlabs/pdk/share/cache/pdk-templates.git",
+ "template-ref": "1.5.0-0-gd1b3eca"
}

The copies on disk are from the RPM, and are almost definitely out of date. The latest templates are on GitHub. I can re-run the conversion with pdk convert --template-url=https://github.com/puppetlabs/pdk-templates. The changes for me are pretty small but will be much larger the further away in time you are from the date of the RPM build. After running it, the template info will also be updated:

+ ],
+ "pdk-version": "1.5.0",
+ "template-url": "https://github.com/puppetlabs/pdk-templates",
+ "template-ref": "heads/master-0-g7b5f6d2"

We can look at the individual templates here or clone the repo locally. The first thing to note is that the .erb templates are frequently dynamic, rather than static. The simplest change is in spec/spec_helper.rb, just adding a single stanza to the Rspec.configure section, which is also dynamic:

RSpec.configure do |c|
  c.default_facts = default_facts
  <%- if @configs['hiera_config'] -%>
  c.hiera_config = "<%= @configs['hiera_config'] %>"
  <%- end -%>
  <%- if @configs['strict_level'] -%>
    c.before :each do
    # set to strictest setting for testing
    # by default Puppet runs at warning level
    Puppet.settings[:strict] = <%= @configs['strict_level'] %>
  end
  <%- end -%>
end

Note the conditional that will populate the filename from the contents of configs['hiera_config']. The configs hash is populated by config_defaults.yml. The README has a lot of helpful information on the defaults. There’s just a few lines for the spec_helper.rb file:

[rnelson0@build03 pdk-templates:master]$ tail -2 config_defaults.yml
spec/spec_helper.rb:
  strict_level: ":warning"

I need to add to this hash, but I cannot add to the templates since they are upstream. Thankfully, there’s a built-in way to account for this. The contents of the configs hash are combined with the same hash taken from the local .sync.yml file!

Note: if you’d like you CAN change the templates by forking puppetlabs/pdk-templates and passing in --template-url when you call pdk new, convert, or update. You are then on the hook for updating your templates over time, though.

To make use of the sync file, I just need to add it to the root of my module directory and add the custom config. It is additive, so only differences need to be present. Here is the hiera_config value required:

[rnelson0@build03 domain_join:pdk±]$ cat .sync.yml
spec/spec_helper.rb:
  hiera_config: 'spec/fixtures/hiera.yaml'

With the use of the pdk update command, I can re-apply the templates in --noop mode and see the change:

[rnelson0@build03 domain_join:pdk±]$ pdk update --noop
pdk (INFO): Updating rnelson0-domain_join using the default template, from 1.5.0 to 1.5.0

----------Files to be modified----------
spec/spec_helper.rb

----------------------------------------

You can find a report of differences in update_report.txt.

[rnelson0@build03 domain_join:pdk±]$ cat update_report.txt
/* Report generated by PDK at 2018-05-29 20:08:11 +0000 */


--- spec/spec_helper.rb 2018-05-29 18:53:09.140882197 +0000
+++ spec/spec_helper.rb.pdknew 2018-05-29 20:08:11.819978562 +0000
@@ -28,6 +28,7 @@

 RSpec.configure do |c|
   c.default_facts = default_facts
+  c.hiera_config = "spec/fixtures/hiera.yaml"
   c.before :each do
   # set to strictest setting for testing
   # by default Puppet runs at warning level

Now that we have proved out the process, I need to make a few more changes. To add the github_changelog_generator gem, I add an array entry under Gemfile: required: ':development'. To add the task, I use Rakefile: extras: for the rake task, one entry per line (you can also use multi-line content in YAML if you prefer). This is what the file looks like, along with the pending changes:

[rnelson0@build03 domain_join:pdk±]$ cat .sync.yml
spec/spec_helper.rb:
  hiera_config: 'spec/fixtures/hiera.yaml'
Gemfile:
  required:
    ':development':
      - gem: github_changelog_generator
Rakefile:
  default_disabled_lint_checks:
    - 'arrow_alignment'
    - 'class_inherits_from_params_class'
    - 'class_parameter_defaults'
    - 'documentation'
    - 'single_quote_string_with_variables'
  extras:
    - "require 'github_changelog_generator/task'"
    - 'GitHubChangelogGenerator::RakeTask.new :changelog do |config|'
    - '  version = (Blacksmith::Modulefile.new).version'
    - '  config.future_release = "v#{version}"'
    - '  config.header = "# Change log\n\nAll notable changes to this project will be documented in this file.\nEach new release typically also includes the latest modulesync defaults.\nThese should not impact the functionality of the module."'
    - '  config.exclude_labels = %w{duplicate question invalid wontfix modulesync}'
    - 'end'
[rnelson0@build03 domain_join:pdk±]$ pdk update --noop
pdk (INFO): Updating rnelson0-domain_join using the template at https://github.com/puppetlabs/pdk-templates, from master@7b5f6d2 to 1.5.0

----------Files to be modified----------
spec/spec_helper.rb
Rakefile
Gemfile

----------------------------------------

You can find a report of differences in update_report.txt.

[rnelson0@build03 domain_join:pdk±]$ cat update_report.txt
/* Report generated by PDK at 2018-05-29 20:39:41 +0000 */


--- spec/spec_helper.rb 2018-05-29 20:33:16.488401096 +0000
+++ spec/spec_helper.rb.pdknew  2018-05-29 20:39:41.492124202 +0000
@@ -28,6 +28,7 @@

 RSpec.configure do |c|
   c.default_facts = default_facts
+  c.hiera_config = "spec/fixtures/hiera.yaml"
   c.before :each do
     # set to strictest setting for testing
     # by default Puppet runs at warning level


--- Rakefile    2018-05-29 20:33:16.489401137 +0000
+++ Rakefile.pdknew     2018-05-29 20:39:41.492832995 +0000
@@ -3,4 +3,11 @@
 require 'puppet_blacksmith/rake_tasks' if Bundler.rubygems.find_name('puppet-blacksmith').any?

 PuppetLint.configuration.send('disable_relative')
+
+require 'github_changelog_generator/task'
+GitHubChangelogGenerator::RakeTask.new :changelog do |config|
+  version = (Blacksmith::Modulefile.new).version
+  config.future_release = "v#{version}"
+  config.header = "# Change log\n\nAll notable changes to this project will be documented in this file.\nEach new release typically also includes the latest modulesync defaults.\nThese should not impact the functionality of the module."
+  config.exclude_labels = %w{duplicate question invalid wontfix modulesync}
+end


--- Gemfile     2018-05-29 20:16:10.321541394 +0000
+++ Gemfile.pdknew      2018-05-29 20:39:41.494035036 +0000
@@ -34,6 +34,7 @@
   gem "puppet-module-win-default-r#{minor_version}",   require: false, platforms: [:mswin, :mingw, :x64_mingw]
   gem "puppet-module-win-dev-r#{minor_version}",       require: false, platforms: [:mswin, :mingw, :x64_mingw]
   gem "puppet-blacksmith", '~> 3.4',                   require: false, platforms: [:ruby]
+  gem "github_changelog_generator",                    require: false
 end

 puppet_version = ENV['PUPPET_GEM_VERSION']

I now run it again without --noop (“yesop” mode) and my changes take. A quick check of rake targets confirms it. Note that all pdk bundle output is written to STDERR, not STDOUT.

[rnelson0@build03 domain_join:pdk±]$ pdk bundle exec rake -T 2>&1 | grep change
rake changelog # Generate a Change log from GitHub

I did not add the puppet-lint disable checks back in here. That is because the PDK does not use the Rakefile when running puppet-lint, it relies on the configuration file. I need to create .puppet-lint.rc at the top of the repo so that the settings are available to my CI system. That file looks like this:

[rnelson0@build03 domain_join:pdk±]$ cat .puppet-lint.rc
--no-arrow_alignment-check
--no-class_inherits_from_params_class-check
--no-documentation-check
--no-single_quote_string_with_variables-check

One difference between the Rake target and the config file is that an invalid check name in the config file can cause errors, whereas the Rake setting just doesn’t do anything. I removed the class_parameter_defaults check from the list because it is no longer a valid check.

There are a lot more things you might want to change, especially if you use CI other than Travis, but this should be enough for me to gain parity with my existing setup. Remember that you can poke at the templates online, find the default settings in config_defaults.yml, tweak in your own .sync.yml, re-run pdk update and everything should work out. If the templates cannot be wrangled as is, you can always open a ticket in the PDK project.

Make sure you commit your changes and push them up to version control, eventually to be merged into master.
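
Something like this does the job (the branch name is from earlier; the commit message is just an illustration):

git add -A
git commit -m 'Convert to PDK'
git push -u origin pdk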

First Test

Now I need to run my tests. Before I do that, I clean up everything not in git. Since I have developed in this directory, there are bundler files that don’t need to be there and may cause conflicts with the tests. Again, make sure you’ve committed changes first, or some of your uncommitted changes from the conversion will be removed:

[rnelson0@build03 domain_join:pdk±]$ git clean -ffdx
Removing .bundle/
Removing Gemfile.lock
Removing bin/
Removing convert_report.txt
Removing coverage/
Removing pkg/
Removing spec/defines/
Removing spec/fixtures/manifests/
Removing spec/fixtures/modules/
Removing spec/functions/
Removing spec/hosts/
Removing update_report.txt
Removing vendor/

PDK Tests

The first test is real simple: my unit tests via pdk test unit:

[rnelson0@build03 domain_join:pdk±]$ pdk test unit
pdk (INFO): Using Ruby 2.4.4
pdk (INFO): Using Puppet 5.5.1
[✔] Preparing to run the unit tests.
[✔] Running unit tests.
Evaluated 68 tests in 2.321391522 seconds: 0 failures, 0 pending.
[✔] Cleaning up after running unit tests.

I also want to validate linting and syntax and whatnot with pdk validate:

[rnelson0@build03 domain_join:pdk]$ pdk validate
pdk (INFO): Running all available validators...
pdk (INFO): Using Ruby 2.4.4
pdk (INFO): Using Puppet 5.5.1
[✔] Checking metadata syntax (metadata.json tasks/*.json).
[✔] Checking module metadata style (metadata.json).
[✔] Checking Puppet manifest syntax (**/**.pp).
[✔] Checking Puppet manifest style (**/*.pp).
[✖] Checking Ruby code style (**/**.rb).
info: task-metadata-lint: ./: Target does not contain any files to validate (tasks/*.json).
convention: rubocop: spec/spec_helper_acceptance.rb:17:27: Style/HashSyntax: Use the new Ruby 1.9 hash syntax.
convention: rubocop: spec/spec_helper_acceptance.rb:17:49: Style/HashSyntax: Use the new Ruby 1.9 hash syntax.
<more rubocop results>

I have a ton of rubocop results, which I will address below. Everything else works fine, as expected.

CI Tests

The second is a little trickier. Currently, whatever CI system you use will use ruby/bundler to perform the same checks. That is planned to change (PDK-709 tracks the Travis CI setup). When I use Travis CI, it uses tests from .travis.yml. Here are the relevant portions:

bundler_args: --without system_tests
matrix:
  fast_finish: true
  include:
    -
      env: CHECK="syntax lint metadata_lint check:symlinks check:git_ignore check:dot_underscore check:test_file rubocop"
    -
      env: CHECK=parallel_spec
    -
      env: PUPPET_GEM_VERSION="~> 4.0" CHECK=parallel_spec
      rvm: 2.1.9

There are two different checks that will run. The first is all the syntax and linting, the equivalent of pdk validate. The second and third are the unit tests, run against the latest Puppet 4 and Puppet 5 independently, and equivalent to pdk test unit. Here’s what happens when I run the unit tests first:

[rnelson0@build03 domain_join:pdk]$ pdk bundle exec rake parallel_spec
pdk (INFO): Using Ruby 2.4.4
pdk (INFO): Using Puppet 5.5.1
Cloning into 'spec/fixtures/modules/stdlib'...
2 processes for 2 specs, ~ 1 specs per process
No examples found.

Finished in 0.00032 seconds (files took 0.07684 seconds to load)
0 examples, 0 failures


domain_join
  on redhat-6-x86_64
    with defaults for all parameters
      should not contain Package[samba-common-tools]
      should contain Package[oddjob-mkhomedir]
      should contain Package[krb5-workstation]
      should contain Package[krb5-libs]
      should contain Package[samba-common]
      should contain Package[sssd-ad]
      should contain Package[sssd-common]
      should contain Package[sssd-tools]
      should contain Package[ldb-tools]
      should contain Class[domain_join]
      should contain File[/etc/resolv.conf]
      should contain File[/etc/krb5.conf]
      should contain File[/etc/samba/smb.conf]
      should contain File[/etc/sssd/sssd.conf]
      should contain File[/usr/local/bin/domain-join]
      should contain Exec[join the domain]
    with manage_services false
      should not contain Package[sssd]
      should not contain File[/etc/sssd/sssd.conf]
      should contain File[/etc/resolv.conf]
      should contain File[/usr/local/bin/domain-join]
    with manage_services and manage_resolver false
      should not contain Package[sssd]
      should not contain File[/etc/sssd/sssd.conf]
      should not contain File[/etc/resolv.conf]
      should contain File[/usr/local/bin/domain-join]
    start script syntax
      should contain File[/usr/local/bin/domain-join] with content =~ /sssd status/
    with container
      should contain File[/usr/local/bin/domain-join] with content =~ /net ads join/
      should contain File[/usr/local/bin/domain-join] with content =~ /container_ou='container'/
    with account and password
      should contain File[/usr/local/bin/domain-join] with content =~ /register_account='service_account'/
      should contain File[/usr/local/bin/domain-join] with content =~ /register_password='open_sesame'/
    with join_domain disabled
      should not contain Exec[join the domain]
    with manage_dns disabled
      should not contain File[/usr/local/bin/domain-join] with content =~ /net ads dns register/
      should not contain File[/usr/local/bin/domain-join] with content =~ /update add /
    with manage_dns and ptr enabled
      should contain File[/usr/local/bin/domain-join] with content =~ /net ads dns register/
      should contain File[/usr/local/bin/domain-join] with content =~ /update add .+ addr show fake_interface/
  on redhat-7-x86_64
    with defaults for all parameters
      should contain Package[samba-common-tools]
      should contain Package[oddjob-mkhomedir]
      should contain Package[krb5-workstation]
      should contain Package[krb5-libs]
      should contain Package[samba-common]
      should contain Package[sssd-ad]
      should contain Package[sssd-common]
      should contain Package[sssd-tools]
      should contain Package[ldb-tools]
      should contain Class[domain_join]
      should contain File[/etc/resolv.conf]
      should contain File[/etc/krb5.conf]
      should contain File[/etc/samba/smb.conf]
      should contain File[/etc/sssd/sssd.conf]
      should contain File[/usr/local/bin/domain-join]
      should contain Exec[join the domain]
    with manage_services false
      should not contain Package[sssd]
      should not contain File[/etc/sssd/sssd.conf]
      should contain File[/etc/resolv.conf]
      should contain File[/usr/local/bin/domain-join]
    with manage_services and manage_resolver false
      should not contain Package[sssd]
      should not contain File[/etc/sssd/sssd.conf]
      should not contain File[/etc/resolv.conf]
      should contain File[/usr/local/bin/domain-join]
    start script syntax
      should contain File[/usr/local/bin/domain-join] with content =~ /status sssd.service/
    with container
      should contain File[/usr/local/bin/domain-join] with content =~ /net ads join/
      should contain File[/usr/local/bin/domain-join] with content =~ /container_ou='container'/
    with account and password
      should contain File[/usr/local/bin/domain-join] with content =~ /register_account='service_account'/
      should contain File[/usr/local/bin/domain-join] with content =~ /register_password='open_sesame'/
    with join_domain disabled
      should not contain Exec[join the domain]
    with manage_dns disabled
      should not contain File[/usr/local/bin/domain-join] with content =~ /net ads dns register/
      should not contain File[/usr/local/bin/domain-join] with content =~ /update add /
    with manage_dns and ptr enabled
      should contain File[/usr/local/bin/domain-join] with content =~ /net ads dns register/
      should contain File[/usr/local/bin/domain-join] with content =~ /update add .+ addr show fake_interface/

1 deprecation warning total

Finished in 2.34 seconds (files took 2.17 seconds to load)
68 examples, 0 failures


68 examples, 0 failures

Took 5 seconds
I, [2018-05-29T21:51:29.950455 #16602]  INFO -- : Creating symlink from spec/fixtures/modules/domain_join to /home/rnelson0/modules/domain_join
/opt/puppetlabs/pdk/share/cache/ruby/2.4.0/gems/rspec-core-3.7.1/lib/rspec/core.rb:179:in `block in const_missing': uninitialized constant RSpec::Puppet (NameError)
        from /opt/puppetlabs/pdk/share/cache/ruby/2.4.0/gems/rspec-core-3.7.1/lib/rspec/core.rb:179:in `fetch'
        from /opt/puppetlabs/pdk/share/cache/ruby/2.4.0/gems/rspec-core-3.7.1/lib/rspec/core.rb:179:in `const_missing'
        from /home/rnelson0/modules/domain_join/spec/classes/coverage_spec.rb:1:in `block in <top (required)>'

Deprecation Warnings:

puppetlabs_spec_helper: defaults `mock_with` to `:mocha`. See https://github.com/puppetlabs/puppetlabs_spec_helper#mock_with to choose a sensible value for you


If you need more of the backtrace for any of these deprecations to
identify where to make the necessary changes, you can configure
`config.raise_errors_for_deprecations!`, and it will turn the
deprecation warnings into errors, giving you the full backtrace.
Tests Failed

The error at the end of the output points to spec/classes/coverage_spec.rb. The simple solution for me is to git rm it, rather than add in the right coverage gem again. It’s not particularly important to me, but if it is to you, you need to add it back to Gemfile and spec/spec_helper.rb. The important thing is that a second run does not have the error and completes successfully.
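
For reference, the removal is a one-liner (the commit message is illustrative):

git rm spec/classes/coverage_spec.rb
git commit -m 'Remove coverage spec'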

The second test is a series of rake targets and causes me some grief out of the gate:

[rnelson0@build03 domain_join:pdk]$ pdk bundle exec rake syntax lint metadata_lint check:symlinks check:git_ignore check:dot_underscore check:test_file rubocop
pdk (INFO): Using Ruby 2.4.4
pdk (INFO): Using Puppet 5.5.1
init.pp
---> syntax:manifests
---> syntax:templates
---> syntax:hiera:yaml
rake aborted!
.pp files present in tests folder; Move them to an examples folder following the new convention
/opt/puppetlabs/pdk/share/cache/ruby/2.4.0/gems/puppetlabs_spec_helper-2.7.0/lib/puppetlabs_spec_helper/rake_tasks.rb:231:in `block (2 levels) in <top (required)>'
/opt/puppetlabs/pdk/share/cache/ruby/2.4.0/gems/rake-12.3.1/exe/rake:27:in `<top (required)>'
/opt/puppetlabs/pdk/private/ruby/2.4.4/bin/bundle:23:in `load'
/opt/puppetlabs/pdk/private/ruby/2.4.4/bin/bundle:23:in `<main>'
Tasks: TOP => check:test_file
(See full trace by running task with --trace)

The fix is easy – move the existing files to a new location as the convention has changed, or remove them entirely if they are not valuable – and then I can proceed without further fault:

[rnelson0@build03 domain_join:pdk]$ mkdir examples
[rnelson0@build03 domain_join:pdk]$ git mv -v tests/*pp examples/
‘tests/init.pp’ -> ‘examples/init.pp’
[rnelson0@build03 domain_join:pdk±]$ pdk bundle exec rake syntax lint metadata_lint check:symlinks check:git_ignore check:dot_underscore check:test_file rubocop
pdk (INFO): Using Ruby 2.4.4
pdk (INFO): Using Puppet 5.5.1
Running RuboCop...
Inspecting 0 files


0 files inspected, no offenses detected
---> syntax:manifests
---> syntax:templates
---> syntax:hiera:yaml

As I mentioned earlier, I’d like to disable RuboCop, but I don’t see how right now. If I specify selected_profile: off in .sync.yml for rubocop, pdk update errors out applying the template (PDK-998). However, it seems to pass just fine as part of pdk validate, though the individual check fails badly (PDK-997). I’m content to let it go so long as it’s passing tests and I don’t have to rewrite anything, but I will find SOME way to get rid of it if it starts causing me problems!

If you use gitlab-ci, appveyor, or some other system for testing, you will want to ensure those tests pass as well. Once done, commit everything to git again.

I am now ready to submit a pull request, and if you are following along, you may be, too. You can review and compare my pull request and the tests if you would like. You will of course notice that I merged it in spite of rubocop failures!

Summary

We have looked at what the PDK is, why we want to use it, how to install it, and how to convert a module to use it. Each module can be customized and we explored the .sync.yml file that controls customization. Once we finalized our conversion, we ran the same tests we had prior to the PDK to make sure they still work and verified the Travis CI tests, too. The next step is to find a replacement for modulesync, which allows us to push the same general configuration to multiple modules. Lucky for us, Puppet just released a potential replacement, pdksync, that I will evaluate soon.

Powershell in a Post-TLS1.1 World

I was trying to install PowerCLI on a new server in a new environment today and I encountered all sorts of error messages when PowerShell tried to install the required NuGet provider:

PS C:\Windows\system32> Find-Module -Name VMware.PowerCLI
WARNING: Unable to download from URI 'https://go.microsoft.com/fwlink/?LinkID=627338&clcid=0x409' to ''.
WARNING: Unable to download the list of available providers. Check your internet connection.
PackageManagement\Install-PackageProvider : No match was found for the specified search criteria for the provider 'NuGet'. The package provider 
requires 'PackageManagement' and 'Provider' tags. Please check if the specified package has the tags.
At C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:7405 char:21
+ ... $null = PackageManagement\Install-PackageProvider -Name $script:N ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (Microsoft.Power...PackageProvider:InstallPackageProvider) [Install-PackageProvider], Exception
+ FullyQualifiedErrorId : NoMatchFoundForProvider,Microsoft.PowerShell.PackageManagement.Cmdlets.InstallPackageProvider

PackageManagement\Import-PackageProvider : No match was found for the specified search criteria and provider name 'NuGet'. Try 
'Get-PackageProvider -ListAvailable' to see if the provider exists on the system.
At C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:7411 char:21
+ ... $null = PackageManagement\Import-PackageProvider -Name $script:Nu ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (NuGet:String) [Import-PackageProvider], Exception
+ FullyQualifiedErrorId : NoMatchFoundForCriteria,Microsoft.PowerShell.PackageManagement.Cmdlets.ImportPackageProvider

WARNING: Unable to download from URI 'https://go.microsoft.com/fwlink/?LinkID=627338&clcid=0x409' to ''.
WARNING: Unable to download the list of available providers. Check your internet connection.
PackageManagement\Get-PackageProvider : Unable to find package provider 'NuGet'. It may not be imported yet. Try 'Get-PackageProvider 
-ListAvailable'.
At C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:7415 char:30
+ ... tProvider = PackageManagement\Get-PackageProvider -Name $script:NuGet ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (Microsoft.Power...PackageProvider:GetPackageProvider) [Get-PackageProvider], Exception
+ FullyQualifiedErrorId : UnknownProviderFromActivatedList,Microsoft.PowerShell.PackageManagement.Cmdlets.GetPackageProvider

Find-Module : NuGet provider is required to interact with NuGet-based repositories. Please ensure that '2.8.5.201' or newer version of NuGet 
provider is installed.
At line:1 char:1
+ Find-Module -Name VMware.PowerCLI
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Find-Module], InvalidOperationException
+ FullyQualifiedErrorId : CouldNotInstallNuGetProvider,Find-Module

I made it very angry, and I didn’t know why! After some searching, I stumbled on a solution on the Microsoft Community site. The issue is that PowerShell 5.1 defaults to only enabling SSL3 and TLS 1.0 for secure HTTP connections. You have probably noticed a lot of recent warnings on various websites about services removing support for TLS 1.0 and 1.1, and SSL3 has been disabled by many services for years. Microsoft is no slacker here, and go.microsoft.com has dropped support for SSL3 and TLS 1.0 (probably TLS 1.1, too, but I didn’t check). Thus the provider list at the URL cannot be accessed and the NuGet install fails.

PS C:\ProgramData\Documents> [Net.ServicePointManager]::SecurityProtocol
Ssl3, Tls

You can fix this by specifying Tls12 as the SecurityProtocol, but it only persists in this session, for this user. Thankfully, PowerShell has a well-documented series of profile loads, so you can make the change once for all users on the server. You can choose whichever level works best for you. I chose $PsHome\Profile.ps1, which affects All Users, All Hosts. If you choose a global file like that, launch a PowerShell session as administrator (if you weren’t aware, there’s a Ctrl+Shift modifier to avoid right-clicking!) so that you have the rights to edit the target file. If not, just substitute the file below with your choice.

This snippet will check for the existence of the file and create it if needed, then populate it with our one-line change and a comment telling us why. Finally, it opens the file so you can inspect it and adjust if you need to. Note that running it again will append the same lines, which isn’t harmful but may result in a little confusion for the next person to peek at it (a guard sketch follows the snippet). Hello, future self!

$ProfileFile = "${PsHome}\Profile.ps1"

if (! (Test-Path $ProfileFile)) {
  New-Item -Path $ProfileFile -Type file -Force
}
''                                                                                | Out-File -FilePath $ProfileFile -Encoding ascii -Append
'# It is 2018, SSL3 and TLS 1.0 are no good anymore'                              | Out-File -FilePath $ProfileFile -Encoding ascii -Append
'[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12' | Out-File -FilePath $ProfileFile -Encoding ascii -Append

notepad $ProfileFile
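
If the duplicate-append behavior bothers you, a small guard makes the snippet idempotent – a sketch, assuming the marker string below only appears in our addition:

# Only append the TLS 1.2 setting if the profile does not already contain it
if (-not (Select-String -Path $ProfileFile -Pattern 'SecurityProtocolType' -Quiet)) {
    '[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12' |
        Out-File -FilePath $ProfileFile -Encoding ascii -Append
}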

If you enter [Net.ServicePointManager]::SecurityProtocol in the current window, you’ll get the same Ssl3, Tls result you saw before. The profile is only loaded at startup. Open a new PowerShell instance on the server – as any user, even – and run it again. You should see the new setting:

PS C:\windows\system32> [Net.ServicePointManager]::SecurityProtocol
Tls12

Now you are ready to use PowerShell to connect to modern web servers, whether it’s to install NuGet, use Invoke-WebRequest, or any other HTTPS connection. Enjoy!

Self-documenting Puppet modules with puppet-strings

Documentation is hard. Anyone who has been in IT long enough will have tales of chasing their tails because of incorrect or outdated docs, or even missing docs. Documentation really benefits from automation and ease of creation. For Puppet modules, there exists a tool called puppet-strings that can help with this. There are probably other tools for this, but puppet-strings is developed by Puppet and will likely be integrated into the Puppet Development Kit, so I have chosen it as my solution.

Around this time last year, November of 2016, Will Hopper wrote a blog post about how to use puppet-strings. There is also some mention of puppet-strings in the Style Guide. At the time of that blog post, puppet-strings was mostly documented in that blog post and I didn’t jump on the project, but it turns out it’s really easy to leverage. Let’s give it a shot.

Converting a Module to use puppet-strings

We should be able to convert any module to use puppet-strings, whether it’s small or large, simple or complex. Find a module you’d like to convert and you can follow along with it. I am going to convert my existing module rnelson0/certs, found on GitHub. First, let’s add the new gem to our module by adding two lines to the Gemfile:

gem 'puppet-strings'
gem 'rgen'

I’ve submitted PR 149 to puppet-strings as I believe rgen should be a runtime dependency, at which point you can remove that gem from the file.

Run bundle install or bundle update. You can now run bundle exec puppet strings generate ./manifests/*.pp. It won’t do much now, since we haven’t added strings-compatible metadata to our module, but it does generate the files:

[rnelson0@build03 certs:stringsdocs±]$ bundle exec puppet strings generate ./manifests/*.pp
[warn]: Missing @param tag for parameter 'source_path' near manifests/vhost.pp:59.
[warn]: Missing @param tag for parameter 'target_path' near manifests/vhost.pp:59.
[warn]: Missing @param tag for parameter 'service' near manifests/vhost.pp:59.
Files:                    2
Modules:                  0 (    0 undocumented)
Classes:                  0 (    0 undocumented)
Constants:                0 (    0 undocumented)
Attributes:               0 (    0 undocumented)
Methods:                  0 (    0 undocumented)
Puppet Classes:           1 (    0 undocumented)
Puppet Defined Types:     1 (    0 undocumented)
Puppet Types:             0 (    0 undocumented)
Puppet Providers:         0 (    0 undocumented)
Puppet Functions:         0 (    0 undocumented)
 100.00% documented
[rnelson0@build03 certs:stringsdocs±]$ ls html
ls: cannot access html: No such file or directory
[rnelson0@build03 certs:stringsdocs±]$ ls
CONTRIBUTING.md  doc  Gemfile  Gemfile.lock  manifests  metadata.json  Rakefile  README.md  spec  tests  vendor
[rnelson0@build03 certs:stringsdocs±]$ tree doc/
doc/
├── css
│   ├── common.css
│   ├── full_list.css
│   └── style.css
├── file.README.html
├── frames.html
├── _index.html
├── index.html
├── js
│   ├── app.js
│   ├── full_list.js
│   └── jquery.js
├── puppet_classes
│   └── certs.html
├── puppet_class_list.html
├── puppet_defined_type_list.html
├── puppet_defined_types
│   └── certs_3A_3Avhost.html
└── top-level-namespace.html

4 directories, 15 files

We can view the output in a browser by pulling up doc/index.html and browsing around it. If this is on a remote machine, it needs to be served up somehow. You can also copy it to your local machine and view it in a web browser (reminder that you can download a .ZIP of a branch from GitHub). I will leave this step out in the future for brevity, but don’t forget to do it, especially if you make changes, refresh, and nothing looks different!

We can add a rake task to make this simpler. In your Rakefile, add require 'puppet-strings/tasks'. If you add the gem to your Gemfile in a group that Travis doesn’t use, you should be sure to guard against failure with something like this:

# These gems aren't always present, for instance
# on Travis with --without development
begin
  require 'puppet_blacksmith/rake_tasks'
  require 'puppet-strings/tasks'
rescue LoadError
end

There are now two new rake tasks. You can generate docs with the much shorter bundle exec rake strings:generate:

[rnelson0@build03 certs:stringsdocs±]$ be rake -T | grep strings
Could not find semantic_puppet gem, falling back to internal functionality. Version checks may be less robust.
rake strings:generate[patterns,debug,backtrace,markup,json,yard_args]  # Generate Puppet documentation with YARD
rake strings:gh_pages:update                                           # Update docs on the gh-pages branch and push to GitHub
[rnelson0@build03 certs:stringsdocs±]$ be rake strings:generate
Could not find semantic_puppet gem, falling back to internal functionality. Version checks may be less robust.
[warn]: Missing documentation for Puppet defined type 'certs::vhost' at manifests/vhost.pp:35.
[warn]: The @param tag for parameter 'title' has no matching parameter at manifests/vhost.pp:35.
Files:                    2
Modules:                  0 (    0 undocumented)
Classes:                  0 (    0 undocumented)
Constants:                0 (    0 undocumented)
Attributes:               0 (    0 undocumented)
Methods:                  0 (    0 undocumented)
Puppet Classes:           1 (    0 undocumented)
Puppet Defined Types:     1 (    0 undocumented)
Puppet Types:             0 (    0 undocumented)
Puppet Providers:         0 (    0 undocumented)
Puppet Functions:         0 (    0 undocumented)
 100.00% documented

Next, we need to make some changes to our modules to document them. We can document manifests, types, providers, and functions, but I don’t have any of my own modules with types/providers/functions and the process is pretty similar, so I will focus on just a manifest today. Here is the header for my certs::vhost defined type before I add puppet-strings metadata:

# == Define: certs::vhost
#
# SSL Certificate File Management
#
# Intended to be used in conjunction with puppetlabs/apache's apache::vhost
# definitions, to provide the ssl_cert and ssl_key files.
#
# === Parameters
#
# [name]
# The title of the resource matches the certificate's name
# e.g. 'www.example.com' matches the certificate for the hostname
# 'www.example.com'
#
# [source_path]
# The location of the certificate files. Typically references a module's files.
# e.g. 'puppet:///site_certs' will search $modulepath/site_certs/files on the
# master for the specified files.
#
# [target_path]
# Location where the certificate files will be stored on the managed node.
# Optional value, defaults to '/etc/ssl/certs'
#
# [service]
# Name of the web server service to notify when certificates are updated.
# Optional value, defaults to 'httpd'
#
# === Examples
#
#  Without Hiera:
#
#    $cname = www.example.com
#    certs::vhost{ $cname:
#      source_path => 'puppet:///site_certificates',
#    }
#
#  With Hiera:
#
#    server.yaml
#    ---
#    certsvhost:
#      'www.example.com':
#        source_path: 'puppet:///modules/site_certificates/'
#
#    manifest.pp
#    ---
#    certsvhost = hiera_hash('certsvhost')
#    create_resources(certs::vhost, certsvhost)
#    Certs::Vhost<| |> -> Apache::Vhost<| |>
#
# === Authors
#
# Rob Nelson <rnelson0@gmail.com>
#
# === Copyright
#
# Copyright 2014 Rob Nelson
#

And here it is afterward:

# == Define: certs::vhost
#
# SSL Certificate File Management
#
# Intended to be used in conjunction with puppetlabs/apache's apache::vhost
# definitions, to provide the ssl_cert and ssl_key files.
#
# === Parameters
#
# @param name The title of the resource matches the certificate's name # e.g. 'www.example.com' matches the certificate for the hostname # 'www.example.com'
# @param source_path The location of the certificate files. Typically references a module's files. e.g. 'puppet:///site_certs' will search $modulepath/site_certs/files on the master for the specified files.
# @param target_path Location where the certificate files will be stored on the managed node. Optional value, defaults to '/etc/ssl/certs'
# @param service Name of the web server service to notify when certificates are updated. Optional value, defaults to 'httpd'
#
# @example
#     Without Hiera:
#    
#     $cname = www.example.com
#     certs::vhost{ $cname:
#       source_path => 'puppet:///site_certificates',
#     }
#    
#     With Hiera:
#    
#     server.yaml
#     ---
#     certsvhost:
#       'www.example.com':
#         source_path: 'puppet:///modules/site_certificates/'
#    
#     manifest.pp
#     ---
#     certsvhost = hiera_hash('certsvhost')
#     create_resources(certs::vhost, certsvhost)
#     Certs::Vhost<| |> -> Apache::Vhost<| |>
#
# === Authors
#
# Rob Nelson <rnelson0@gmail.com>
#
# === Copyright
#
# Copyright 2014 Rob Nelson
#

We can quickly regenerate the html docs and the defined type shows up. Be sure to click the `Defined Types` link in the top left; the left-hand menu does not mix classes and types.

You can see that there’s still some other work to do. The non-strings-ified portions of the comments are left as is, rather than parsed as markdown, so that needs to change. We don’t need most of that leftover crud. The class/defined type name is already known to strings. The Authors section should come from metadata.json (though if there are multiple, I am not sure if that file accepts an array). Copyright isn’t handled by metadata.json, and may not be strictly needed depending on your jurisdiction, but if you do need to keep it, just remove the === Copyright header and leave the text (I have chosen to omit it because US copyright law automatically grants me copyright for 70 years and I’m not that worried about it anyway; I would do something different for work).

I changed some other things:

  • Each @param can take multi-line comments, as long as each trailing line maintains one space of extra indentation.
  • The title of defined types should be documented using @param title (docs), though it will generate a warning like [warn]: The @param tag for parameter 'name' has no matching parameter at manifests/vhost.pp:33
  • The order of metadata should go @summary > freeform text > @example > @param

Here’s the updated header and the resulting html doc:

# @summary Used in conjunction with puppetlabs/apache's apache::vhost definitions, to provide the related ssl_cert and ssl_key files for a given vhost.
#
# @example
#    Without Hiera:
#
#      $cname = www.example.com
#      certs::vhost{ $cname:
#        source_path => 'puppet:///site_certificates',
#      }
#
#    With Hiera:
#
#      server.yaml
#      ---
#      certsvhost:
#        'www.example.com':
#          source_path: 'puppet:///modules/site_certificates/'
#
#      manifest.pp
#      ---
#      certsvhost = hiera_hash('certsvhost')
#      create_resources(certs::vhost, certsvhost)
#      Certs::Vhost<| |> -> Apache::Vhost<| |>
#
# @param title
#  The title of the resource matches the certificate's name # e.g. 'www.example.com' matches the certificate for the hostname # 'www.example.com'
# @param source_path
#  Required. The location of the certificate files. Typically references a module's files. e.g. 'puppet:///site_certs' will search $modulepath/site_certs/files on the master for the specified files.
# @param target_path
#  Location where the certificate files will be stored on the managed node.
#  Default: '/etc/ssl/certs'
# @param service
#  Name of the web server service to notify when certificates are updated.
#  Default: 'httpd'

That’s about it! For small modules, this is probably a really simple, really quick change. For larger modules, this may take a while, but it’s tedious, not complicated.

Online Docs

There are two other things you may want to look at. First, the strings docs can be a tad large (212K vs 24K for the actual manifests, for example) but, more importantly, they are NOT guaranteed to be in sync with the rest of your code. If you keep doc/ in your git repo and change a parameter’s definition or use without regenerating the docs in the same commit, users may act on documentation that no longer matches the code. If you go a long while without updating them, you may confuse your users or even yourself.

You can simply add doc/ to your .gitignore file. Now the docs are not stored in the Git repo – unless you add them with `--force` or added them before updating .gitignore, at which point you will definitely want to correct that! This ensures no doc mismatch with published code and helps keep the size of the git repo a little more trim.
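
If the docs were already committed at some point, untracking them is quick (a sketch):

echo 'doc/' >> .gitignore
git rm -r --cached doc/
git commit -m 'Stop tracking generated docs'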

Second, GitHub and other providers often do not display HTML docs very well for your users, so even if you include doc/ in your repo, the contents are probably displayed as text files. Whoops! There are a few solutions for this.

  1. Publish through your Git provider’s services, like gh-pages, to a per-project website. For example, GitHub provides gh-pages sites and allows you to configure the publishing source (bonus: the rake task strings:gh_pages:update will push to this easily).
  2. Add a hook to your CI that generates the docs and sends them where necessary. Vox Pupuli is working on this but has not chosen an implementation yet.
  3. There are a few sites that you can add your docs to, some of which automagically update for you. One of these is http://www.puppetmodule.info/. You can easily click the Add Project button in the top right to add your own project to it (voila!). Since this occurs automatically, you never have to do anything else. But, when you are making changes, your docs could get stale until the next automated run occurs.

Puppet Module, by Dominic Cleal, also offers a badge you can add to your readme. Click the About button for more info.

Style Guides

One last thing to mention. As of 12/7/2017, the Style Guide is being updated to add information about puppet-strings. Pay attention to that space! I assume that it will first start with a description of standards and then add some puppet-lint checks to help you enforce it programmatically. As puppet-strings is relatively new, you can expect more changes in the immediate future as it solidifies. If you have strong opinions on documentation, please speak up in the Documents Jira project, in Slack/IRC/mailing lists, or contact me and I’ll help you get your comments to the right person.

Summary

Today we added puppet-strings to a module, replaced the existing documentation with puppet-strings-compatible documentation, and looked at some solutions for automating document updates. It’s a simple process to enable better documentation updates, something everyone needs.

Disabling account lockout on your VCSA 6.5

I recently locked myself out of my vCenter Server Appliance when I was attempting to perform an upgrade through VAMI. The VAMI just says “invalid password”, but logging in on the console displayed a message indicating I had failed authentication 12 times. I had only tried four times! Regardless of whether it was me or someone else, now that I knew I had the right password, I was locked out. I waited 5 minutes but still couldn’t get in, so it looked like it was time to do a password reset. However, I wanted to explore something I had done with vRealize Orchestrator recently: disable the account lockout.

KB2147144 documents the process for booting into a privileged shell without a password. Unlike in 6.0, you hit ‘e’ instead of ‘space’ at the GRUB prompt, but otherwise it’s the same. You do have about half a second to hit ‘e’, so pay attention or you’ll find yourself rebooting a few times! For those who are not locked out already, you can just ssh into the VCSA and make this change without a reboot.

Once you’re in, search for the word tally in the pam setup with grep tally /etc/pam.d/*. You will find these two lines in /etc/pam.d/system-auth.

auth required pam_tally2.so file=/var/log/tallylog deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
auth required pam_tally1.so file=/var/log/tallylog deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300

Comment those two lines out (prepend with a #) and save the file:

# cat /etc/pam.d/system-auth
# Begin /etc/pam.d/system-auth

auth required pam_unix.so

# End /etc/pam.d/system-auth
#auth required pam_tally2.so file=/var/log/tallylog deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
#auth required pam_tally1.so file=/var/log/tallylog deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
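
If you prefer to script the change, a sed one-liner can do the commenting – a sketch; double-check the result before rebooting:

# comment out any auth line that references pam_tally
sed -i '/pam_tally/ s/^auth/#auth/' /etc/pam.d/system-auth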

If you know your password and are just dealing with lockouts, you can type reboot -f now. Otherwise, type passwd, enter the new password twice, and then reboot. You can now enter your password wrong a million times – or someone else can – without losing the ability to log in, and without the extraordinary wait or reboot that a lockout used to require.

I upgraded from VCSA 6.5U1b to 6.5U1c and this persisted. I assume that when going to vNext (6.6 or 7.0) this change will be reverted, but I am not sure how it will behave when VCSA 6.5U2 is released; this may need to be re-done, so add disabling the lockout to your upgrade checklists alongside disabling the root account expiration.

Upgrading Puppet Enterprise from 2016.4 to 2017.3

Over the past year, there have been some pretty big improvements to Puppet. I am still running PE 2016.4.2 and the current version is 2017.3.2, so there are a lot of changes in there. Most of the changes are backwards-compatible, so an upgrade from last November’s version is not quite as bad as it sounds, and I definitely want to start using the new features and improvements. The big one for me is Hiera version 5 (new in Puppet 4.9 / PE 2016.4.5). It is backwards compatible, so you can start using it right now, but it does require some changes to take advantage of the new features. I have to upgrade the server, upgrade the agents, and then start implementing the new features! Why do I care about Hiera 5 in particular?

Hiera 3 was great, but you could only use one hiera setup on a server, regardless of how many environments were deployed on that server. This could cause problems when you wanted to change the hiera config and test it. You could not test it in a feature branch; it HAD to be promoted to affect the entire server. If you had multiple masters, you could change the config on just one, but that was about as flexible as hiera 3 would let you be. If it worked, awesome. If it broke, you could break a whole lot that needed to be undone before you could try again.

Hiera 5 introduces independent hierarchy configurations per environment and even per module! If you want to try out a new backend like hiera-eyaml, you can now create a new feature branch eyaml-test, update the configuration in that branch, push it, and ONLY nodes that use that environment will receive the new configuration. This is a huge help in testing changes to hiera without blowing up all your nodes.
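
To make that concrete, the environment-level configuration is just a hiera.yaml version 5 at the root of the environment. A minimal sketch; the hierarchy and paths here are illustrative, not prescriptive:

---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: 'Per-node data'
    path: 'nodes/%{trusted.certname}.yaml'
  - name: 'Common data'
    path: 'common.yaml'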

The per module hierarchy also means that module authors can include defaults that use hiera, rather than the params.pp pattern. This makes it easier for module users to override settings. There are also improvements in the interface for those who want to create their own backends. And, best of all, hiera 5 means the name hiera is here to stay – no more confusion between “legacy” hiera and modern lookup, it’s all called hiera 5 now.
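
Here is a sketch of that module pattern: the author ships a hiera.yaml (version 5) plus a data directory inside the module, and class parameter defaults resolve from the module’s data automatically. The module name mymodule and its port parameter are hypothetical:

# <module root>/hiera.yaml
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: 'Module defaults'
    path: 'common.yaml'

# <module root>/data/common.yaml
---
mymodule::port: 8080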

It does mean there are some deprecations to keep in mind, but they won’t actually go away until at least Puppet 6. You can use hiera 5 now and take some time to replace the deprecated bits. This does mean we can use our existing hiera 3 setup and worry about migrating it to hiera 5 later, too, which we will take advantage of.

I always prefer to upgrade to the latest version. If for some reason you’re upgrading to Puppet 4.9, be aware that Puppet 4.10 fixed PUP-7554, which caused failures when a hiera 3 format hiera.yaml was found at the base of a controlrepo or module. I kept a hiera.yaml in the root of my controlrepo for bootstrapping purposes for a long while, and if you do, you could hit that bug. Best to just move to the latest if you can.

I think most people will like Hiera 5, but there are a ton of other features (listed at the end) and even if nothing appeals specifically, it is good to stay up to date so you don’t get stuck with a really nasty upgrade process when you find a feature you really need. Please, don’t let yourself get a full year behind on updates like I did. Sometimes it’s really difficult to get out of that situation!

Puppet Enterprise Server Upgrade

I use Puppet Enterprise at both work and home these days, so I will go through the PE upgrade experience. The Puppet OpenSource install and upgrade instructions share a page in the documentation, so that process looks straightforward as well, but your mileage may vary, of course.

First, take a look at your installation and make sure it’s in a known state – preferably a known good state all the way around, but at least a known one. If you have outstanding issues on the master, you need to resolve them. If some agents are failing to check in, you may want to take the time to fix them, or you could just keep track of the failures. After the upgrade, you don’t want to see an increase in failures. Once everything looks good, take a snapshot of your master(s) and a full application/OS backup if possible. If you have a distributed setup, perform this on all nodes as close to the same time as possible.

Second, download the latest version of PE (KB#0001) onto your master. Expand the tarball, cd into the directory, and run sudo ./puppet-enterprise-installer. You can provide a .pe.conf file with the -c option, or answer a few interactive questions to get started:

=============================================================
 Puppet Enterprise Installer
=============================================================
2017-11-08 20:38:07,432 Running command: cp /opt/puppetlabs/server/pe_build /opt/puppetlabs/server/pe_build.bak
2017-11-08 20:38:07,480 Running command: cp /opt/puppetlabs/puppet/VERSION /opt/puppetlabs/server/puppet-agent-version.bak

## We've detected an existing Puppet Enterprise 2016.5.2 install.

 Would you like to proceed with text-mode upgrade/repair? [Yn]y

## We've found a pe.conf file at /etc/puppetlabs/enterprise/conf.d/pe.conf.

 Proceed with upgrade/repair of 2016.5.2 using the pe.conf at /etc/puppetlabs/enterprise/conf.d/pe.conf? [Yn]y
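
If you would rather skip the interactive prompts on future runs, the -c option mentioned above accepts the existing answer file. A sketch, using the pe.conf path the installer found:

sudo ./puppet-enterprise-installer -c /etc/puppetlabs/enterprise/conf.d/pe.conf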

The install takes a bit of time (30 minutes on my lab install). Once the upgrade is done, you’ll be directed to run puppet agent -t (with sudo of course). If you have additional compile masters or ActiveMQ hubs and spokes, run the commands in steps 4 and 5 as well.

You should now be able to log into the Console and see the status of your environment. You will hopefully find Intentional Changes on most of your nodes and few or no failures (if both are encountered in a run, Intentional Changes “wins” on the Console; let every node run at least twice to see if it moves back to Green or to Red before continuing).

If you do encounter failures, you will have to analyze each issue to see if it’s related to the upgrade and something you can fix, or if it’s time to roll back. If you do roll back, make sure you roll back ALL the PE components, including the PuppetDB, so you don’t leave cruft somewhere. I experienced one issue in the lab described in PUP-7878, resolved by a reboot of the master after the upgrade.

If everything is good, then it is time to proceed to upgrading the Agents.

Agent Upgrades

The Puppet docs provide instructions for upgrading Agents in a variety of methods. I prefer to use the module puppetlabs/puppet_agent, as I’ve discussed before (OpenSource, PE Linux and PE Windows clients). My experience is with my profile module and hiera data in the controlrepo, while Puppet’s instructions use the Console Classifier. It really does not matter how you do this, but I did find an issue with the Puppet docs (DOCUMENT-763) – after classifying, you must set the parameter puppet_agent::package_version or no upgrade occurs for agents already running Puppet 4 or 5. Set it to 5.3.3 (obtained by running puppet --version on the master, which received the latest agent during the upgrade). Here’s how to do that in hiera:

puppet_agent::package_version: '5.3.3'
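
For context, the profile side of this can be as simple as including the class and letting the hiera data above supply the version pin. A minimal sketch; the class name profile::puppet_agent is just my illustration:

# profile::puppet_agent is a hypothetical profile name
class profile::puppet_agent {
  # puppet_agent::package_version comes from the hiera data above
  include ::puppet_agent
}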

The next two agent runs will show changes. I ran my tests directly using ssh on a Linux host and it looked like this:

# First run upgrades Puppet 4 to 5
[rnelson0@build03 controlrepo:pe201732]$ sudo puppet agent -t --environment pe201732
Info: Using configured environment 'pe201732'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for build03.nelson.va
Info: Applying configuration version '1510241461'
Notice: /Stage[main]/Puppet_agent::Osfamily::Redhat/Yumrepo[pc_repo]/baseurl: baseurl changed 'https://yum.puppetlabs.com/el/$releasever/PC1/x86_64' to 'https://puppet.nelson.va:8140/packages/2017.3.2/el-7-x86_64'
Notice: /Stage[main]/Puppet_agent::Osfamily::Redhat/Yumrepo[pc_repo]/sslcacert: defined 'sslcacert' as '/etc/puppetlabs/puppet/ssl/certs/ca.pem'
Notice: /Stage[main]/Puppet_agent::Osfamily::Redhat/Yumrepo[pc_repo]/sslclientcert: defined 'sslclientcert' as '/etc/puppetlabs/puppet/ssl/certs/build03.nelson.va.pem'
Notice: /Stage[main]/Puppet_agent::Osfamily::Redhat/Yumrepo[pc_repo]/sslclientkey: defined 'sslclientkey' as '/etc/puppetlabs/puppet/ssl/private_keys/build03.nelson.va.pem'
Notice: /Stage[main]/Puppet_agent::Install/Package[puppet-agent]/ensure: ensure changed '1.9.2-1.el7' to '5.3.3-1.el7'
Notice: /Stage[main]/Puppet_enterprise::Mcollective::Server::Logs/File[/var/log/puppetlabs/mcollective]/mode: mode changed '0750' to '0755'
Notice: Applied catalog in 71.86 seconds

# Second run updates some PE components
[rnelson0@build03 controlrepo:pe201732]$ sudo puppet agent -t --environment pe201732
Info: Using configured environment 'pe201732'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for build03.nelson.va
Info: Applying configuration version '1510241554'
Notice: /Stage[main]/Puppet_enterprise::Pxp_agent/File[/etc/puppetlabs/pxp-agent/pxp-agent.conf]/content:
--- /etc/puppetlabs/pxp-agent/pxp-agent.conf 2017-11-08 21:16:56.713834368 +0000
+++ /tmp/puppet-file20171109-20790-1l6y5bg 2017-11-09 15:32:45.917909748 +0000
@@ -1 +1 @@
-{"broker-ws-uris":["wss://puppet.nelson.va:8142/pcp2/"],"pcp-version":"2","ssl-key":"/etc/puppetlabs/puppet/ssl/private_keys/build03.nelson.va.pem","ssl-cert":"/etc/puppetlabs/puppet/ssl/certs/build03.nelson.va.pem","ssl-ca-cert":"/etc/puppetlabs/puppet/ssl/certs/ca.pem","loglevel":"info"}
\ No newline at end of file
+{"broker-ws-uris":["wss://puppet.nelson.va:8142/pcp2/"],"pcp-version":"2","master-uris":["https://puppet.nelson.va:8140"],"ssl-key":"/etc/puppetlabs/puppet/ssl/private_keys/build03.nelson.va.pem","ssl-cert":"/etc/puppetlabs/puppet/ssl/certs/build03.nelson.va.pem","ssl-ca-cert":"/etc/puppetlabs/puppet/ssl/certs/ca.pem","loglevel":"info"}
\ No newline at end of file

Info: Computing checksum on file /etc/puppetlabs/pxp-agent/pxp-agent.conf
Info: /Stage[main]/Puppet_enterprise::Pxp_agent/File[/etc/puppetlabs/pxp-agent/pxp-agent.conf]: Filebucketed /etc/puppetlabs/pxp-agent/pxp-agent.conf to puppet with sum cad3d2db7a7a912a1734b7e8afa23037
Notice: /Stage[main]/Puppet_enterprise::Pxp_agent/File[/etc/puppetlabs/pxp-agent/pxp-agent.conf]/content: content changed '{md5}cad3d2db7a7a912a1734b7e8afa23037' to '{md5}a19b53e1586a748ba488ee4dcd7afc3c'
Info: /Stage[main]/Puppet_enterprise::Pxp_agent/File[/etc/puppetlabs/pxp-agent/pxp-agent.conf]: Scheduling refresh of Service[pxp-agent]
Notice: /Stage[main]/Puppet_enterprise::Pxp_agent::Service/Service[pxp-agent]: Triggered 'refresh' from 1 event
Notice: /Stage[main]/Puppet_enterprise::Mcollective::Server/File[/etc/puppetlabs/mcollective/server.cfg]/content: [diff redacted]
Info: Computing checksum on file /etc/puppetlabs/mcollective/server.cfg
Info: /Stage[main]/Puppet_enterprise::Mcollective::Server/File[/etc/puppetlabs/mcollective/server.cfg]: Filebucketed /etc/puppetlabs/mcollective/server.cfg to puppet with sum 7a8d59f271273738a51b4cf05ee6b33a
Notice: /Stage[main]/Puppet_enterprise::Mcollective::Server/File[/etc/puppetlabs/mcollective/server.cfg]/content: changed [redacted] to [redacted]
Info: /Stage[main]/Puppet_enterprise::Mcollective::Server/File[/etc/puppetlabs/mcollective/server.cfg]: Scheduling refresh of Service[mcollective]
Notice: /Stage[main]/Puppet_enterprise::Mcollective::Service/Service[mcollective]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 6.02 seconds

I assume the PE component updates are based on facts only present with Puppet 5, facts that would not be present during the first run while the agent is still Puppet 4. Subsequent runs are stable.

I do not have a Windows agent to test with in my lab; I assume it looks similar but cannot verify. Be sure to test at least one Windows agent before releasing this change across your entire Windows fleet.

New Features

I have skipped from 2016.4 to 2017.3, which means I have missed out on new features in four major versions: 2016.5, 2017.1, 2017.2, and 2017.3. Here are some of the big features from the release notes.

I mentioned Hiera 5 already, which I’ll discuss further in another post. I also want to immediately enable the Package Inventory. As described, I can update the Classification of the PE Agent node group to include puppet_enterprise::profile::agent with package_inventory_enabled set to true and commit the change.
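
If you would rather keep that classification in hiera than in the Console, the equivalent data is a one-liner; a sketch using the class and parameter named above:

puppet_enterprise::profile::agent::package_inventory_enabled: true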

While it takes effect immediately, your agents need two runs before data shows up: the first changes the setting so package data is collected, and the second actually collects the list. Once that happens with at least one node, you’ll start seeing data populate on the Inspect > Packages page.

I do not have need for High Availability myself, but it seems really cool; in the past, it’s been quite the pain in the behind to automate yourself. I have not used Orchestrator in anger before, and I do Hiera overrides in my control repo, almost ignoring the Console Classifier otherwise, so I probably will not explore those very deeply. However, I’m really excited about Tasks; that’s something I hope to explore during the winter break, perhaps by upgrading bash across all my systems!

Summary

Today we looked at why we want to upgrade to the latest Puppet and upgraded a Puppet Enterprise monolithic master and some linux agents. It’s not that hard! We also staked out features that we want to investigate and turned on the Package Inventory. There are a lot more changes than I listed, along with tons of fixed bugs and smaller improvements, so I recommend reviewing the release notes for each version to see what interests you.

I hope to be able to look into Hiera 5 and Tasks soon; look for new blog posts on those! Let me know in the comments or on twitter if there’s anything else you’d like to see discussed. Thanks!

Getting Started with Veeam SureBackup Jobs

Many wise people have pointed out that a backup doesn’t count until you can restore it. It’s vitally important that we test our backups by restoring them, and doing so manually is often problematic when the original system is still online. If you use Veeam Backup & Replication, it includes functionality called SureBackup to automatically test restores in a private, isolated network so that there’s no conflict with the production systems. You can read more about the functionality in the B&R Manual, starting with this section. I will provide high-level descriptions here as the manual already offers great detail; please take the time to read it, as my article isn’t a substitute for the official docs!

The manual is pretty good, but I ran into a few things that were either confusing or missing, things I had to scramble to figure out on my own. That’s not fun and I don’t think others want to waste their time on it. I hope this article helps illuminate some of the gaps for others who wish to explore SureBackup. Let’s start by taking a look at how SureBackup works and the components it uses.

SureBackup, Application Groups, Virtual Labs, and other terminology

The basic process of SureBackup is as one might expect:

  1. Register and power on a VM based off the backup files
  2. Run tests against the VM
  3. Optionally perform a CRC check on the files
  4. Add the status of the VM to the report
  5. Power off and unregister the VM
  6. Repeat 1-5 for the remaining VMs

Under the hood, of course it’s a little more complicated and introduces some new terminology:

  • Application Groups: A collection of related VMs. For example, an Active Directory Domain Controller, a domain-joined DNS server, and a domain-joined webserver. Or the trio of VMs in a 3-tiered application. Only create an application group for VMs that need to be tested in a particular order or need extra tests. Each VM can have a defined role to run application-level tests, and the VMs are powered on one at a time in the order specified.
  • Linked Job: A restore test can, after any Application Group VMs pass, run against all the VMs in a Backup or Replication job. These tests are basic power on and heartbeat tests, no application-level tests. These VMs are powered on in groups, by default 3 at a time.
  • Virtual Lab: Each job is run against or inside of a virtual lab. This is where the network isolation occurs. The Lab is attached to a single VMHost, not a cluster, and a standalone vSwitch with no uplinks is created on that VMHost to provide the isolation. A datastore is chosen for the temporary files used during the test. The production and isolated networks are bridged by at least one VM called a…
  • Proxy Appliance: Not to be confused with a Backup Proxy! This linux-based VM bridges the production and isolated networks using iptables and NAT masquerading to allow access to the restored VMs. It is managed entirely by Veeam, including creation, settings, powering on and off, etc.
  • SureBackup Job: A new job type in addition to Backup and Replication jobs. This option is not visible until a Virtual Lab exists.

Now that we know the various components, let’s expand the high level steps from before:

  1. A SureBackup Job starts and brings up the Virtual Lab and its Proxy Appliance[s].
  2. Pick the first VM from an Application Group or the first 3 VMs from a Linked Job. Register and power on a VM and run heartbeat and/or application tests against it. Tests are initiated from the Backup Server through the Proxy Appliance’s NAT and to the test VM.
  3. Optionally perform a CRC check on the files.
  4. Add the status of the VM to the report.
  5. Power off and unregister the VM.
  6. [New] If the VM is a member of an application group and has failed, abort the run.
  7. Repeat steps 1-6 for the remaining VMs, moving from Application Group VMs to Linked Job VMs.
  8. [New] Clean up all the temporary restore VMs and power off the Virtual Lab.

We can optionally allow the VMs to persist after the SureBackup Job completes. In that case, the job actually remains running until we select it in the Console and choose Stop Session, at which time step 8 completes. If we turn off the VMs manually, it doesn’t hurt anything, though; Veeam still handles the cleanup.

Create an Application Group

An application group is defined when we want to test a number of related VMs, such as a 3-tier app or an Active Directory/Exchange setup. We do not create application groups for unrelated VMs, like 5 web servers from 5 different customers. The reason is that each VM is powered on (and left on!) in sequence, and if one fails the whole group fails. Make sure there’s a strong relationship between the VMs in an application group.

Creating an Application Group is a pretty simple process with the wizard. In the Console, go to Backup Infrastructure > SureBackup > Application Groups, right click and choose Add App Group…

Give it a name and description and click Next. On the Virtual Machines page, click Add VM and select one or more related VMs. I’ve chosen an instance of vRealize Operations Manager (vROps). Notice that the Role is not set. Select it and click Edit… Adding a role will enable an application-level check. Select the Web Server option. In the Startup Options tab, we need to make a change. vROps takes a long time to start, more than most web servers. I suggest increasing the Application initialization timeout to 300-400 seconds (5:00-6:40) so it has enough time to complete loading. Switch over to the Test Scripts tab and there is a small problem – the Web Server role uses port 80! If we highlight it and edit it, we cannot change the argument; we can only choose a different application or provide our own test script.

There are two ways to fix this. First, we can create a new role, which means we only have to describe the tests once and can re-use it across anything that fits the role. On the Backup server, browse to %ProgramFiles%\Veeam\Backup and Replication\Backup\SbRoles and we find one XML file for each role. Copy WebServer.xml to HTTPSServer.xml or similar and edit that file. There are three things to change: the Id and Name at the top and the Arguments about 2/3rds of the way down. I’m not aware of any specific rules for the Id generation, just that it should be unique. I changed the last F to an E, that’s all. The Name is what shows up in the Veeam dialog boxes. Here’s what mine looks like after editing the Id, Name, and Arguments:

<?xml version="1.0" encoding="utf-8" ?>
<SbRoleOptions>
  <Role>
    <SbRole>
      <Id>4CDC7CC4-A906-4de2-979B-E5F74C44832E</Id>
      <Name>HTTPS Web Server</Name>
    </SbRole>
  </Role>
  <Options>
    <SbVerificationOptions>
      <ActualMemoryPercent>100</ActualMemoryPercent>
      <MaxBootTimeoutSec>300</MaxBootTimeoutSec>
      <AppInitDelaySec>120</AppInitDelaySec>
      <TestScripts>
        <TestScripts>
          <TestScript>
            <Name>Web Server</Name>
            <Type>Predefined</Type>
            <TestScriptFilePath>Veeam.Backup.ConnectionTester.exe</TestScriptFilePath>
            <Arguments>%vm_ip% 443</Arguments>
          </TestScript>
        </TestScripts>
      </TestScripts>
      <HeartbeatEnabled>True</HeartbeatEnabled>
      <PingEnabled>True</PingEnabled>
    </SbVerificationOptions>
  </Options>
</SbRoleOptions>  
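
As an aside, rather than hand-editing one character of the Id, you can mint a guaranteed-unique value; a quick PowerShell sketch:

[guid]::NewGuid()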

If we OK the Verification Options window and click Edit… again, we will see the new role HTTPS Web Server is available and the Test Scripts tab shows the port 443 in the arguments. More information on role definitions can be found in the manual.

The second way to configure the test scripts is on the Edit page by selecting Use the following test script. Put something in the Name field. The Path is the TestScriptFilePath value from the XML files prefixed with its directory, giving us C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.ConnectionTester.exe (assuming %ProgramFiles% is C:\Program Files). The arguments match the same field in the XML file, %vm_ip% 443 – or whatever port the one-off requires. We can also add our own binaries for testing; just make sure they’re documented as part of the Veeam B&R Backup server build.
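
Either way, the test that ultimately runs boils down to something like this (a sketch; substitute whatever port the one-off requires):

"C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.ConnectionTester.exe" %vm_ip% 443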

Our single-VM example application group looks like this now:

There are tons of other things to customize in the application group – such as only allocating 50% of the compute and memory the VM is assigned to preserve resources during the test – but this is sufficient for our tests. Create however many application groups you want now; you can always come back later and edit them or create more.

Create a Virtual Lab

The application group was the easy part. The Virtual Lab is next, and will create a vSwitch and Proxy Appliance VM on the host/datastore of our choosing. Before beginning, we need to decide which host and datastore to use, and grab an IP for the Proxy Appliance on the same network as the Backup server (it can be set up in a different network, but that’s a more complex setup I won’t be visiting in this article). Once we have that, we go to Backup Infrastructure > SureBackup > Virtual Labs, right click, and choose Add Virtual Lab… If a Virtual Lab has been created previously and disconnected somehow, we can also choose Connect Virtual Lab… to reconnect it. Let’s review creation of a new lab.

Give the lab a name and description. On the next page, we are asked to select a host. Again, we can NOT choose a cluster; we must choose a single host. Once we choose the host, it will suggest a Folder and Resource Pool that the restore VMs will be placed in. We can edit with the Configure… button or just click Next. The next page in the wizard asks for a Datastore that the host can see; click Choose… and select one. I believe I saw a suggestion that the free space should be about 10% of the size of the VMs being restored, but I am not sure where I saw that and cannot find a more solid recommendation now.

The next page is where the Proxy Appliance is created. Set the name with the first Configure… and the network settings with the second Configure… In the network settings, we must choose the same production network as the B&R Backup server for our simple setup (more advanced options are discussed in the manual as Advanced multi-host (manual configuration), but there are no guides for it, sorry). If that network supports DHCP, just click OK; otherwise we will need to provide our IPv4 (no IPv6 availability) address settings and DNS servers. We can also optionally allow the proxy appliance to be the VMs’ internet proxy, but we will skip that for now (directions in steps 4 and 5 here).

Note: The proxy appliance by default receives the same name as the lab. If you use vCheck, there is a plugin that alerts on VMs whose file location on the datastore doesn’t match the VM name, and spaces in VM names are changed to underscores on the filesystem. If you use this plugin, I suggest avoiding spaces in the VM’s name or adjusting your plugin settings to skip the virtual lab VMs to prevent false positives.

On the Networking tab, choose Advanced single-host (manual configuration). You can read up on the networking modes. Our use case calls for tests of VMs in multiple networks, so we must choose the Advanced/manual option. If the restored VMs are all in a single network, then the Basic/automatic mode will work. Click Next to start setting up the Isolated Networks.

The next tab is where we will add the various networks that restored VMs will exist in. We will add some now and we will need to return here in the future when more networks are added. There are unfortunately no cmdlets or functions in the Veeam PowerShell kit to do this… yet. There will be one Isolated Network already.

Update: When I read the documentation, I assumed you needed an isolated network for every production network that a VM in the job uses (i.e. if your VMs were on VLANs 100-110, you needed 11 isolated networks and vNICs), which is not quite true. If no isolated network/vNIC exists that matches the production network for a VM, only Heartbeat and Verification tests are attempted. If an isolated network/vNIC does exist, then Ping and Script tests are attempted as well.

If we select that network and click Edit, we can see how it is associated with a Production network, an Isolated network, and a VLAN ID. This first Isolated Network defaults to the same network of the Proxy Appliance itself. It might be difficult to read through the scrubbed image, but the Isolated Network name is the Production Network name prepended with the lab name.

This network will be added to a private vSwitch on the selected host, which will have no uplinks. We should be perfectly fine leaving the VLAN ID alone, but if you are worried, just assign it a unique number not used elsewhere, maybe add 500 or 1000 to it. Click OK or Cancel, and back on the Isolated Networks page of the wizard, click Add… We will need to Choose… a production network. In the dialog box, be sure to expand the host our appliance is on – if it’s a dvSwitch it SHOULD be the same everywhere, but there’s no point in risking a conflicting identifier from a different host. In figure 7, I’ve chosen the vSphere Management network as that’s where vROps resides.

Do not just change the VLAN ID and click OK! Take a look at the Isolated Network. I know it’s difficult to see with my scrubbed image, but it’s the same name as the previous isolated network. Click OK and the VLAN ID of both isolated networks will be the same; an edit to either will update the ID for both. This isn’t what we want. The isolated network name needs to be changed. We can make it match the default format of <virtual lab name> <Production network name> or just <Production network name>, or enter free text like Bob. It doesn’t matter what it is called as long as it’s unique. Now, I cannot explain why the wizard doesn’t automatically change the name of the isolated network, but it doesn’t, so we have to do that ourselves. Big tip of the hat to Jason Ross, who described the issue and fix in the Veeam forums. Once the Isolated Network is renamed, click OK and the mapping will look something like this:

Next up is the Network Settings page. Here we want to create a vNIC for each Isolated Network we’ve created. Veeam uses NAT masquerading to let the Backup server communicate with the VMs on that segment, which requires selecting some address ranges that aren’t used elsewhere in the network, or at least that the Backup server doesn’t need to communicate with. Though we chose a manual network mode, a route to the masquerade IPs will automatically be created on the Backup server during restore jobs, so we do not have to manage that (this is why we did not put the proxy appliance in a different network than the Backup server). Edit the existing vNIC and assign it the IP/mask that the default gateway (router) in that network would have. We can also change the masquerade network and disable DHCP if we don’t want to use it on that interface. I would leave it enabled unless one of the VMs being restored is a DHCP server; otherwise it makes it really easy to ensure VMs get IP addresses. Here’s what that would look like for a network X.Y.10.64/27:

Repeat this for every Isolated Network you need, using the Choose isolated network to connect this vNIC to pulldown to select the correct isolated network. If we need VMs to talk to each other, check the Route network traffic between vNICs box. If we don’t need it, it probably won’t hurt, though. Here’s what this might look like when complete.

We are going to skip Static Mapping, as the general NAT Masquerade works for this use case. Review the configuration on the Ready to Apply portion of the wizard and hit Apply. When we do, the resource pool, folder, vSwitch, port groups, and virtual machine will be created and configured on the specified host. We can now find the proxy appliance VM (or the other resources created) and add any notes, tags, etc. that we would normally apply to those resources (I use tag-based backups so would want to put a NoBackup tag on the proxy appliance).

If you need more assistance on creating a virtual lab, I recommend this Whiteboard Fridays video.

Create a SureBackup Job and test

Finally, we need to create the SureBackup job itself. We are almost there, I promise! Go into Backup & Replication > Jobs > SureBackup (this is only available if one or more virtual labs exist) and right click to create a new job with SureBackup…

Give the job a name and description and click Next. On the next page we must select a virtual lab. In this case, there is only one. Click Next. On the next page we may optionally select an application group. The next page in the Wizard is for Linked Jobs. Let’s take a moment to explore the three combinations available here:

  • Application Group only: The VMs inside the application group are powered on, one at a time and in serial order, then tested. Any VM test failure aborts the run immediately.
  • Linked Jobs only: The VMs in the linked jobs are started up in batches (default: 3 at a time) until all VMs are tested. Any VM test failure does not abort the run.
  • Application Group and Linked Jobs: This is a combination of the two above. The Application Group is processed as a unit and then, if it completes successfully, the Linked Job VMs are tested.

Since we created an application group, we will select it. We cannot edit the application group settings from here, only view them to ensure we select the correct group. We may choose to check the Keep the application group running after the job completes box. If so, the job will remain at 99% with all application group VMs and the Proxy Appliance VM powered on until someone right clicks on the job and chooses Stop Session. As described below, this is good for checking out any of the VMs in greater depth after the job completes. It would obviously not be something to leave enabled on a scheduled job. It is important to note that the VMs will only be kept running if the job completes successfully; if it fails, I observed the VMs being shut down immediately. So, it’s not great for troubleshooting. Click Next to proceed.

We can now link one or more Backup jobs to the SureBackup job by clicking the Add… button and selecting a job. We can only specify ONE role for all VMs in the linked job. If left blank, only a ping and heartbeat test will be used. At the bottom, we can specify how many VMs are processed at once. I did not play with the Advanced button but I believe we can use it to set roles by individual VM name, tags, folders, etc. Be aware that each VM will attempt to connect to an isolated network on the virtual lab’s vSwitch. If the backup jobs are organized by network, the lab can get by with a single isolated network, but if a job contains VMs from multiple networks, each network needs to exist beforehand or the job will fail. Click Next when ready to proceed.

The Settings page is where you specify whether to send SNMP or email notifications and whether CRC checks are performed on the backup files. I only received emails in my testing for failed jobs; there appears to be no exposed setting for sending emails on successful job runs. CRC checks do take a while, but I would recommend them to guard against bit rot, unless there is some sort of detection in the storage array or you’re a gambler.

Clicking Next takes us to the Schedule tab. If we check Run the job automatically we can have it run on a daily or monthly schedule, or have it run after a job – perhaps the Linked Job or a job that the Application Group VMs are backed up in. If some VMs come from a different job, leave If some linked backup jobs… checked and adjust the timer as needed.

Here’s what a successful job run looks like, with a little scrubbing, anyway:

Highlights and Observations

OK, that was a LOT we went through, very chewy. I have tried to highlight the most important items that I did not find in the B&R manual, including some I already covered above. I am also new to SureBackup myself and hope that if you see any incorrect information or workarounds – particularly for the affinity issue with the Proxy Appliance – you will let me know in the comments or on twitter.

  • You need at least a Virtual Lab to create a SureBackup Job. Application Groups are optional, but are a quick way to get started.
  • Application Group VMs are processed serially in the order specified. A single failure aborts the entire group.
  • If there is no existing role for a VM, you can create your own with an XML file. Existing roles are at %ProgramFiles%\Veeam\Backup and Replication\Backup\SbRoles.
  • Virtual Labs are tied to a single VMHost/Datastore and cannot be attached to clusters.
    • The Proxy Appliance VM is normally powered off so is mostly exempt from DRS. However, it can be moved during an HA event. Veeam does not appear to create an affinity rule to keep it in place. It also doesn’t quite notice when starting up the Virtual Lab that the VM isn’t on the same host as the vSwitch, and jobs will continue to fail until you vMotion it back. Hopefully this is something Veeam is addressing; in the meantime I created a DRS rule on my own.
  • Spaces in the proxy appliance name are converted to underscores in the folder name on the datastore; at least one vCheck plugin will alert on this discrepancy between name and folder.
  • Place the Virtual Lab’s Proxy Appliance in the same network as the B&R Backup Server (not the Proxies or the Console, if the Console is separate from the Backup) and masquerade routes are added automatically; if you place it elsewhere, you must manage the routing from the Backup to the Proxy Appliance yourself.
  • Isolated networks are attached to a vSwitch with no uplinks. You should be able to use the same VLANs as you use in production, although someone could add an uplink to it. Adding 500 or 1000 to the VLAN number to put it in a range you don’t use may help prevent accidents.
  • The New Virtual Lab wizard’s Isolated Networks Add dialog does not automatically change the Isolated Network name; you must change it manually.
  • Tests vary depending on the network alignment:
    • If there is an isolated network/vNIC that matches the VMs production network, all tests (Heartbeat, Ping, Script, Verification) are attempted
    • If there is NO isolated network/vNIC matching the VM’s production network, only Heartbeat and Verification tests are attempted.
  • Windows Firewall policies default to block ICMP on “Private” networks, which is how the new Isolated network will be identified. Adjust your policy or Ping tests will fail. The policy is File and Printer Sharing (Echo Request – ICMPv4-In) for the Private profile, double click on it and enable it, or use PowerShell:

Enable-NetFirewallRule -DisplayName "File and Printer Sharing (Echo Request - ICMPv4-In)"
  • After you create the virtual lab, don’t forget to update the lab resources created to add Notes, Tags, and other standard meta-data you use internally.
  • A SureBackup Job can use an Application Group, one or more Linked Jobs, or an Application group AND one or more Linked Jobs.
    • When both are used, Linked Jobs are not processed until the Application Group tests are successful.
  • Keep the application group running after the job completes is missing the word successfully. If the application group tests fail, I observed the group shutting down immediately.
    • You will need to right click on the job and choose Stop Session when you are ready to shut down and delete the VMs.
  • Email notifications only happen on failures; I see no exposed setting to send notifications on success.
  • You cannot delete a lab or application group if a SureBackup job references it. Delete or edit the SureBackup job to remove the reference and try again.
  • You can power on the proxy appliance outside of SureBackup and deploy your own VMs attached to the vSwitch and make sure they get DHCP and are reachable with masquerading.
  • The default user/password for the proxy appliance is root/<proxyname>_r. Any spaces or underscores in the name are converted to hyphens. The default proxy name of Virtual Lab 1 results in the combination root / Virtual-Lab-1_r.
  • You can examine the NAT masquerade or static NAT rules on the appliance with the commands iptables -L -n -v && iptables -t nat -L -n -v

Summary

With a lot of reading and a little bit of work, we have created an Application Group, a Virtual Lab with a few networks, and a SureBackup job that can test restores in a private environment. Most of us will have a bit more work to do to create additional networks and maybe additional labs, but you should be able to start testing at least a few backups immediately. We can go to sleep a little better tonight knowing that our backups AND restores work! Even if they don’t work for some reason, at least we will find out now, not when we need them most!

I would love to hear any other tips and tricks for using SureBackup. It appears very powerful, but requires a good bit of manual effort. Has anyone made strides in automating it, officially or unofficially? Let me know in the comments or on twitter. Thanks!

Prevent vRealize Orchestrator lockouts

If you have played around with vRealize Orchestrator (and vCenter Orchestrator before it) for long enough, you have undoubtedly locked yourself out at least once, either at the console or in VAMI or both. KB 2069041 details the process to reset the password and it’s simple enough, for the most part. You still have to deal with a lockout period in both the console and VAMI, and since the only user that likely exists is root, it appears to me to be just a way to DoS yourself when you most desperately need access to your vRO. The lockout can be disabled, though.

While looking for the KB to reset the password, I found this article (if anyone knows who fdo is, please let me know; their profile page is blank), which describes how to disable the lockout at the console/ssh. Just edit /etc/pam.d/common-auth and comment out the line containing pam_tally2.so and you can get back in, whether you have changed root’s password or not. However, you still cannot get into the VAMI. Let’s see what else uses pam_tally2.so in the PAM configuration directory:

vro01:/var/log # grep tally /etc/pam.d/*
/etc/pam.d/common-account:account required pam_tally2.so
/etc/pam.d/common-account-vmware.local:account required pam_tally2.so
/etc/pam.d/common-auth:#auth required pam_tally2.so deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
/etc/pam.d/common-auth-vmware.local:#auth required pam_tally2.so deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
/etc/pam.d/vami-sfcb:auth required /lib64/security/pam_tally2.so deny=4 even_deny_root unlock_time=1200 root_unlock_time=1200
/etc/pam.d/vami-sfcb:account required /lib64/security/pam_tally2.so

Winner! There are three different files (two are symlinks) containing that pattern, and one has the word vami in it – bingo! Just get in and put a # in front of the vami-sfcb auth line to comment it out, and suddenly you’ll be able to log in to the VAMI again. I do not know if this persists across updates, so you may want to revisit this after your next upgrade to be sure – I’ll come back and add a note whenever I do my next update.
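
If you want to script it, here is a sketch that comments out only the auth line in vami-sfcb; back up the file first (common-auth was already handled above):

cp /etc/pam.d/vami-sfcb /etc/pam.d/vami-sfcb.bak
sed -i '/pam_tally2/s/^auth/#auth/' /etc/pam.d/vami-sfcb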

You can now no longer DoS yourself, or be DoSed by accidental or malicious coworkers. However, keep in mind that this may violate your corporate standards for security, and that’s a political problem, not a technical one – perhaps in that situation, you can adjust the timers instead of disabling it entirely. I think it’s safe to say that this is perfect for everyone’s lab, though!