Prevent vRealize Orchestrator lockouts

If you have played around with vRealize Orchestrator (and vCenter Orchestrator before it) for long enough, you have undoubtedly locked yourself out at least once, either at the console or in VAMI or both. KB 2069041 details the process to reset the password and it’s simple enough, for the most part. You still have to deal with a lockout period in both the console and VAMI, and since the only user that likely exists is root, it appears to me to be just a way to DoS yourself when you most desperately need access to your vRO. The lockout can be disabled, though.

While looking for the KB to reset the password, I found this article (if anyone knows who fdo is, please let me know; their profile page is blank) which describes how to disable the lockout at the console/SSH. Just edit /etc/pam.d/common-auth and comment out the line containing pam_tally2.so and you can get back in, whether you have changed root's password or not. However, you still cannot get into the VAMI. Let's see what else uses pam_tally2.so in the PAM configuration directory:

vro01:/var/log # grep tally /etc/pam.d/*
/etc/pam.d/common-account:account required pam_tally2.so
/etc/pam.d/common-account-vmware.local:account required pam_tally2.so
/etc/pam.d/common-auth:#auth required pam_tally2.so deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
/etc/pam.d/common-auth-vmware.local:#auth required pam_tally2.so deny=3 onerr=fail even_deny_root unlock_time=86400 root_unlock_time=300
/etc/pam.d/vami-sfcb:auth required /lib64/security/pam_tally2.so deny=4 even_deny_root unlock_time=1200 root_unlock_time=1200
/etc/pam.d/vami-sfcb:account required /lib64/security/pam_tally2.so

Winner! There are three different files (two are symlinks) containing that pattern, and one has the word vami in it. Bingo! Just get in and put a # in front of the vami-sfcb auth line shown in the grep output above to comment it out, and suddenly you'll be able to log in to the VAMI again. I do not know if this persists across updates, so you may want to revisit this after your next upgrade to be sure. I'll come back and add a note whenever I do my next update.
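
If you prefer to script it, here is a minimal sketch of the same change from the console; the sed backup suffix is just a precaution, and pam_tally2's reset flag clears any accumulated failures for root:

vro01:~ # sed -i.bak '/pam_tally2.so/ s/^auth/#auth/' /etc/pam.d/vami-sfcb
vro01:~ # pam_tally2 --user root --reset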

You can now no longer DoS yourself, or be DoSed by accidental or malicious coworkers. However, keep in mind that this may violate your corporate standards for security, and that’s a political problem, not a technical one – perhaps in that situation, you can adjust the timers instead of disabling it entirely. I think it’s safe to say that this is perfect for everyone’s lab, though!

PowerCLI, vCheck, and vCenter SSL/TLS secure channel errors

I have been struggling with a number of errors and warnings between PowerCLI and my vCenter servers. The warnings about my self-signed certificates are no big deal, but the errors of course are. The biggest error I have is a well-known issue documented in this vCheck issue on GitHub:

The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.

This happens intermittently, but frequently with the Get-HardDisk cmdlet, which is used in most of the Snapshot-related plugins. When it does happen, the vCheck plugin fails to return any meaningful data and normally errors out pretty fast: run times for the full set of checks in my environment drop from ~120 minutes to ~8 minutes.

The issue goes back over 3 years and while there were a number of attempts to fix the issue, there was no single fix that worked for everyone, every time. Some would hide the issue till you hit a certain threshold and others would just make it far less likely to occur, but not eliminate it. I eventually opened an issue with VMware support and we found what I think is the solution.

Untrusted Certificates and CAs

I am using the provided certificates for my vCenters. These certificates have an expiry term of 10 years and are signed by a CA that is also provided by vCenter during the initial install. This is typically known as a self-signed certificate, but more specifically it means the cert is not signed by a CA trusted by the client (if it was signed by Verisign but you removed the Verisign CAs from your Trusted CA store, it would be reported as a cert signed by an Untrusted CA and/or a self-signed certificate, depending on the application interfacing with it). I have decided to continue to use these certs, as the process for attaching new certificates to a vCenter installation is hairy, to say the least.

This means that when I run Connect-VIServer against my vCenter, I receive the following note about the untrusted CA:

Be sure to use the FQDN to access your vCenter server, or this warning will be swallowed in favor of a “name mismatch” warning.

Generally speaking, most of us don't care about this warning because we are confident that we are connecting to our vCenter server, and we tend to ignore it as a cause of problems. I certainly did. I don't know the specifics surrounding it, but PowerCLI sometimes decides it doesn't like the Untrusted CA and generates the "Could not establish trust relationship" error; sometimes it's cool and establishes the connection just fine. I believe it has something to do with resource exhaustion in tracking the connection, as one of the workarounds suggested on GitHub appeared to work for some by increasing the resources available to a PowerShell session. Regardless of the specifics, connecting with a certificate chain from a Trusted CA does not have this issue. So our resolution is to use certificates signed by a Trusted CA!

As suggested above, you can attach new certificates directly from a Trusted CA to your vCenter, but it’s a tricky process. The other alternative is to trust the CA from your vCenter, which we’ll do here. Alternatively, if you want to attach new certs from an already-Trusted CA, check out KB2111219 and any number of blog posts that address this process and skip ahead to the Summary section.

Download and Install the Certificate Bundle

The first step to trusting the vCenter’s included CA is to download the certificate bundle. You can do this by visiting your vCenter on port 443, e.g. https://vcenter.example.com, and clicking on Download trusted root CA certificates:

You will receive a zip file that contains the certs in various formats. Extract it to a folder somewhere. Since I'm on Windows, I burrow down to the certs\win directory, where there are two CRT files and one CRL. You only need the CRT that is paired with the CRL; the other CRT is for the ssoserver and that is not something PowerCLI cares about.

In vCenter 6.0, the cert bundle had no directories and just two files ending in .0 and .r0 (now found in the lin and mac directories), which correspond to .crt and .crl respectively, so just extract and rename the files if that's the case.

Now, we need to access the certificate store. This varies per OS and version. In Windows 7, you can find the store inside the Internet Options control panel on the Content tab by clicking the Certificates button. Click over to the Trusted Root Certification Authorities tab.

Click the Import button and browse to the CRT you stored earlier. When you import it, you'll see the name CA; if you see ssoserver, you chose the wrong CRT file, so try again with the other. You can now click on the imported CA called CA and click View to see its details. This is important when you have more than one vCenter, as they all import with the name CA, because that's not confusing! You can see here that this is the CA from my vCenter server called vcsa.nelson.va:
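
If you would rather script the import than click through the dialog, something like the following should work; the filename is a placeholder for the CRT you extracted, certutil is available everywhere, and Import-Certificate requires the PKI module found on Windows 8/Server 2012 and later:

# import into the current user's Trusted Root Certification Authorities store
certutil -user -addstore Root .\certs\win\your-vcenter-ca.crt
# or, on newer Windows versions with the PKI module
Import-Certificate -FilePath .\certs\win\your-vcenter-ca.crt -CertStoreLocation Cert:\CurrentUser\Root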

You want to repeat this process on any and all nodes that will use PowerCLI to connect to the vCenter in question, not just the server you run vCheck from.

Summary

With either your new certs or the new trust of the existing CA, you shouldn't see the warning when accessing your vCenter with Connect-VIServer. Close your PowerShell/PowerCLI sessions and run Connect-VIServer inside a brand new session; if you did things correctly, you will not see any yellow warning text:
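
For example, in that brand new session (vcenter.example.com is a stand-in for your vCenter's FQDN):

# no certificate warning should appear now that the CA is trusted
Connect-VIServer -Server vcenter.example.com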

When you run vCheck now, you should no longer see those random SSL/TLS errors! If you disabled some checks, like Phantom Snapshots, because they failed more often than they ran, this is a good time to review if you want to re-enable them. I hope this helps.

I will warn that this solution has only been tested for about a month, but I saw error rates drop from 70% to 0%. I could NOT get the errors to occur with the CA in place, but they would come back the moment I removed the CA. If you see the error return, please let me know in the comments or on twitter and I’ll be glad to share the ticket number reference for engaging support!

Many thanks to Isaac at VMware for this solution, and especially his insistence that I should import the CA even though I swore that couldn’t be the problem 🙂

Upgrade VCSA 6.0u3 to VCSA 6.5u1

Today, I upgraded a vCenter appliance on 6.0u3 to 6.5u1. I had been waiting for this forever as we wanted to get to 6.5, but had erroneously missed a line in the 6.0u3 release notes that said it could not be upgraded to 6.5! Happily, 6.5 Update 1 remedied that, so away we go!

You cannot use VAMI to do major/minor upgrades, only point releases (Update X) and patches, so you must download the new ISO and use the installer. You can find the ISO here and some great instructions on the installer in Mike Tabor’s Upgrade vCenter Server Appliance 6.0 to 6.5 article. The installer itself is pretty foolproof and Mike’s article addresses most ambiguities, so I just want to detail a few things I ran into that may help others.

  • Download the ISO before the change window begins, not after. That can be a problem, or so I’ve heard 😀
  • Turn off DRS during the upgrade. It's mentioned in step 15 and in a warning in the installer itself, but I think it's better to disable it before you get to that step, just in case DRS kicks in between when you start and that step (see the PowerCLI sketch after this list).
  • The process involves a temporary IP for the new VCSA VM, so the old and new can be online simultaneously to transfer data. Add the temporary IP to any firewall rules involving the existing VCSA! If you do not do this, you may run into an error when stage 1 ends and the installer cannot reach the VAMI interface on the temporary VCSA. If you forget, you can proceed with Stage 2 at the URL specified, though you do have to enter a lot of auth information again:

  • If you have an external VUM, you need to either start the Migration Assistant on it or disable the extension com.vmware.vcIntegrity; otherwise the installer will not start. I chose to disable the extension as the end goal was to use the new internal VUM service.
  • The password policy has changed, so you may not be able to keep the same root password for the new appliance.
  • For Stage 2, Mike very optimistically says “after a few minutes the vCenter Server Appliance upgrade should complete.” With just 2GB of data to migrate, it still took close to 45 minutes, and some individual steps seemed hung for close to 10 minutes at a time. Don’t worry if it takes a while, as long as you’re seeing progress overall.
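
Here is a rough PowerCLI sketch for the DRS step mentioned above; 'Cluster01' is a placeholder, and dropping DRS to Manual rather than disabling it outright leaves your resource pools alone:

# set DRS to Manual before starting the upgrade
Get-Cluster -Name 'Cluster01' | Set-Cluster -DrsAutomationLevel Manual -Confirm:$false
# restore it once the upgrade and post-checks are done
Get-Cluster -Name 'Cluster01' | Set-Cluster -DrsAutomationLevel FullyAutomated -Confirm:$false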

After performing the upgrade, you’ll surely have other tasks, such as updating extensions like vRO and vROps, so don’t delete any snapshots right away in case something goes awry.

Managing SSH server security with Puppet

Edit: In an earlier edition, I credited the wrong newsletter as the source. My apologies to R.I.Pienaar!

In this past week’s DevCo Newsletter, I saw the Rebex SSH Check, which reminded me that I’ve locked down the SSH server security configuration at work, but not at home. Sounds like a good opportunity to blog about the process!

Now, I'm in security, but I'm not all that well versed in these particular security settings. The names vary from descriptive to really obtuse, and there are three keys that need to be managed: ciphers, MACs, and KexAlgorithms (that's Key Exchange Algorithms, which is the name I'm more familiar with). The key to security is knowing when you don't know, and seeking out that expertise. I am very thankful for Mozilla's really great security guidelines, including an OpenSSH guide. There are sections for Modern and Intermediate security, depending on what is available on the systems you are securing. For me, these align with the Red Hat/CentOS EL7 (Modern) and EL6/5 (Intermediate) distros that I use.

The first step is making sure we have a tier in hiera for each OS/release we support; otherwise sshd could fail to restart when it encounters a cipher set name that is unknown to the OpenSSH version in use. That could be bad, especially if we don't have some form of iLO console to the nodes, though if we have puppet running on a regular basis or through mcollective, we *should* be able to recover. In any case, you definitely want to check the run status of your nodes after this change so you don't discover a problem while you're trying to troubleshoot some other problem.
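
Before pinning algorithm names in hiera, it is worth confirming what the installed OpenSSH actually supports; a quick sanity check looks something like this (ssh -Q needs OpenSSH 6.3 or newer, so it works on EL7 but not on EL6's 5.3, and sshd -T must run as root):

# list the key exchange algorithms, ciphers, and MACs this OpenSSH build supports
ssh -Q kex
ssh -Q cipher
ssh -Q mac
# after the puppet run, dump sshd's effective configuration to confirm the change took
sshd -T | egrep -i 'kexalgorithms|ciphers|macs'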

I define my hierarchy in hiera itself using the puppet/hiera module, so here is the yaml for hiera to parse as well as the resulting hiera.yaml; the change is the osfamily-release line:

# portion of hiera/puppet_role/puppet.yaml, which applies to the puppet master
hiera::hierarchy:
  - 'clientcert/%%{::}{clientcert}'
  - 'puppet_role/%%{::}{puppet_role}'
  - 'osfamily-release/%%{::}{osfamily}-%%{::}{operatingsystemmajrelease}'
  - 'datacenter/%%{::}{datacenter}'
  - 'global'

# /etc/puppetlabs/puppet/hiera.yaml
# managed by puppet
---
:backends:
- eyaml
- yaml

:logger: console

:hierarchy:
  - "clientcert/%{clientcert}"
  - "puppet_role/%{puppet_role}"
  - "osfamily-release/%{osfamily}-%{operatingsystemmajrelease}"
  - "datacenter/%{datacenter}"
  - global

:eyaml:
  :datadir: "/etc/puppetlabs/puppet/environments/%{::environment}/hiera"
  :extension: yaml
  :pkcs7_private_key: "/etc/puppetlabs/puppet/keys/private_key.pkcs7.pem"
  :pkcs7_public_key: "/etc/puppetlabs/puppet/keys/public_key.pkcs7.pem"

:yaml:
  :datadir: "/etc/puppetlabs/puppet/environments/%{::environment}/hiera"

:merge_behavior: deeper

This change will need to be put in place on the master, the master service restarted, and any dissimilar configs in the wrong location removed before agents will see the changes we make below (I had a /etc/puppetlabs/code/hiera.yaml that varied slightly from /etc/puppetlabs/puppet/hiera.yaml and it kept winning out till I removed it and restarted pe-puppetserver). You can force a run now, or wait up to two full run cycles before verifying that all your agents see the changes.

The second step is to populate the two OS/release files with the specific sets you want to use. I use saz/ssh, which allows me to use the ssh::server::options parameter to free-hand some stanzas into /etc/ssh/sshd_config. These commands replicate my settings, again according to Modern for EL7 and Intermediate for EL6:

mkdir hiera/osfamily-release
cat > hiera/osfamily-release/RedHat-6.yaml << EOF
---
ssh::server::options:
  'KexAlgorithms'            : 'diffie-hellman-group-exchange-sha256'
  'Ciphers'                  : 'aes256-ctr,aes192-ctr,aes128-ctr'
  'MACs'                     : 'hmac-sha2-512,hmac-sha2-256'
EOF

cat > hiera/osfamily-release/RedHat-7.yaml << EOF
---
ssh::server::options:
  'KexAlgorithms'            : 'curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256'
  'Ciphers'                  : 'chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr'
  'MACs'                     : 'hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com'
EOF
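
For completeness, the profile side of this is tiny. A minimal sketch, assuming the hiera data above is consumed via automatic parameter lookup when the saz/ssh server class is included from the profile::base::linux class referenced in the lookup_options below:

# profile/manifests/base/linux.pp (excerpt)
class profile::base::linux {
  # saz/ssh picks up ssh::server::options from hiera automatically
  include ::ssh::server
}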

There’s one final step: merge settings. You may have noticed the merge_behavior setting in my hiera.yaml above, but that’s defunct. Now you must set the lookup options. I do this in my least specific hiera file, hiera/global.yaml:

lookup_options:
  profile::base::linux::sudo_confs:
    merge: deep
  profile::base::linux::logrotate_rules:
    merge: deep
  ssh::server::options:
    merge: deep

If you don’t add this, then you’ll only get the first ssh::server::options values found, even for sub-keys like Ciphers that were not set at the higher tier.

Once all of these changes are in place, your agents should get the new settings and restart sshd. Any new ssh connections to the affected servers will use the specified security sets and ONLY the specified security sets. Existing connections will persist until the server or client ends the session. We can now use curve25519-sha256@libssh.org as a KexAlgorithm with an EL7 node, but we would fail to connect to an EL6 node with it, as only diffie-hellman-group-exchange-sha256 is available there. If we re-run the Rebex SSH Check, our Modern servers show all green now. Success!

Addendum: Peter Souter notified me on twitter about his mozilla_ssh_hardening module (GitHub only at this time), which enforces the Mozilla recommendations on Ubuntu 16.04, CentOS 7, and CentOS 6. You can use that module to replace some of the work above, as long as you do not require conflicting customizations. I still hope this article helps you understand the workings of hiera merges and the need for vetted security configurations.

Puppet 5 has arrived!

If you missed the news this past week, the Puppet 5 Platform was released! Read the announcement and the release notes for some great details. Congratulations to everyone at Puppet for getting this new release out the door. I’m looking forward to diving in with it as soon as a Puppet Enterprise release is out, since I’ve converted even my home lab away from Puppet Opensource.

There are a few things I’ve learned from the announcement thread, slack, and my own experiences with it in the last few days. It’s still early, so I am sure this will get out of date quickly, but I hope it helps others in the short term.

  • Puppet 5 AIO provides Ruby 2.4.1, so your tests should use it as well – even if you’re not using AIO puppet, it’s still helpful for any puppet modules.
  • PuppetDB requires postgresql96, but it’s not a dependency on the puppetdb package, since you can install puppetdb and postgresql on different hosts. Version 4.x works with postgresql96, so upgrade that first, then puppet. Detail here.
  • Puppet 5 includes a vendored version of the semantic_puppet gem. In Puppet 4.7 and below, it had a dependency on the external semantic_puppet gem. The gem is used by metadata-json-lint, which is often part of a puppet rspec test setup. Check out the metadata-json-lint README installation section to see how to deal with this. If your tests run against ~> 4.0, then you're probably okay.
  • There's a new version of puppetlabs_spec_helper that apparently has some issues with spec fixtures and symlinks (from slack, nothing to quote). I haven't hit this myself and it might already be fixed, but it's something to be aware of if you have symlink-related issues during testing.
  • The combination of Puppet 5, rspec-puppet, and the new puppetlabs_spec_helper is more stringent than Puppet 4 was; I'm not sure which of the three components specifically triggers it. I was testing for a resource that required another service, which was not part of the define I was testing (here). With puppet 4, this was fine, but with puppet 5 it started generating errors in this travis run. The fix is simple: use a pre_condition to provide the service in the catalog, seen in this commit and sketched after this list.
  • The first Puppet Enterprise release including Puppet 5 should be out sometime this fall.
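
Here is a minimal sketch of the pre_condition fix mentioned above; the define name and the 'foo' service are hypothetical stand-ins for whatever your resource actually requires:

# spec/defines/mydefine_spec.rb (sketch)
require 'spec_helper'

describe 'mymodule::mydefine' do
  let(:title) { 'example' }
  # provide the service the define requires but does not declare itself
  let(:pre_condition) { 'service { "foo": ensure => running }' }

  it { is_expected.to compile.with_all_deps }
end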

That's all I've run into so far. One last thing: here is a .travis.yml for testing component modules with both Puppet 4 and 5. You only need to update the matrix section if you already have one, but I thought the whole thing might be helpful for those who don't have tests yet:

---
language: ruby
sudo: false
cache: bundler
notifications:
  email:
    on_failure: always
branches:
  only:
  - master
bundler_args: --without development system_tests
before_install: rm Gemfile.lock || true
script: bundle exec rake test
matrix:
  fast_finish: true
  include:
  - rvm: 2.3.1
    env: PUPPET_GEM_VERSION="~> 4.0" STRICT_VARIABLES=yes
  - rvm: 2.4.1
    env: PUPPET_GEM_VERSION="~> 5.0" STRICT_VARIABLES=yes

Where to store Puppet files and templates

I haven’t written a blog post in a while because I’ve been bogged down in work and life and not had much time in the lab. To make sure I don’t get too out of practice, I’m going to try writing some shorter tips and tricks articles. Let me know what you think.

A few days ago, someone asked a great question on the puppet-users mailing list about the location of config files in the roles/profiles pattern. It's a good question, and we can go deeper, because it also raises the question of where config files live outside of that pattern. I'm going to explain where I keep my config files, and templates, in the various types of modules. There's no single correct answer here; this is just a framework that works for me.

To start, let's describe the types of modules. Component modules describe a single application/technology/thing and are designed to be consumed by end users. This is pretty much anything on the forge, such as puppet/hiera to manage a Hiera implementation or puppetlabs/apache to manage apache, vhosts, etc. There's also a sub-type of these modules, Private Component modules. The line here is blurry, but think of component modules that are not designed to go on the forge. This could be a module for a company's internal application (very similar to a traditional component module), a collection of custom facts, or pretty much anything else that isn't a Component module or our final type: Profile modules. This last type is the collection of classes that make up your role/profile pattern implementation. They're often simply called profile, but maybe there is more than one module if you have a lot of business groups using the puppet system. They differ from both types of component modules in that they contain the business logic of your implementation and are where you compose the collection of component modules that you use. I wrote an article on what goes in a role or profile, too.

In Component modules, the relevant configuration files or templates for the component are collected. In an ssh module, you’d have the ssh_config and sshd_config data; a sudo module would have sudoers and a template for sudoers.d/ files. Private Component modules vary quite a bit in functionality, but I treat them like regular component modules. If the module is for custom facts, there’s no need to put files or templates in it. If it’s for an internal app, the configuration files are stored in that module.

Your Role/Profile modules are a little more complicated. If you have a component module for apache, you likely have a profile class for apache, perhaps profile::apache or profile::somegroup::apache. The component module probably has its own file or template, but it may accept alternative files and templates. In this case, I create a sub-directory named for the subclass, such as files/apache or templates/apache, and add the file(s) there, e.g. templates/apache/vhosts.erb.
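
As a concrete illustration of that layout, here is a minimal sketch; profile::apache, the target path, and the template name are all just examples:

# profile/manifests/apache.pp (excerpt)
class profile::apache {
  file { '/etc/httpd/conf.d/vhosts.conf':
    ensure  => file,
    # templates/apache/vhosts.erb within the profile module
    content => template('profile/apache/vhosts.erb'),
    # a static file at files/apache/vhosts.conf would instead be referenced as
    # source => 'puppet:///modules/profile/apache/vhosts.conf',
  }
}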

This is a pretty simple layout. The only real difficulty is when you have a private component module and a profile for that component: do you put the file/template with the profile or the component module? I tend to lean toward the private component module first, but I've done both.

I hope this helps and I’d love to hear of any other layouts you’ve had success with!

Using PowerCLI from the PowerShell Gallery

As you’ve surely seen, I love me some PowerCLI. So I was really happy when I saw that PowerCLI is now available on the PowerShell Gallery! What this means is that it is no longer a package you install on a server, it’s a set of modules you load from the gallery. When there’s a new version available, you just go get it. Because it’s now a bunch of files, not only do you not need to go to vmware.com to find the download link, you can also install it without requiring administrative access! That’s pretty awesome when you’re a tenant on a system, and it’s pretty awesome for the owners of the system, too (no needing to punt all your PowerCLI users so the files aren’t locked during an upgrade). I fill both roles from time to time, so I’m really happy about this improvement! Read more about the change in this VMware PowerCLI Blog article by Kyle Ruddy.

The article will guide you through the setup just fine, so I won’t dwell on that part very much, but if you’ve followed my PowerShell Profile article, there’s one small change to make: uninstall the old version of PowerCLI, then edit your posh profiles with notepad $profile and remove whatever version of the profile you used. Leave anything else you have added and close it out. Remember to do this once in PowerShell and once in PowerShell ISE if you use both.

Now, install the modules as the blog article recommends.

Find-Module -Name VMware.PowerCLI
Install-Module -Name VMware.PowerCLI -Scope CurrentUser

That's it, you're done! The modules will automatically be loaded as needed. You should be able to start typing Connect-VIServer and see autocomplete working by tabbing it out in regular PowerShell, in the typeahead dialog in ISE, or however your PowerShell UI displays it. If you hit enter, the containing sub-modules are loaded immediately on demand. You can import the entire suite of modules with Import-Module VMware.PowerCLI in your profile if you'd like, but it adds about 10 seconds to PowerShell startup on my laptop for minimal gain versus on-demand loading. However, it does give you the look of the old PowerCLI desktop shortcut, if you so desire.

If, for some reason, the module is not found by PowerShell after installation, check the value of $env:PSModulePath. It should include %USERPROFILE%\Documents\WindowsPowerShell\Modules, e.g. C:\Users\rnelson0\Documents\WindowsPowerShell\Modules, which is where Install-Module puts the files when using -Scope CurrentUser. If it does not, you'll need to modify it. Mine was funky because I apparently edited the environment variable portion of my Windows install, even though I don't remember doing it.
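
A quick way to check (and, for the current session only, patch) the module path looks something like this:

# list the directories PowerShell searches for modules
$env:PSModulePath -split ';'
# append the per-user module directory for this session if it is missing
$env:PSModulePath += ';' + (Join-Path ([Environment]::GetFolderPath('MyDocuments')) 'WindowsPowerShell\Modules')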

To keep up with PowerCLI from the Gallery, just run Update-Module -Name VMware.PowerCLI once in a while. Easy peasy. Enjoy!