Troubleshooting: Recreation and Validation

When watching others troubleshoot, I have noticed one very important step that is frequently overlooked: reproduction of the problem and validation of the solution.

Once you believe you have remediated an issue, you should attempt to immediately recreate the problem (use your common sense – if the issue affects online sales on Black Friday, it’s probably best to make a note and schedule the testing for later!). This is often as simple as undoing the fix or re-implementing the broken config. If the problem does not return, you didn’t actually fix the issue! Something else must have happened in the meantime to fix the issue.
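The recreate-and-validate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `validate_fix`, `apply_fix`, `revert_fix`, and `problem_present` are hypothetical names, and you would have to supply the three callables for your own environment (for example, a function that pushes or removes a config change and a function that runs your health check).

```python
from typing import Callable


def validate_fix(
    apply_fix: Callable[[], None],
    revert_fix: Callable[[], None],
    problem_present: Callable[[], bool],
) -> str:
    """Recreate the problem by undoing the fix, then confirm the fix clears it again.

    All three callables are hypothetical hooks supplied by the caller.
    """
    # Start from the remediated state; the problem should be gone.
    if problem_present():
        return "not fixed: the problem is still present with the fix applied"

    # Undo the fix. If the fix was real, the problem should come back.
    revert_fix()
    if not problem_present():
        # The change stays reverted: it was not what resolved the issue.
        return "not validated: the problem did not return, so something else resolved it"

    # Put the fix back and confirm the problem goes away again.
    apply_fix()
    if problem_present():
        return "not validated: reapplying the fix did not clear the problem"

    return "validated: the problem follows the fix"


if __name__ == "__main__":
    # Toy stand-in for a real system: the "problem" exists whenever the setting is off.
    state = {"setting_on": True}
    print(
        validate_fix(
            apply_fix=lambda: state.update(setting_on=True),
            revert_fix=lambda: state.update(setting_on=False),
            problem_present=lambda: not state["setting_on"],
        )
    )
```

Note that when the problem does not return, the sketch leaves the change reverted, which matches the advice below about removing a "fix" that turned out not to be one.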

You may be asking yourself, “If the problem is fixed, why do I care if it was my efforts that fixed it or not?” There are three main reasons why you should care:

  • Ensure the problem does not reoccur without warning. If your fix isn’t a fix and you cannot induce the problem to occur immediately, you can at least document what steps were taken and that they did not resolve the issue. When it does occur again, no one will be surprised.
  • Your “fix” may have side effects. If the change turns out not to be the real fix, revert it along with any compensating controls put in place, such as a set of permit rules above a deny rule that didn’t exist in the firewall before.
  • You may start a cargo cult! This is very likely if the fix isn’t a setting but an action, such as clearing a cache, restarting a process, or even rebooting. These hoops and the need to jump through them may become part of the diagnosis and remediation process. Once the supposed solution is invalidated, everyone will realize that these efforts only waste time and have no benefit.

Customer satisfaction will increase when customers see with certainty that a fix works and that the problem won’t spontaneously reoccur in the future. Explain that you want to take some time now to recreate the issue and validate the solution, and almost all customers will understand and appreciate the effort.

Home Lab 2015 Project

I strongly believe that everyone needs a home lab in order to practice Continual Improvement of the self. I recently completed an upgrade of my own home lab, for those interested. This year’s upgrade was inspired partly by need after moving to a new house that lacked ethernet wiring and partly by Chris Wahl’s colorful network.

The Existing Lab

For the past few years, my focus has truly been on virtualizing everything. The core of my lab is two Dell hosts running vSphere. The smaller is a 2012 PowerEdge T110 II with a 4-core processor, 32 GB RAM (32 GB max), a single onboard NIC, and some local storage. The larger is a 2013 PowerEdge T320 with a 6-core processor, 32 GB RAM (96 GB max), dual onboard NICs, and some local storage. They are both single-socket, but could take extra NICs or storage. The T320 could also have an iDRAC if I didn’t mind running downstairs once in a blue moon. They are currently running vSphere 5.5 and I will upgrade them in the next month or so.

Continue reading

Improved r10k deployment patterns

In previous articles, I’ve written a lot about r10k (again, again, and again), the role/profile pattern, and hiera (refactoring modules and rspec test data). I have kept each of these in a separate repository (to wit: controlrepo, role, profile, and hiera). This can make for an awkward workflow. On the other hand, it provides great separation between the various components. In some shops, granular permissions are required: the Puppet admins have access to the controlrepo and all developers have access to the role/profile/hiera repos. There may even be multiple repos for different orgs. If you have a great reason to keep your repositories separate, you should continue to do so. If not, let’s take a look at how we can improve our r10k workflow by combining at least these four repositories into a single controlrepo.

Starting Point

To ensure we are all on the same page, here are the relevant portions of my Puppetfile:

Continue reading

vCenter and Orchestrator compatibility note

I recently asked on Twitter whether the latest version of VMware’s Orchestrator, vRealize Orchestrator 6.0.1, would work with vCenter 5.5. I have not upgraded my hosts or vCenter to version 6 yet, but I wanted to avoid having to upgrade vCenter Orchestrator to vRealize Orchestrator later if at all possible. I was directed to the VMware Product Interoperability Matrix. In step 1, I selected VMware vRealize Orchestrator (Management Products), version 6.0.1. In step 2, I selected VMware vCenter Server (vCenter Server) and added it.

vRO Compat Fig 1

Continue reading