Puppet Tutorials: Check out Puppetinabox instead

Quick note: I am deprecating my individual repos – role, profile, hiera, etc. – that I have used throughout the Puppet series. I will be doing representative work within the Puppetinabox repositories, mostly the controlrepo. I’m not sure when I’ll shut down the old repos entirely – not until after I update old links, of course. Some of the older history will eventually be lost, but it’s mostly primitive versions of the code that you shouldn’t want to copy anyway. If you actually want the code, check out the repos now, while you still can:

Preventing Git-astrophe – Judicious use of the force flag

I’d like to tell a tale of a git-astrophe that I caused in the hope that others can learn from my mistakes. Git is awesome but also very feature-ful, which can lead to learning about some of those features at the worst times. In this episode, I abused my knowledge of git rebase, learned how the -f flag to git push works, and narrowly avoided learning about git reflog/fsck in any great detail.

Oftentimes, you will need to rebase your feature branch against master (or, in this case, production, since it was a Puppet controlrepo) before submitting a pull request for someone else to review. This isn’t just a chance to tidy up your commit history; it re-applies the changes in your branch on top of an updated main branch.

For instance, you created branch A from production on Monday morning, at the same time as your coworker created branch B. Your coworker finished her work quickly and submitted a PR that was merged on Monday afternoon. It took you until Tuesday morning to have your PR ready. At this point, it is generally advisable to rebase against the updated production branch to ensure your branch behaves as desired after B’s changes are applied. Atlassian has a great tutorial on rebasing if you are not familiar with the concept.
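The whole Monday/Tuesday timeline can be reproduced in a throwaway repository (branch and file names here are made up for illustration):

```shell
# Recreate the scenario in a scratch repo: a feature branch rebased
# onto production after a coworker's change has merged.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
git checkout -qb production
echo 'base' > site.pp
git add site.pp && git commit -qm 'initial commit'

git checkout -qb branch-a            # Monday AM: your feature branch A
echo 'mine' > role_a.pp
git add role_a.pp && git commit -qm 'my change'

git checkout -q production           # Monday PM: coworker's branch B merges
echo 'hers' > role_b.pp
git add role_b.pp && git commit -qm 'coworker change'

git checkout -q branch-a             # Tuesday AM: replay A on top of B's work
git rebase -q production

# Rebasing rewrote branch-a's history, so a plain `git push` to an
# already-pushed remote branch would be rejected. Prefer
# `git push --force-with-lease origin branch-a` over `-f`: it refuses
# to overwrite remote commits you haven't fetched yet.
git log --format=%s                  # my change / coworker change / initial commit
```

The `--force-with-lease` flag is the judicious use of force the title alludes to: it fails safely when someone else has pushed to the branch since your last fetch, where bare `-f` would silently clobber their work.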

Continue reading

Troubleshooting: Recreation and Validation

When watching others troubleshoot, I have noticed one very important step that is frequently overlooked: reproduction of the problem and validation of the solution.

Once you believe you have remediated an issue, attempt to recreate the problem immediately (use your common sense – if the issue affects online sales on Black Friday, it’s probably best to make a note and schedule the testing for later!). This is often as simple as undoing the fix or re-implementing the broken config. If the problem does not return, you didn’t actually fix the issue – something else must have changed in the meantime.

You may be asking yourself, “If the problem is fixed, why do I care if it was my efforts that fixed it or not?” There are three main reasons why you should care:

  • Ensure the problem does not reoccur without warning. If your fix isn’t a fix and you cannot induce the problem to occur immediately, you can at least document what steps were taken and that they did not resolve the issue. When it does occur again, no one will be surprised.
  • Your “fix” may have side effects. Revert the configuration change along with any compensating controls put in place, such as a set of permit rules above a deny rule that didn’t exist in the firewall before.
  • You may start a cargo cult! This is very likely if the fix isn’t a setting but an action – clearing a cache, restarting a process, or even rebooting. These hoops, and the need to jump through them, can become part of the diagnosis and remediation ritual. Had the “fix” been invalidated at the time, everyone would know that these efforts waste time and provide no benefit.

Customers will be more satisfied when they see with certainty that a fix works and that the problem won’t spontaneously recur. Explain that you want to take some time now to recreate the issue and validate the solution, and almost all customers will understand and appreciate the effort.

Home Lab 2015 Project

I strongly believe that everyone needs a home lab in order to practice Continual Improvement of the self. I recently completed an upgrade of my own home lab, for those interested. This year’s upgrade was inspired partly by need after moving to a new house that lacked ethernet wiring and partly by Chris Wahl’s colorful network.

The Existing Lab

For the past few years, my focus has truly been on virtualizing everything. The core of my lab is a pair of Dell hosts running vSphere. The smaller is a 2012 PowerEdge T110 II with a 4-core processor, 32 GB RAM (32 GB max), a single onboard NIC, and some local storage. The larger is a 2013 PowerEdge T320 with a 6-core processor, 32 GB RAM (96 GB max), dual onboard NICs, and some local storage. Both are single socket, but could take extra NICs or storage. The T320 could also have an iDRAC added, if I didn’t mind running downstairs once in a blue moon. They are currently running vSphere 5.5, and I will upgrade them in the next month or so.

Continue reading

Improved r10k deployment patterns

In previous articles, I’ve written a lot about r10k (again, again, and again), the role/profile pattern, and hiera (refactoring modules and rspec test data). I have kept each of these in a separate repository (to wit: controlrepo, role, profile, and hiera). This gives great separation between the various components, but it can also make for an awkward workflow. In some shops, granular permissions are required: the Puppet admins have access to the controlrepo, while all developers have access to the role/profile/hiera repos. There may even be multiple repos for different orgs. If you have a good reason to keep your repositories separate, you should continue to do so. If not, let’s look at how we can improve our r10k workflow by combining at least these four repositories into a single controlrepo.
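For reference, the separate-repo layout described above might look something like this in a Puppetfile – a sketch only, with hypothetical repository URLs:

```ruby
# Hypothetical Puppetfile for the four-repo layout: role, profile,
# and hiera tracked as git-sourced "modules" deployed by r10k.
forge 'https://forgeapi.puppetlabs.com'

mod 'role',
  :git    => 'https://github.com/example/puppet-role.git',
  :branch => 'production'

mod 'profile',
  :git    => 'https://github.com/example/puppet-profile.git',
  :branch => 'production'

# hiera data pulled in the same way, even though it isn't a module
mod 'hiera',
  :git    => 'https://github.com/example/hiera.git',
  :branch => 'production'
```

Every feature then potentially touches three or four repos in lockstep, which is exactly the awkwardness that folding everything into the controlrepo removes.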

Starting Point

To ensure we are all on the same page, here are the relevant portions of my Puppetfile:

Continue reading

vCenter and Orchestrator compatibility note

I recently asked on Twitter whether the latest version of VMware’s Orchestrator, vRealize Orchestrator 6.0.1, would work with vCenter 5.5. I have not upgraded my hosts or vCenter to version 6 yet, but I wanted to avoid a later upgrade from vCenter Orchestrator to vRealize Orchestrator if at all possible. I was directed to the VMware Product Interoperability Matrix. I selected VMware vRealize Orchestrator (Management Products), version 6.0.1, in step 1, then added VMware vCenter Server (vCenter Server) in step 2.

vRO Compat Fig 1

Continue reading

Why not Puppet?

Alternatively: Common mistakes made when adopting Puppet.

I love me some Puppet, and anyone who knows me will tell you I’ll talk about it and configuration management as long as you let me. However, sometimes it’s not the answer people expect it to be. Is it even the right tool? As a counterpoint to Why Puppet?, let’s look at some potential use cases and see whether they are a good fit. These use cases have been gathered from my own usage, ask.puppetlabs.com, #puppet on IRC, and some user stories recounted to me and are presented in no specific order. Special thanks to Ryan McKern for some additional stories and editing.

Is it possible to run something only if the file/user/package/whatever is present? (IRC, nearly every day)

The situation is often presented as, “$Thing won’t install without me answering some questions or providing an answer file, can I get Puppet to manage it only if the package is installed?” Yes, but also no.
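To illustrate the “yes” half of that answer: Puppet can gate a command on system state with an exec resource and onlyif/unless checks. Everything below is a made-up example; the “but also no” is that this couples the catalog to out-of-band state that Puppet doesn’t model, so the resource silently does nothing until someone installs the package by hand.

```puppet
# Hypothetical: run a post-install configuration command only when
# $thing's package has already been installed, and only once.
exec { 'configure-thing':
  command => '/opt/thing/bin/setup --answer-file /etc/thing/answers',
  onlyif  => 'test -x /opt/thing/bin/setup',   # only if the package laid down its binary
  unless  => 'test -f /etc/thing/.configured', # skip if already configured
  path    => ['/bin', '/usr/bin'],
}
```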

Continue reading

On Karōjisatsu And Avoiding Burnout

Recently, John Willis (@botchagalupe) wrote an excellent article about Karōjisatsu – suicide induced by mental stress, often work-related. It’s a very sad, emotional tale that is relevant in many industries, but it speaks particularly to high-pressure, high-stress STEM jobs, including IT. If you have not read the article, please take a few moments to go read it now.

The core idea of nearly overwhelming burnout is probably one that you recognize. John’s article spoke very eloquently on the need to reach out if you feel overwhelmed, that you’re not alone, that there are many people who are willing to help you, and that suicide is not an option. I would like to add that if I can ever be of any assistance to anyone reading this, don’t hesitate to reach out. If you ever feel truly overwhelmed, reach out to the National Suicide Hotline at (800) 273-8255 as well. You do matter!

John describes some causes of Karōshi, including, “Stress accumulated due to frustration at not being able to achieve the goals set by the company.” There is always pressure to do more with less and in IT, we tend to feel this pressure very heavily. Systems and their associated problems always seem to come and rarely to go, giving even stable, growth-restricted companies an increasing IT burden. Every day, there is an increasing amount of systems knowledge – often of the tribal and oral history varieties – for each of us to remember and maintain. When things go wrong – and they always do – we have to drop what we are doing to put out the fires, delaying our schedule and often without the ability to adjust the delivery dates on the schedule. We often feel that we must work harder and longer to make up for these delays and maintain the schedule in order to hit the company’s goals. The mental and physical stress of something going wrong combined with the mental and physical stress of working harder and longer accumulates in a vicious cycle that must be broken before it leads to karōshi.

I know this feeling. I have found myself looking at the clock near midnight, telling myself that I’ll put the computer down in 10 minutes and go to bed, only to blink and the clock reads 3AM. I have gotten up early on a Sunday to fix something broken that I could not get to on Friday. I’ve even found myself getting up early to “fix” something that’s not broken! The pressure of needing to resolve an issue, ship a product, or address a customer’s question keeps my brain running at night when it should be resting and recuperating so that I can do good work the next day. Sometimes it’s not even a company goal that keeps me working on an issue, just my stubborn pride. Whatever the cause, I know the feeling of overwhelming pressure that affects all of us from time to time.

Burnout of any sort, whether it puts you on the edge of suicide or the edge of your career, is dangerous. We must all develop coping strategies to deal with these feelings. I have been fortunate to have some wonderful mentors in my career. I credit my first two bosses for giving me two great coping strategies to deal with this pressure, and I would like to share those strategies with you.

The first coping strategy is courtesy of Bob at Centerline, my first “real world” job. We were the IT Operations staff at an engineering firm. His advice was simple: “Sometimes, you let it burn.” It’s very easy to hear users scream and think that the world really is ending. What the users are saying is important, but we must evaluate what we hear carefully and prioritize accordingly. Are we reacting because a single person is struggling with an issue, or because the company is negatively affected by a problem more than it is positively affected by whatever we are currently doing? If you’re off shift when the issue occurs, must it really be handled immediately by you, or can it wait or be handled by someone else? Most of us have been taught repeatedly that the answer is always, “Fix it now!” – but is that truly the case?

When issues have a low severity or affect a low number of users, particularly if you’re treating symptoms and not causes, let them “burn”. While things are burning, put your effort toward fixing the underlying causes in order to prevent future fires. You will often find that your environment is not as flammable as everyone thought and that a little fire and smoke won’t destroy the company. It’s still hot, and it still hurts, but it is a different kind of hurt. This is an especially great way to deal with chronic issues. Rather than dropping everything for, say, a single user who complains about a broken report that they need RIGHT NOW, fix the underlying bug in the reporting system. If you can pick just one “burn day” a month and spend that time on underlying causes, you will find yourself in a much better position in a few months. If you can do it more frequently, or cherry-pick some chronic issues to let burn, you may see results in just a few weeks.

Regardless of the frequency with which you have burn days, you’ll notice one thing very quickly: your stress levels will go down. When you do encounter a chronic issue that you cannot let burn, you know that someday soon you will be able to make that issue go away forever. Your time will be freed up to work on improvements and innovation rather than just outages, lowering the pressure put upon you and enabling you to meet the company’s goals.

The second coping strategy was taught to me by Scott from RBA Systems, a consulting firm where we provided both development and operations to our customers. I was a 21-year-old kid who had just dropped out of college and was out to prove myself in IT. In my first few weeks, Scott often had to tell me, “pace yourself.” I wish I could say I thought nothing of it, but as the young smartass I was, I took it as something a jaded old guy would say. I’m tough and there’s no way I’ll let him slow me down! Instead, within just a few months, the blistering pace I had coming out of the gate began to falter, and Scott, who also had to manage a few other people at the same time, started lapping me.

There’s simply no way you can keep up a lightning pace forever. Going at 110% seems great until your body and mind start to fall apart under the constant pressure. Even 100% cannot be maintained indefinitely. You might find yourself flagging at the end of the day, your typing rate going to shit, or constantly typing the wrong commands in the wrong windows – especially dangerous with ‘reboot’, ‘write erase’, or ‘rm’ style commands! None of this helps you, your company, or your customers. Find out what your 100% looks like, then pull back a bit until you find a pace you can maintain that balances speed, efficiency, and accuracy. Keep adjusting that pace over time, as your skills improve and your work/life demands shift, to maintain the balance. You may be making adjustments every day, and that’s okay – no one’s perfect.

I credit my ability to maintain a high level of performance and avoid burnout in IT over the past fifteen years to the valuable lessons from my early mentors: burn days and pacing myself. I hope these tools can help others with this ongoing struggle.

Do these things because you have pride in your work, because you want to be able to continue contributing to IT for decades, because they’re the right things to do. Do it because you matter. Do it because you love life.

Updated vSphere Upgrade Order

On March 12, 2015, VMware released vSphere 6 for General Availability. I thought it would be a good time to recap, and pretty up, my older upgrade post. The previous post was based on vSphere 5.5 and the specifics of the software upgrades have changed, but the general order has not.

vSphere Upgrade Order

Check Compatibility

Read all of the Hardware Compatibility List, Interoperability Matrix, and other similar documents for all of your components. Make sure all hardware is supported with ESXi 6 and that all software solutions support both ESXi and vCenter 6. Repeat with all other vSphere suite components, such as SRM and the vRealize products. Contact vendors of incompatible solutions and find out when their v6 support is expected. If anything in your list does NOT support the latest version, decide whether to remove or replace those components, or to halt the upgrade until support is available.

Continue reading

February #IndyVMUG LiveTweeting / Future Meetings

IndyVMUG held their February meeting last week and I live tweeted most of it. You can check out the storify here.

I did capture the dates for the next few IndyVMUG meetings. If you’re close to Indianapolis on any of these dates, you owe it to yourself to try to make it. We have great leadership and sponsors who make sure that every event is hosted someplace nice and has great content. The user conference in July will bring in over 1,000 people, so if there’s one event to make, that’s the one. There should also be another one or two monthly meetings in April, May, and June, but the dates aren’t set yet. Follow @IndyVMUG to keep up!

  • March 17, 2015 – Agenda includes PernixData and DataGravity, plus some undisclosed March Madness fun.
  • July 22, 2015 – A VCDX workshop hosted by Chris Colotti. Vital if you’re working toward your VCDX.
  • July 23, 2015 – Yearly User Conference. Check out the agenda and sign up when it becomes available.