“No Deploy Friday” is a sign of IT Maturity

Last Friday, I made a pretty sarcastic joke about Friday deploys. Hopefully the image macro let you know it was a joke, though!

You shouldn’t be deploying on Fridays. Let’s qualify that a bit. “Deploy” in this instance means a high-risk change – a new product or piece of equipment, a change in some protocol (moving from EIGRP to OSPF, say), or even a simple change that you’ve not done before, so you don’t really know how risky it is – and “Friday” is your last day of the week. Patches on Friday? Up to you, though I don’t suggest applying them at 4:59PM. Is your shift 4×10, Monday through Thursday? Then Thursday is your Friday; get your deploys done by Wednesday. Is it a short week because of vacation plans? Maybe don’t deploy this week at all. Whatever the particulars are for you, “No Deploy Friday” is just the phrase for not deploying something brand new before you’re off for a few days.

Having the “No Deploy Friday” rule is a sign of IT maturity, for you and your company, in so many ways:

  • Realizing you cannot do everything. Sometimes you can’t do something when you want to. It’s better to accept that than try and force it.
  • Realizing that things go wrong. When you’re young, it’s easy to believe so fiercely that you know what you’re doing that you cannot accept that you might be wrong. With maturity comes the knowledge that even if you know what you’re doing, it can still turn out poorly.
  • Realizing that your decisions affect those around you. It’s important to recognize that when something does go wrong, you don’t exist in a vacuum. Other people are affected. Your boss and coworkers. Your customers. Your family. Your coworker’s family. You can make the decision that YOU do not mind staying late if something breaks, but you should not make the decision for your coworker’s spouse that they won’t be home for dinner.
  • Realizing that you matter. Work has to happen. But you don’t have to sacrifice yourself for it to happen. Your time off is yours. Friday movie night should be for watching movies, not fixing bad deploys. And instead of pushing a big deploy before you head on vacation, which through Murphy’s Law pretty much ensures you won’t actually get to enjoy your vacation (or worse, you will, but your company and users will suffer until you’re back on the grid!), talk to management and push the deploy out or find another owner. There’s also the more serious struggle with burnout that we all face. Reminder: You matter! If you need help, reach out to someone! We’ll listen!
  • Realizing that no matter what gets done today, there will always be more tomorrow! When I was younger, I sometimes thought I might “work myself out of a job”. What BS. If I ever actually did get everything done, it would have just given me time to find something else to do. For most of us, the task list will never actually be empty. Since that’s the case, there’s no need to kill yourself trying to empty it. This isn’t the same thing as saying, “feel free to slack.” It’s a fine line between what NEEDS to be done today and what would be nice to do today but is more likely to succeed on another day. Knowing when to postpone work until you have the time to do it right is another sign of maturity.
  • Realizing that “hurry” is the antithesis of “fast”. This one is counter-intuitive. We’re often told to “move fast and break things.” I hate that saying. It really should be, “move fast and break things in development so you don’t break production.” That’s the real intent behind it. But even that’s not right. “Hurry” indicates your speed; “fast” describes your velocity. When you’re hurrying, it becomes easy to skip a step because someone hit you up on IM while you were working, or to push through an error message because you want to leave in time for the movie. Those things end up costing you more time when you have to drop everything to troubleshoot or roll back. Now you’re hurrying at a high speed but have a negative velocity. You show maturity when you choose a velocity of zero over a negative velocity.
  • Realizing that Friday is great for “non-work” work. We work in a fast-paced industry. We spend a lot of time working, but we have a lot of other things to do that we don’t always consider “work”. No one likes doing their expense reports, but they’re great Friday work. We also have to keep up with new technologies and processes, and if you have an 8+ hour day with no deploys, it’s a good candidate for some contiguous learning time. There are lots of ways to be productive while avoiding deploys, and your manager and the finance team will be happy that they don’t have to hound you for your expense reports anymore!

Maturity in IT is often a difficult struggle. Just knowing what maturity looks like doesn’t mean you can go ahead and practice it. If you’re in a culture of Friday deploys, you may have to lead the charge on this. If you’re the rogue IT person breaking the “No Deploy Friday” rule, talk to your coworkers and see what wisdom you can glean. Let maturity be your badge of pride, not your scars and war stories!

Parallelized Rspec Tests

Peter Souter showed me a recent PR for the puppet-approved jenkins module where he parallelized the rspec tests. When there are a large number of tests spread across separate files, running them in series can take a long time. Parallelizing the tests MAY offer a speed improvement; in Peter’s case, it reduced the runtime by almost 50%. With a small number of tests, or when an overwhelming percentage of the tests are in a single file, there may be no benefit or even a decrease in performance, so be sure to measure the effect before committing to it. It’s a pretty simple change, but let’s look at it in some detail anyway.

Gemfile

In your Gemfile, you need to add one line:

gem 'parallel_tests'
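Once the gem is in your bundle, you need a way to invoke it. A minimal Rakefile sketch follows; the task name and the bare `parallel_rspec spec/` invocation are illustrative assumptions on my part, not necessarily the exact wiring the jenkins module’s PR uses:

```ruby
# Rakefile -- a minimal sketch, assuming the parallel_tests gem is in your bundle.
# The task name and fixture handling are illustrative, not the module's exact setup.
desc 'Run rspec-puppet specs split across CPU cores'
task :parallel_spec do
  # parallel_rspec groups the spec files into one bucket per core
  # and runs each bucket in its own rspec process
  sh 'bundle exec parallel_rspec spec/'
end
```

With that in place, `bundle exec rake parallel_spec` replaces the serial `rake spec` run.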

Continue reading

Running rspec-puppet tests with granularity

When working on your puppet code, you’re going to want to run your rspec tests on a regular basis. There are a few quirks to this process that we should cover quickly.

“Normal” Usage

Let’s start with a simple test of everything. You can do this with bundle exec rake spec (or the test target, which includes spec plus some other targets). That would look something like this (note: be is an alias for bundle exec):

Continue reading

#vDM30in30 in May!

Every year in November, NaNoWriMo occurs. For those of us who blog, a more recent challenge called vDM30in30 takes place at the same time: write 30 blog posts in 30 days. However, November can be a difficult time for writing, as the holidays and family can encroach on it. This has kept a number of people from participating in past challenges, or forced them to drop out before the month was out.

In response, this year we’d like to try two challenge events. In addition to the annual event in November, we’re launching a May event! It’s the same challenge – 30 blog posts in 30 days – but outside of the holiday season! Yes, we know, May has 31 days. It’s up to you if you want to write from May 1-30 or May 2-31, or maybe even write 31 posts in 31 days!

This challenge is entirely personal. The 30 blog posts can be about any subject you like, of any length. You can do one a day or clump them together. If you announce your posts on Twitter or Facebook, just add the hashtag #vDM30in30. The only goal is to push yourself to write frequently. Read more in the Q&A link below.

If you would like to participate, please contact Angelo Luciani or myself on Twitter, or use the comments below, to let us know about your blog and social media contacts. We’ll put out a list of public participants and add you to a once-a-day summary post of all the participants.

Getting started in IT: Years 0-4

Over the weekend, a really great hashtag came into existence: #FirstTechJob.

This in turn came from a great question about the requirements in job listings.

Please check out the hashtag; it’s a great sampling of the always humble, often mundane beginnings of nearly everyone in IT. Some common themes were, of course, help desk support, managing printers and email systems, and managing or running ISPs (often including modems!). My, how times have changed. It did inspire me to talk a bit more about my journey in the hope that it may help some others on their own, whether they’re just getting started or have been at it for a while.

Getting Started

My very first professional job was working for a neighbor’s local PC business in the summer between high school and college. He sold then-high-end computers (I think mostly 386s and sometimes 486s, but it’s been a while), with ISA graphics cards that took 30-60 minutes to render a single low-res frame, and he needed assistance assembling them in a timely manner. The work itself wasn’t difficult – insert Tab A into Slot B, tighten Screw C – but I asked a lot of questions and learned a good bit about hardware. I made a few bucks, mostly spent at the local laser tag arcade, and most importantly was able to put “professional” work experience on my resume in addition to Wendy’s and KFC. Thank you, neighbor, for that first job!

After my first year of college, I lucked into a paid summer internship at a local engineering firm. The company’s owner was a friend from church, and my Dad helped me get an interview – some of that was getting me in the door, but a good bit was getting me off my butt – and I was able to upsell my summer job and my schooling into experience. I took on a number of responsibilities there over the next two summers: migrating CAD files from an old Unix terminal to the new NT 3.51 systems, then to NT 4.0; desktop support; network support; printers and plotters; and a bajillion other little things.

One memorable event was when Pittsburgh was struck by some severe weather (including tornadoes – a real rarity for that area!) and a lightning strike blew out the transformer outside our building. Always splurge for lightning suppression. Over half the hubs died, and we got a fast track to switches. In 1997-98, that was ahead of the curve for many. There were of course many less memorable, but more important, things I learned. The most important was how to provide service and support to users and maintain a positive relationship. There were always trying people (I have actually seen someone stick a CD in a 5 1/4″ floppy drive and force the door shut, and it’s not pretty), but hey, I knew nothing about what they did, so why would I hold it against them for not being experts in my job area?

In the spring of ’99, I was supposed to intern there again, but a hiring freeze changed that plan. I already had the college semester off, and it was too late to schedule classes when I found out, so I canvassed and found two part-time jobs where I could stay self-employed, keep my own schedule, and make money. I learned pretty quickly that I don’t want to be my own boss. That’s a lot of work, and some weeks I had fewer than 3 days of work! I kept at this through most of ’99 and added “Y2K preparation” to my skillset. Note: you really want to retire before 2038.

In December of ’99, I found a full-time job at a local IT consultancy – except they weren’t local to me, so I had to move. I am 99% certain the only reason I got the job was that I called the company every week asking if it had openings, and the owner decided it was easier to let me try the job on probation than to put me off any longer. Persistence pays off! This was my first full-time, self-sustaining job. I stayed there for 3 years and did a little bit of everything: customer service was key to everything; I did large-scale OCR of court documents, built web front-ends to said documents, set up wireless WAN connectivity (pre-802.11b), and really fell in love with network security.

Keep Going

That covers the first four years and a bit beyond, which gave me a really great foundation for the rest of my IT career. I would like to think I’ve done fairly well since then. These jobs may not seem like the awe-inspiring jobs everyone wants, but they were good jobs, with good people, and I appreciate how lucky I was to have them. I know it can be a struggle to get those first few jobs and years of experience, so if you can’t land a dream job out of the gate, know that you can find tons of other jobs that will benefit you and your career. IT is really diverse, and you may find something you didn’t know you were looking for; if not, it will certainly help you land those “4+ years experience needed” jobs.

Good luck in your journey!

Ruby net/https debugging and modern protocols

I ran into a fun problem recently with Zabbix and the zabbixapi gem. During puppet runs, each puppetdb record for a Zabbix_host resource is pushed through the zabbixapi to create or update the host in the Zabbix system. When this happened, an interesting error cropped up:

Error: /Stage[main]/Zabbix::Resources::Web/Zabbix_host[kickstart.example.com]: Could not evaluate: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A

If you google for that, you’ll find a lot of different errors and causes described across a host of systems. Puppet itself is one of those systems, but it’s not the only one. All of the systems have something in common: Ruby. What they rarely have is actual resolution, though. Possible causes include time out of sync between nodes, errors with the certificates and stores on the client or server side, and of course a bunch of “it works now!” with no explanation what changed. To confuse matters even more, the Zabbix web interface works just fine in the latest browsers, so the SSL issue seems restricted to zabbixapi.

To find the cause, we looked at recent changes. The apache SSLProtocols were changed recently, which shows up in a previous puppet run’s output:
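A server-side protocol change is exactly the kind of thing that produces this error class on the Ruby side. Here’s a self-contained sketch that reproduces a protocol mismatch locally – a throwaway self-signed cert, a loopback server capped at TLS 1.2, and a client demanding TLS 1.3 – purely to show what the failure looks like; none of this is the actual Zabbix setup:

```ruby
require 'openssl'
require 'socket'

# Throwaway self-signed certificate for the demo server
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = OpenSSL::X509::Name.parse('/CN=localhost')
cert.issuer     = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now - 60
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# Server side: only speaks up to TLS 1.2, like a host whose newer
# protocols were disabled by an SSLProtocol-style config change
server_ctx             = OpenSSL::SSL::SSLContext.new
server_ctx.cert        = cert
server_ctx.key         = key
server_ctx.max_version = OpenSSL::SSL::TLS1_2_VERSION

tcp        = TCPServer.new('127.0.0.1', 0)
port       = tcp.addr[1]
ssl_server = OpenSSL::SSL::SSLServer.new(tcp, server_ctx)

server = Thread.new do
  begin
    ssl_server.accept.close
  rescue OpenSSL::SSL::SSLError
    # the rejected handshake surfaces here as well; nothing to do
  end
end

# Client side: insists on TLS 1.3+, so the two sides share no protocol
client_ctx             = OpenSSL::SSL::SSLContext.new
client_ctx.min_version = OpenSSL::SSL::TLS1_3_VERSION
client_ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE

begin
  ssl = OpenSSL::SSL::SSLSocket.new(TCPSocket.new('127.0.0.1', port), client_ctx)
  ssl.connect
  outcome = 'handshake succeeded'
rescue OpenSSL::SSL::SSLError
  outcome = 'handshake failed'
ensure
  server.join
  tcp.close
end
puts outcome
```

The client’s `connect` raises `OpenSSL::SSL::SSLError`, the same exception class that net/https surfaces as the cryptic `SSL_connect ... read server hello A` message when a real server stops offering a protocol the client can speak.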

Continue reading

Announcement: Github repo for common vCenter roles

Last week, I was installing some of the vRealize Suite components and was creating accounts for each component using the Principle of Least Privilege. Sometimes I was able to find vendor documentation on the required permissions; sometimes I found a few blog posts where people guessed at them; but in almost no cases was I able to find automated role creation. Perhaps my google-fu is poor! Regardless, I thought it would be nice to have documented, automated role creation in a single place.

To that end, I created a repo on GitHub called vCenter-roles. I use PowerCLI to create a role with the correct permissions, and only the correct permissions. Each cmdlet will allow you to specify the role name or it will use a default. For instance, to create a role for Log Insight, just run the attached ps1 script followed by the command:

New-LogInsightRole

It’s that easy!

I will be adding some other vRealize Suite roles as I work my way through the installation, but there are tons of other common applications out there that require their own role, and not just VMware’s own applications! I encourage you to open an issue or submit a Pull Request (PR) for any applications you use. The more roles we can collect in one place, the more helpful it is for the greater community. Thanks!

What is a backdoor?

Last month, a significant finding in Fortinet devices was discovered and published. When I say significant, I mean, it’s huge – Multiple Products SSH Undocumented Login Vulnerability. In other words, there’s a username/password combination that works on all devices running the affected firmware versions. If you are still running an affected version, you NEED to upgrade now! This is bad in so many ways, especially following similar issues with Juniper and everything we’ve seen from Snowden’s data dumps. Fortinet responded by saying ‘This was not a “backdoor” vulnerability issue but rather a management authentication issue.’

Is that right? What is a “backdoor” and what is “management authentication”? Is there an actual difference between the two, or is it just a vendor trying to save their butt? I got into a discussion about that on Twitter.

Ethan challenged me to think about the terminology and I think I’ve come around a bit. Here’s what I now believe the two terms mean.

Continue reading

Root Cause Analysis: It’s Still Valid

You’ve probably heard it before: Root Cause Analysis (RCA) doesn’t exist; there’s always something under the root cause. Or: there’s no root cause, only contributing factors. This isn’t exactly untrue, of course. Rarely in life will we find cause and effect so simple that we can reduce a problematic effect to a single cause. Such arguments against RCA may be grounded in truth, but they gloss over the subtleties and complexities of the actual process of analysis. They also focus on the singular, though nothing in the phrase “Root Cause Analysis” actually implies the singular. Let’s take a look at how RCA works and analyze it for ourselves.

Root Cause Analysis is the analysis of the underlying causes related to an outage. We should emphasize that “causes” is plural. The primary goal is to differentiate the symptoms from the causes. This is a transformative and iterative process. You start with a symptom, such as the common “the internet is down!” In a series of analytical steps, you narrow it down as many times as needed. That progression may look like:

  • “DNS resolutions failed”
  • “DNS server bind72 failed to restart after the configuration was updated”
  • “A DNS configuration was changed but not verified and it made its way into production”
  • “Some nodes had two resolvers, one of which was bind72 and the other was the name of a decommissioned DNS node.”

Each iteration gets us closer to a root cause. We may identify multiple root causes – in this case, lack of config validation and bad settings on some nodes. Not only are these root causes, they are actionable. Validation can be added to DNS configuration changes. Bad settings can be updated. Perhaps there’s even a cause underneath – WHY the nodes had bad settings – because RCA is an iterative process. We can also extrapolate upward to imagine what other problems could be prevented. DNS configurations surely aren’t the only configurations that need validation.
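To make “actionable” concrete, the validation for that first root cause can start very small. A minimal sketch, where the approved-resolver list and the sample input are entirely hypothetical:

```ruby
# A minimal sketch of pre-deploy resolver validation.
# The approved-resolver list is hypothetical.
ALLOWED_RESOLVERS = %w[10.0.0.53 10.0.1.53].freeze

def unknown_resolvers(resolv_conf)
  # Pull every `nameserver` entry and report any not on the approved list
  resolv_conf.scan(/^nameserver\s+(\S+)/).flatten - ALLOWED_RESOLVERS
end

bad = unknown_resolvers("nameserver 10.0.0.53\nnameserver 192.0.2.99\n")
puts bad.empty? ? 'resolvers OK' : "unknown resolvers: #{bad.join(', ')}"
# prints: unknown resolvers: 192.0.2.99
```

A check like this, gating the config pipeline, would have caught both the unverified change and the decommissioned resolver before they reached production.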

Multiple causes and findings don’t invalidate Root Cause Analysis; they only strengthen the case for it. If it makes the concept easier to share, we can even call it Root Causes Analysis, to help others understand that we’re not looking for a singular cause. Regardless of what we call it, I believe it is absolutely vital that we continue such analysis, and that we don’t throw away the practice because some people have focused on the singular. Be an advocate of proper RCA, of iterative analytical processes, and of identifying and addressing the multiple causes at hand.

Puppet 4 Lessons Learned

I’ve been working recently on migrating to Puppet 4. All the modules I maintain have supported it for a little while, but my master and controlrepo were still on Puppet 3. I slowly hacked at this over the past month and a half as time presented itself, and I learned a few things. This post is an assortment of lessons learned more than a tutorial, but hopefully it will help others going through this effort themselves.

At a high level the process consists of:

  • Make sure your code is Puppet 4 compatible via Continuous Integration.
  • Find a module that manages Puppet 4 masters and agents. If your current module works with 4, this step is much easier.
  • Build a new template or base image that runs Puppet 4.
  • Update your controlrepo to work with a Puppet 4 master, preferably with the new puppetserver instead of apache+passenger, again using CI for testing.
  • Deploy the new master and then start adding agents.

The biggest lesson I can give you is to perform these changes as 5 separate steps! I combined the last three into a single step, and I paid for it. I knew better, and my shortcut didn’t turn out so well. Especially: do not take your Puppet 3 master down until your Puppet 4 master build tests out okay! Alright, let’s go through some of the steps.
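For the first step, CI usually means running the same spec suite against both Puppet series. A Gemfile sketch of the usual gate (the `PUPPET_GEM_VERSION` environment variable is a common community convention, not necessarily what my controlrepo uses):

```ruby
# Gemfile -- sketch of a CI matrix gate; PUPPET_GEM_VERSION is the common
# community convention for selecting the puppet gem per CI job
source 'https://rubygems.org'

gem 'puppet', ENV['PUPPET_GEM_VERSION'] || '~> 4.0'
gem 'rspec-puppet'
gem 'puppetlabs_spec_helper'
```

Each CI job then sets `PUPPET_GEM_VERSION` (e.g. `~> 3.8` or `~> 4.0`) before `bundle install`, so a single commit proves the code out on both versions.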

Continue reading