This is essentially a two-fold question. First, you must understand what Configuration Management (CM) is and why you need it. Second, of all the CM tools out there, why would you choose Puppet?
In spite of my telling Jason that the world doesn’t need another “Why CM?” post, here we go 🙂
Plenty of other people have done a great job explaining what Configuration Management is and why you need it. Chief among these is the Information Technology Infrastructure Library, or ITIL, a framework for IT Service Management, which describes Configuration Management in its Service Transition volume. We can simplify the meaning to describing and managing the state of a configuration throughout a service’s lifecycle.
Why is this important? Plenty of reasons, from the tangible to the abstract and (possibly) difficult to quantify. We’ll start with the tangible. If you don’t have good configuration management:
- You’re constantly running commands like “mv service.conf service.conf.20141014”.
- If you forget to do that and have to roll back a change or upgrade, you have to reinvent the previous configuration from memory or guesses.
- You don’t have a good grasp of changes, so when something breaks at 2:30AM and you ask, “Did anyone make a change?” you have to take people at their word. People are inherently flawed, so even if someone is not purposefully hiding their SNAFU, they could simply forget that they actually made a change.
- If two people are making changes at the same time, they can step on each other, especially in a distributed team, potentially wiping out one person’s changes.
And if you do have good configuration management:
- You can roll back to any previous configuration at any time – one, two, twenty revisions back if necessary.
- When something breaks at 2:30AM, on-shift personnel can review change logs and determine what changed without waking you up.
- If two people are making changes at the same time, both changes are saved in revisions and can be gracefully merged upon completion.
- Anyone who violates the CM processes will quickly find their changes reverted, so cowboys are found out in 30 minutes or less, rather than 3 years later when they’ve left the company and the server is being rebuilt.
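To make the last point concrete, here is a minimal sketch of the idea in Python. It is not how Puppet is implemented; it only illustrates comparing a managed configuration against what is actually on disk, reverting any drift, and keeping a diff for whoever is on shift. The `enforce` function, the `service.conf` name, and the sample settings are all hypothetical.

```python
import difflib


def enforce(desired: str, actual: str, name: str = "service.conf"):
    """Compare on-disk content with the managed content.

    Returns (content_to_keep, report), where report is a unified diff
    describing the drift that was reverted (empty if there was none).
    """
    if actual == desired:
        return actual, []
    report = list(
        difflib.unified_diff(
            actual.splitlines(),
            desired.splitlines(),
            fromfile=f"{name} (on disk)",
            tofile=f"{name} (managed)",
            lineterm="",
        )
    )
    # The managed version always wins; the diff is the audit trail.
    return desired, report


# Hypothetical example: someone hand-edited the worker count.
desired = "port = 8080\nworkers = 4\n"
drifted = "port = 8080\nworkers = 16\n"

content, report = enforce(desired, drifted)
print("\n".join(report))
```

Run on a schedule, a check like this is what turns an unrecorded manual change into a logged, reverted one.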
I like not being woken up at 2:30AM myself; the rest is pretty much gravy. So, what about the abstract and less quantifiable reasons? Let’s start with “5 Things About Configuration Management Your Boss Needs To Know” by ScriptRock. It throws around a lot of words that you don’t normally hear in IT – cost reduction, rapid detection, agility, better quality of service, decreased risk, faster restoration of service, and the key: improved visibility and tracking – and that’s just in the first thing your boss needs to know. Without CM, you probably cannot quantify something like “faster restoration of service.” You might have a gut feeling, but you probably don’t know how long it actually takes you to restore service.
There’s the time between outage detection and when they call, the time it takes you to wake up and grab some caff, however long it takes you to log in, that hour you spent on the wrong f’ing system ’cause it’s 2:30AM and you’re not thinking straight – except now it’s 3:30AM – however long it took for you to identify the issue, and then whatever time it took you to fix the service. Sounds like maybe 90-120 minutes, but is that accurate? You don’t really have the visibility to say that – yet. Except you’re not done. The next update might cause the same issue, so the next day you have to meet with the developers and tell them that foo should be baz, not bar, then they need to update their system, and only then are you reasonably confident that the issue is closed.
Enabling CM changes the game. You’ll know when the change was made that broke the service, when the issue was reported, and when it was corrected. You also get the other benefits of CM, like the on-shift personnel being able to review the changes before calling you, which should also reduce the time required. The last step, where the root cause is eliminated, isn’t even a step, since your CM ensures that the change you made now is part of the future configuration as well.
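Once those three timestamps exist, the arithmetic is trivial. The following is a rough sketch with made-up times for a single incident; the function name and the metric labels are my own, not something a particular tool reports.

```python
from datetime import datetime


def incident_durations(changed, reported, corrected):
    """Break one incident into the intervals CM lets you measure."""
    return {
        "time_to_detect": reported - changed,     # bad change -> someone noticed
        "time_to_restore": corrected - reported,  # noticed -> service fixed
        "total_impact": corrected - changed,      # bad change -> service fixed
    }


# Hypothetical timestamps for illustration only.
fmt = "%Y-%m-%d %H:%M"
durations = incident_durations(
    changed=datetime.strptime("2014-10-14 02:10", fmt),
    reported=datetime.strptime("2014-10-14 02:30", fmt),
    corrected=datetime.strptime("2014-10-14 04:15", fmt),
)

# Convert to whole minutes for reporting.
minutes = {name: int(delta.total_seconds() // 60) for name, delta in durations.items()}
print(minutes)
```

Collect these numbers across incidents and “faster restoration of service” stops being a gut feeling.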
Once you have those metrics, you can also start to quantify other aspects, like the cost reductions or the amount of risk a change introduces. The remaining four points in the ScriptRock article simply build off the first – what happens when you don’t have CM, a real world example of how CM helped the economy, the costs of CM, and some positive ROI information.
Let’s also look at the 2014 State Of DevOps Report, by Puppet Labs. You’ll notice DevOps in the title. CM is part of DevOps, as it is part of ITIL (anyone who tells you ITIL and DevOps are orthogonal either doesn’t know what the word means or is selling you something), but not the whole. Still, a lot of the information applies. Flip to page 4 and you’ll find a startling result – high performers were agile and reliable (again, not terms most people associate strongly with IT!) and it resulted in 30x the deployments and half the failures. Skip to page 17 and you’ll see that CM makes it easier for developers to fix test failures. On page 18, the second-highest predictor of IT success is “version control for all production artifacts.” That’s an essential part of CM.
There’s a lot of other great information in there, mostly related to DevOps more than CM specifically. There’s one more relation to CM buried on page 28 – “Make it safe to fail.” You can’t do that without effective CM. Wouldn’t you love to work in an organization that is agile, reliable, has half the failures, and where it’s safe to fail?
Now that you’re convinced that you need configuration management, you need to select some tools. Clearly, I have chosen Puppet. Why?
- It’s popular. In fact, it’s one of the fastest-growing tech skills. This means there are a lot of people out there who know the tool, which will help you as you grow your teams.
- Community matters.
- Puppet uses Ruby, increasing your pool of developers.
- Luke Kanies (Puppet Labs CEO) opened PuppetConf 2014 by stating his aim to make Puppet the lingua franca of data center automation.
- VMware is strongly pushing the software-defined data center (SDDC), which relies heavily on Puppet.
- VMware has invested strongly in Puppet Labs, both through financial backing and through development contributions.
- Juniper, Cisco, and others offer Puppet agents for their network gear – Puppet, it’s not just for servers anymore.
So we have a tool that is a popular skill on its own, uses a language known to a lot of developers, and has both vendor and ecosystem support. That sounds pretty promising. There’s an Open Source version of Puppet that requires some assembly, which is what we’ve been working with, and an Enterprise version that comes already assembled. Regardless of which you use, you get all the benefits of the overall Puppet ecosystem. And if you are a VMware shop, like we are, you have even more reasons to embrace Puppet. Lastly, do not forget the community. At VMworld 2014, Vodafone said they chose VMware because of the community. It matters – there is lots of #puppet activity on Twitter and IRC and plenty of blogs about it, all of which your team can benefit from.
What about other CM tools? There are plenty – Chef, Salt, and Ansible are popular, and new tools are cropping up all the time. How do you know which is better for you? I’d spend some time reading up on each, maybe some reviews such as InfoWorld’s that summarize the use case each is designed for or Ryan Lane’s blog article about how and why Lyft moved away from Puppet, and pick a few to try out. Which brings me to the last bullet item:
- There are more CM tools in the world than you can possibly try out. Choose one.
Seriously, you will NEVER have the time to try every one, certainly not at a scale that lets you see how they really work for you. You’re most likely going to pick two. In my case, we looked at Chef and Puppet. With those two, you’re going to try something small for testing. Scaling up can be expensive, even in a proof of concept, so unless you have a large team it’s unlikely you’ll have a lot of time to pour into both (we’ve already established that without CM, you’re not all that agile!). Take a look at two in a PoC and start using one, soon. You may have noticed earlier that the reports simply claim that “companies that use CM” succeed – the choice of tool is less important than the simple fact that they’re practicing CM.
Of course, Puppet isn’t the only CM tool in your belt; it’s just meeting one specific need – tracking and enforcing the configuration of your servers. CM relies on version control, and if you’ve been following this series, you know that I’ve chosen Git. You’ll also need a way to track IP usage. Even if you’re satisfied with random IP addresses and do not need an IPAM, you should track the overall usage to avoid running into constraints, whether that is fitting inside a /24 or ensuring the servers in a farm stay below the farm manager’s limit. There will be other resources to track – RAM, disk, network, etc. – and you’ll want to monitor the state so you know when things fail, and that will all feed up to something that tracks metrics. Puppet and the other tools will make up your CM toolset.
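As a small illustration of tracking overall IP usage without a full IPAM, here is a sketch using Python’s standard `ipaddress` module. The subnet, the address list, and the `subnet_usage` helper are hypothetical; in practice the assigned addresses would come from whatever inventory you already keep.

```python
import ipaddress


def subnet_usage(cidr, assigned):
    """Summarize how much of an IPv4 subnet is consumed.

    Assumes a conventional subnet (larger than /31), where the network
    and broadcast addresses are not usable by hosts.
    """
    net = ipaddress.ip_network(cidr)
    usable = net.num_addresses - 2
    used = {ipaddress.ip_address(addr) for addr in assigned}
    outside = sorted(addr for addr in used if addr not in net)
    in_use = len(used) - len(outside)
    return {
        "usable": usable,
        "in_use": in_use,
        "free": usable - in_use,
        "outside": [str(addr) for addr in outside],  # recorded against the wrong subnet
    }


# Hypothetical inventory: two hosts in the /24 and one stray entry.
usage = subnet_usage("10.0.0.0/24", ["10.0.0.10", "10.0.0.11", "10.0.1.5"])
print(usage)
```

Even a report this simple tells you how close you are to outgrowing the /24 before the next server request arrives.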
Find a CM toolset that works well enough for you and start using it. You’ll become more agile and free up a lot of time, so if you later decide to change tools, you’ll have more time to do so. Start using CM immediately.