When Good Hypotheses Go Bad

I’ve written recently about the necessity of hypotheses, whether you’re writing or troubleshooting. When you craft a hypothesis, it’s based on some preconceived notion you have that you plan to test. When your hypotheses are tested, sometimes they are found wanting. It’s tempting to discard your failed hypotheses and simply move on to the next, but even a failed hypothesis can have a purpose.

Imagine for a moment that you’re sitting in front of a user’s computer, helping them out with some pesky problem. Suddenly it’s the end of the day; you’ve tried everything in your repertoire and you’re calling it quits when the user looks at you and says, “I thought it was kinda weird you tried all that. Bob did everything you did last week and he couldn’t figure it out either.” Gee, thanks, Bob! There’s not even a ticket from last week, nor did he mention talking with this user. How many hours did you just waste that you could have saved if you had known none of it would work?

Bob spent hours crafting and testing his hypotheses, but he discarded all of them, straight to the circular file. You then proceeded to craft and test many of the same hypotheses which, of course, failed again. If only there were some way we could learn from our failures… Wait, there is!

Let’s take a quick look at another example, a scientific hypothesis. A researcher crafts a hypothesis and spends $100,000 to gather preliminary data that can be submitted for a grant worth $2,000,000. If the preliminary data looks good, great – well on the way to two million in funding. If it doesn’t pan out and the hypothesis is shot, $100,000 just went down the drain. That’s the nature of science. But…

A few years go by. Another researcher comes up with the same brilliant idea and sets out to collect some preliminary data for around $100,000. Whoops, the hypothesis isn’t that brilliant, doesn’t work out, and the scientist wasted time and money. Now science is out $200,000 on this failed hypothesis. If only she had known that someone else had tried this before, but there was nothing in the literature to indicate that someone had. She publishes her data in a journal, and the next scientist who hits on the same idea can see what the results will look like before investing time and money in it. Good money is no longer thrown after bad.

You can help those after you (including future-you) if you take some time to record your hypotheses and how they failed. You don’t necessarily have to go into great detail; though scientific papers obviously require more rigor, often just a sentence or two will work. “Traceroutes were failing at the firewall, but a packet capture on the data port showed the traffic leaving the firewall,” or, “The AC fan wouldn’t start and the capacitor looked like it might be bad, but I swapped it out for my spare cap and it still won’t start.” If it’s a really spectacular failure – something that was ohhhhh-so-close to working, or a really subtle failure – maybe it’s worthy of a full blog article.

Make sure to store this information somewhere it will be found by someone who is likely to need it. In Bob’s case, this is what the ticketing system was there for, so that others can see his previous work on an asset or for a user. At home, you might keep a journal or put a note in the margins of the AC manual. For public consumption, you might write a blog article or submit your research results to a journal. Anywhere that will help prevent someone in the future from having to waste resources to rediscover the failed hypothesis.

Try to make this part of your habit when researching and troubleshooting. State your hypothesis, test the hypothesis, and record any failures before proceeding with successes. Don’t be a Bob!

Sometimes We Break Things

Today’s a no-deploy Friday for me, like it is for many. However, also like many others, here I am deploying things. Small things, but things that would still ruin my weekend if they broke. Sometimes the worst does happen and we break things. Don’t worry, we’re professionals!

So, what happens if you do break something? First, don’t panic. Everyone’s broken something before, and that includes everyone above you in the food chain. The second step is to notify those above you according to your internal processes. In most cases, that means stopping what you are doing and giving your boss a one-paragraph summary of the issue, what it affects, and what you’re doing about it, then getting back to work. Third, don’t panic! I know I already said that, but since you’ve now gone and told your boss, they may have induced some panic – let it pass. The only way you’ll recover is if you don’t panic. Breathe.

Fourth, fix it! Work out what was supposed to happen, what you did, and where things went wrong. Identify the steps required to either back things out or repair the situation so you can proceed. Document the steps and follow them. If you have a maintenance window you are operating under, put some time estimates down and set an alarm for when you need to make the go/no-go call. Though the situation is urgent, taking a few moments now to prepare will make you more efficient as you proceed. Give your management chain short updates throughout the event until it is cleared, and don’t let rising panic get to you.


Puppet Git Sync via REST: A Learning Experience

In an upcoming series, I’ll be writing about Puppet and Git. As part of the research, I spent a number of hours looking at existing tools for post-receive hooks that were compatible with GitHub and r10k. In the end, my research went a completely different way and my first effort didn’t pan out, but I did learn from the experience and thought that sharing it might help others.

I was attempting to take an integrated Puppet/r10k installation supporting dynamic environments and add a post-receive hook. The existing workflow finished with logging into the puppet master, su-ing or sudo-ing to root, and running r10k to deploy. The primary goal of the hook was to eliminate this step. That would not only simplify the workflow, but also increase security (fewer people need root access) and eliminate mistakes (Why isn’t my change visible? Oops, I forgot to run r10k). The concept of hooks is fairly simple – when certain git activities occur, programs are called – but I needed to put the pieces together. I’m on this box, I do my git work and push it to origin, then I need origin to do … something … and tell the puppet master to do … something else.

My initial research focused on identifying the somethings. A common solution is to install gitolite on a node and make that the origin. It can then call an external program that SSHes to the master and runs r10k. I eliminated this option because it’s either another node to manage or another service on an existing node, plus I’d have to perform backups of the git repo myself. I’d rather use GitHub at home or Stash at work to foist some of those responsibilities off on others.
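For the curious, the gitolite-style trigger is small. Here’s a minimal sketch of what such a post-receive hook might look like; the master hostname, the deploy user, and key-based SSH access are all assumptions for illustration, not details from my setup:

```shell
MASTER="puppet.example.com"   # assumed puppet master hostname

# r10k maps each branch of the control repo to an environment, so
# stripping "refs/heads/" from a pushed ref yields the environment name.
branch_of() { echo "${1#refs/heads/}"; }

# Only run the deploy loop when git invokes us as a hook (GIT_DIR is set);
# git feeds one "<old-sha> <new-sha> <refname>" line per updated ref on stdin.
if [ -n "${GIT_DIR:-}" ]; then
    while read -r oldrev newrev refname; do
        branch=$(branch_of "$refname")
        echo "Deploying Puppet environment '$branch' on $MASTER"
        # assumes a 'deploy' user permitted to run r10k on the master
        ssh "deploy@$MASTER" "r10k deploy environment '$branch'"
    done
fi
```

Dropped into a bare repo’s hooks directory (or called from gitolite), this fires on every push – the same idea, triggered over HTTP instead of SSH, is where the REST part of the title comes in.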
