You’ve probably heard it before: Root Cause Analysis (RCA) doesn’t exist, there’s always something under the root cause. Or, there’s no root cause, only contributing factors. This isn’t exactly untrue, of course. Rarely in our entire life will we find some cause and effect so simple that we can reduce a problematic effect to a single cause. Such arguments against RCA may be grounded in truth but smooth over the subtleties and complexities of the actual process of analysis. They also focus on the singular, though nothing in the phrase “Root Cause Analysis” actually implies the singular. Let’s take a look at how RCA works and analyze it for ourselves.
Root Cause Analysis is the analysis of the underlying causes related to an outage. We should emphasize that “causes” is plural. The primary goal is to differentiate the symptoms from the causes. This is a transformative and iterative process. You start with a symptom, such as the common “the internet is down!” In a series of analytical steps, you narrow it down as many times as needed. That progression may look like:
- “DNS resolutions failed”
- “DNS server bind72 failed to restart after the configuration was updated”
- “A DNS configuration was changed but not verified and it made its way into production”
- “Some nodes had two resolvers, one of which was bind72 and the other was the name of a decommissioned DNS node.”
Each iteration gets us closer to a root cause. We may identify multiple root causes – in this case, lack of config validation and bad settings on some nodes. Not only are these causes, root causes, but they are actionable. Validation can be added to DNS configuration changes. Bad settings can be updated. Perhaps there’s even a cause underneath – WHY the nodes had bad settings – because RCA is an iterative process. We can also extrapolate upward to imagine what other problems could be prevented. DNS configurations surely aren’t the only configurations that need validated.
Multiple causes and findings doesn’t invalidate Root Cause Analysis, it only strengthens the case for it. If it makes it easier to share the concept, we can even call it Root Causes Analysis, to help others understand that we’re not looking for a singular cause. Regardless of what we call it, I believe it is absolutely vital that we continue such analysis, that we don’t throw away the practice because some people have focused on the singular. Be an advocate of proper RCA, of iterative analytical processes, and of identifying and addressing the multiple causes at hand.