Meet the DevOps Complacency Chimp: Chaos Monkey’s Evil Twin
Complacent attitudes can wreak havoc across your organization and make a monkey out of the business.
You may be familiar with Chaos Monkey, a tool built on the idea of injecting failure into systems to increase their resilience. Developed by Netflix, Chaos Monkey is essentially a script that randomly shuts down cloud instances in production. For Netflix engineers, this means working in an environment where instability is the norm.
That may sound crazy, but it lets engineers test their systems under unexpected failure conditions and build in fault tolerance from the get-go. Chaos Monkey is DevOps on steroids: using automation to deliberately shape engineering behavior in order to achieve the highest level of quality.
By working in this type of environment, engineers become skilled at designing highly resilient systems. It's why Netflix streaming services keep on trucking, even during severe cloud outages. But Chaos Monkey isn't just for cloud natives; it has also been adopted in some surprising places.
U.S. Citizenship and Immigration Services CIO Mark Schwartz is a big fan. His team uses Chaos Monkey as part of its development work to make its systems more robust. But while progressive thinkers like Schwartz deserve admiration, the reality facing many organizations is quite different.
Too often, good DevOps intentions don't lead to tangible results. Something always seems to get in the way, but what could it be? More often than not, it's something similar to Chaos Monkey but far more destructive: the DevOps Complacency Chimp, which comes in four varieties.
Status Quo Chimp
This chimp has a distinctive call: "If it ain't broke, don't fix it." How often have you heard that the old piece of hardware with a spaghetti mass of cabling can't be updated because it's working fine, or that rigid change management processes and standardization dictates are just tough love?
Removing Status Quo Chimp takes work: some people have built careers on the back of entrenched processes. Take those processes away and they are likely to feel threatened, so be prepared for some squealing. Leaders charged with driving DevOps will need to be patient but pragmatic.
The second chimp thrives on normalized deviance. In organizations large and small, there's a tendency to accept sub-optimal IT practices to the point where they become normal. There are many warning signs, such as tests being skipped because of manual processing and setup delays, or updates being rushed through without considering the technical debt they create.
It's one thing to identify this kind of normalized deviance, but quite another to remove it. As with anything involving people, most attention should go to the organizational and process barriers that cause staff to break rules, or their colleagues to keep schtum.
Vanity Chimp
This ape loves to gloat vainly over diagnostic metrics that show it in a good light: SLA reports showing how many customers were fobbed off in under a minute, 99-point-something uptime stats, and so on. All great stuff, but easily massaged and seldom correlated with anything that's actually business-critical.
The only way to banish Vanity Chimp is to create insight into what really matters through business metrics. This means putting customer experience KPIs, and even revenue indicators, on every operational dashboard and sharing them across all areas of the business.
Finally, there's the chimp that thrives wherever IT staff are incentivized by outputs (lines of code written, support calls answered and so on) rather than by the outcomes that genuinely affect the business. This sucker can be particularly hard to displace if staff are financially rewarded for achieving "superstar" status.
A shift to focusing on outcomes is essential, and cross-functional reward programs can help. The goal is to make everyone across development and operations feel that they all benefit from solving the same core problems the business truly cares about.
In any of its forms, the Complacency Chimp can kill a DevOps program before it starts. Left unchecked, its behavior will make a mess of your DevOps efforts and make a monkey out of the business.