The Joel Test is a set of twelve engineering practices (like source code control, one-step builds, and bug tracking) that differentiate a mature software team from, well, one that doesn’t have bug tracking, version control, or one-step builds.
The test, written by Joel Spolsky, an influential programmer turned entrepreneur, took off, and not just as a guide to improvement, but as a way to measure successful companies. Even today, on Joel’s career board, companies advertise to programmers by listing their score publicly.
Cutline: Why work for Codility? For one thing, they score an 11 on the Joel Test.
Joel’s list was a wonderful step forward, but it’s also twelve years old. That leaves me wondering if we could do better than a “daily build” (item 3) or “hallway usability testing” (item 12). So over the past year I visited a long list of companies known for forward thinking, including Etsy, Zappos, Royal Caribbean, and GeoLoqi. And I took my notebook along (the pen-and-paper kind).
I wanted to find out what makes high-performance software teams successful today, what their challenges are, and how they measure success. One common theme emerged: they all strive to bring development, system administration, and testing closer together to reduce time-to-market and improve quality.
I’ve thought about what I learned and have boiled it down to a set of six principles. In the spirit of the original Joel Test, I offer you the Heusser Test: 6 Steps to Better Code Faster.
1. Can you build a test server with any build of the code in one step?
According to a recent report by voke Research, the average large IT organization required access to 33 systems for development and test. That sounds huge, but once you tick off web server, load balancer, and database across dev, test, QA, and production, the numbers add up quickly.
Of those organizations, only 4% say they have on-demand access to a test environment. For everyone else, that means old-school test environment work: filing tickets to set up servers, getting permission, and negotiating the version of the code to put on the test server. It’s the kind of thing that Bernie Berger described in his day in the life of a software tester and, thanks to modern service virtualization technology, it’s now entirely unnecessary.
When I started at Socialtext back in 2008, we had command-line tools to build a test server in a VM in about the time it takes to brew a cup of coffee. Can you do that today? Can your competition?
2. Can you deploy to a staging environment continually?
If you can have the Continuous Integration (CI) server deploy the latest version of each major branch to a virtual machine, you can forget waiting five minutes for a cup of coffee and test right now.
3. Can you perform load, stress, and performance testing on your application—without buying any hardware? Right now?
Let’s say you need to know if your website can handle an anticipated flood of traffic from a major (and unexpected) article in the Wall Street Journal, The New York Times, Slashdot, or Reddit. That means performance testing, probably on demand. Generating that kind of traffic would take a modest data center. The ability to quickly rent those servers and use them on demand might be the only economic choice.
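To make the idea concrete, here is a minimal sketch of an on-demand load generator, assuming you can wrap a single request to your site in a plain function. The names `run_load_test`, `send_request`, and `fake_request` are illustrative, not from any particular tool, and a real test would fan this out across rented machines rather than threads on one laptop.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def run_load_test(send_request, total_requests, concurrency):
    """Call send_request() total_requests times across `concurrency` threads.

    Returns a summary dict: request count, failure count, and median and
    maximum latency in seconds.
    """

    def timed_call(_):
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:
            # Any exception from the request counts as a failed request.
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    failures = sum(1 for ok, _ in results if not ok)
    return {
        "requests": total_requests,
        "failures": failures,
        "median_s": median(latencies),
        "max_s": latencies[-1],
    }


def fake_request():
    # Hypothetical stand-in for a real HTTP call to a staging site.
    time.sleep(0.001)


if __name__ == "__main__":
    print(run_load_test(fake_request, total_requests=200, concurrency=20))
```

Swapping `fake_request` for a function that actually hits your staging environment turns this into a crude smoke test. The point is only that the harness itself is small; the expensive part is the hardware behind it.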
4. Can you push to production within five minutes?
Many “prevention” techniques, like stage gates, tickets, reviews, multi-step operations, and “production change windows,” were born in a different era, when a defect in production required mail-ordering a million floppy disks. Today the ability to roll out quickly means the ability to roll back quickly as well. It doesn’t have to be push-button, but it helps.
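One way to picture why fast rollout implies fast rollback: if a rollback is just another deploy of the previous version, it travels the same fast path. The sketch below assumes a hypothetical `deploy_fn` callable that ships a given version; `ReleaseHistory` is an illustrative name, not a real tool.

```python
class ReleaseHistory:
    """Keep an ordered record of deployed versions so rollback is one call."""

    def __init__(self, deploy_fn):
        # deploy_fn is a hypothetical callable that ships a version to production.
        self.deploy_fn = deploy_fn
        self.versions = []

    def roll_out(self, version):
        """Deploy a new version and remember it."""
        self.deploy_fn(version)
        self.versions.append(version)

    def roll_back(self):
        """Redeploy the previous version using the same path as roll_out."""
        if len(self.versions) < 2:
            raise RuntimeError("no earlier release to roll back to")
        previous = self.versions[-2]
        self.deploy_fn(previous)
        # Only forget the bad release once the redeploy has succeeded.
        self.versions.pop()
        return previous


if __name__ == "__main__":
    history = ReleaseHistory(deploy_fn=lambda v: print("deploying", v))
    history.roll_out("build-101")
    history.roll_out("build-102")
    history.roll_back()
```

Because `roll_back` reuses `deploy_fn`, any time you shave off a rollout is time shaved off the recovery too.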
5. Can you monitor production continually, in a meaningful way?
If a defect occurs in production, but you find it and revert before any significant number of customers see it, did it really make a sound? That’s the point that Ed Keyes was trying to make with his Google Tech Talk, when he said that “sufficiently advanced monitoring is indistinguishable from testing.”
That’s exactly what the programmers do at Etsy. They watch the performance graphs, which allows them to take calculated risks, and helps them get more code in production in less time.
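As a rough illustration of “meaningful” monitoring (and not a description of what Etsy actually runs), here is a sketch that tracks the error rate over a sliding window of recent requests and flags a spike when it climbs well past a baseline. The window size, baseline rate, and spike factor are assumed values you would tune for your own traffic.

```python
from collections import deque


class ErrorRateMonitor:
    """Track request outcomes in a sliding window and flag error spikes."""

    def __init__(self, window_size=100, baseline_rate=0.01, spike_factor=5.0):
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)
        self.baseline_rate = baseline_rate  # assumed normal error rate
        self.spike_factor = spike_factor    # multiple of baseline that counts as a spike

    def record(self, is_error):
        """Record one request outcome: True for an error, False for success."""
        self.outcomes.append(bool(is_error))

    def error_rate(self):
        """Fraction of requests in the current window that were errors."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_spiking(self):
        """True when the windowed error rate exceeds baseline * spike_factor."""
        return self.error_rate() > self.baseline_rate * self.spike_factor


if __name__ == "__main__":
    monitor = ErrorRateMonitor()
    for _ in range(100):
        monitor.record(False)
    for _ in range(10):
        monitor.record(True)
    print(monitor.error_rate(), monitor.is_spiking())
```

Hook `is_spiking` up to an alert or an automatic revert and you have the beginnings of monitoring that does some of the work of testing.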
6. Can you trace errors to their root cause, correct them, and roll out changes within the hour when errors spike in your app?
This final piece ties everything together, compressing the find->fix->deploy loop into one hour. If you can’t do this, look to do the other pieces first, then figure out what your loop is and shrink it. The tighter the loop, the less impact a failure in production will have on the business.
These six steps, taken together, reduce the impact of failure, allow the team to move faster, and reduce risk. When I was at Zappos, they called the decision to pursue a strategy where technical staff can have what they need when they need it a “Strategic Investment in the Happiness of Our Employees.”
To many people, the end state of DevOps is clear, but how to get there is shrouded in mystery. Here’s a roadmap: Start by scoring your team on the Heusser Test, then work on improving your score. Once you get to six, you won’t need a roadmap; you’ll be too busy inventing one.
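If you want to keep score the way companies do with the Joel Test, a few lines will do it. This is only a sketch: one point per honest “yes,” with the six questions lightly condensed from the list above and the function name `heusser_score` made up for the occasion.

```python
HEUSSER_TEST = [
    "Can you build a test server with any build of the code in one step?",
    "Can you deploy to a staging environment continually?",
    "Can you run load, stress, and performance tests without buying hardware?",
    "Can you push to production within five minutes?",
    "Can you monitor production continually, in a meaningful way?",
    "Can you trace, fix, and roll out a correction within the hour?",
]


def heusser_score(answers):
    """Return the number of 'yes' answers, one answer per question."""
    if len(answers) != len(HEUSSER_TEST):
        raise ValueError(f"expected {len(HEUSSER_TEST)} answers, got {len(answers)}")
    return sum(1 for answer in answers if answer)


if __name__ == "__main__":
    answers = [True, True, False, True, False, False]
    for question, answer in zip(HEUSSER_TEST, answers):
        print("yes" if answer else "no ", question)
    print("Score:", heusser_score(answers), "out of", len(HEUSSER_TEST))
```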