Australia outages illustrate how fragile tech can be
Contrary to popular belief it does rain down under, often in a torrential fashion – such as the remarkable June 2016 rains that drenched the Eastern states of Australia. Unfortunately, as the clouds opened up, our own tech versions of condensed virtual goodness didn’t handle it too well. Even the mega cloud providers weren’t immune to dear old Mother Nature.
Some businesses fared better than others. While June 4th wasn’t a great night in Sydney to order a pizza online, buying or selling a car was fine. Those who weathered the storm would claim they had a balanced hybrid cloud plan, although there was probably a drop of luck involved too.
But with business success increasingly determined by digital dexterity, relying on dumb luck is not an option. While cloud enables customer engagement at scale, that raw power can equally savage an overcommitted and unprepared business. Plus, Murphy’s Law suggests things will go wrong at the most inopportune time. Like 7 p.m. on a wet Sydney evening when folks are frantically hitting the mobile app to order a pizza.
While outages in Australia illustrate how fragile tech can be, this shouldn’t dampen our resolve to harness the cloud. Success, however, is increasingly dependent on highly collaborative DevOps teams architecting the cloud for pace and scale, but – and here’s the rub – within acceptable risk tolerance levels. As the saying goes, “with great power comes great responsibility.” In the case of cloud computing, that power comes with many other ‘ity’ words, including:
Viability – architects should play an active role in determining whether cloud is actually right for the organization and what the consequences of failure would be. Netflix transformed its operating model with cloud, but at the end of the day an outage might only temporarily disrupt video streaming (not forgetting Netflix is darned good at handling cloud failures). But if you deliver healthcare apps and monitoring devices, cloud outages may carry more ‘life and death’ significance.
Additionally, success won’t come by using cloud to house old problems. Containerizing a legacy monolithic application and offloading it to the public cloud might be technically achievable, but what are the business benefits? In this case, are we just using it to cheat the business by deferring service improvements?
Testability – teams should recognize that to support increased change rates, cloud applications must be tested early, thoroughly and continuously. Mistakes happen when teams face long wait times to access test data or when dependent systems aren’t available. Naturally, teams circumvent these problems by using cloud services for testing at scale (an obvious and common use case). However, this can lead to security and compliance breaches (again at scale) if the personally identifiable information of thousands of customers isn’t automatically masked during test data migration.
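To illustrate the masking point, here’s a minimal, hypothetical sketch of deterministically masking PII fields before customer records are copied into a cloud test environment. The field names and salt are illustrative assumptions; a real pipeline would use a dedicated data-masking tool, but the principle is the same:

```python
import hashlib

# Illustrative: which record fields count as PII in this sketch.
PII_FIELDS = {"name", "email", "phone"}

def mask_value(value: str, salt: str = "test-env-salt") -> str:
    """Replace a PII value with a stable, irreversible token."""
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return f"masked-{digest[:12]}"

def mask_record(record: dict) -> dict:
    """Mask only the PII fields; leave everything else untouched."""
    return {key: mask_value(val) if key in PII_FIELDS else val
            for key, val in record.items()}

customer = {"id": 42, "name": "Jane Citizen",
            "email": "jane@example.com", "plan": "gold"}
masked = mask_record(customer)
```

Because the masking is deterministic, referential integrity across test datasets is preserved – the same customer always maps to the same token – while the real values never leave the premises.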
Interoperability – if an organization has private and public clouds but can’t efficiently move workloads between them, it doesn’t have a hybrid cloud, it just has more silos to manage. But achieving hybrid nirvana is difficult when proprietary APIs stifle integration. True, this might be solved with generic cross-platform variants, but there’ll be other challenges – like monitoring throughput against provider-set API rate limits, or worse, handling business losses due to poor API performance under load. These are all reasons why strong API governance, management and testing are so critical for cloud – especially in hybrid environments.
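As a sketch of what monitoring throughput against a provider-set rate limit might look like, the rolling-window counter below warns before the limit is breached. The limit, window size and warning threshold are illustrative assumptions, not any particular provider’s figures:

```python
import time
from collections import deque

class RateLimitMonitor:
    """Track API calls against a provider-set limit (N requests per
    rolling window) and flag a warning before the limit is breached."""

    def __init__(self, limit: int, window_seconds: float, warn_ratio: float = 0.8):
        self.limit = limit
        self.window = window_seconds
        self.warn_ratio = warn_ratio
        self.calls = deque()  # timestamps of recent calls

    def record_call(self, now=None) -> str:
        now = time.monotonic() if now is None else now
        self.calls.append(now)
        # Drop calls that have aged out of the rolling window.
        while self.calls and self.calls[0] <= now - self.window:
            self.calls.popleft()
        used = len(self.calls)
        if used > self.limit:
            return "over-limit"
        if used >= self.limit * self.warn_ratio:
            return "warning"
        return "ok"

# Simulated traffic: one call per second against a 10-per-minute limit.
monitor = RateLimitMonitor(limit=10, window_seconds=60)
statuses = [monitor.record_call(now=float(i)) for i in range(11)]
```

Surfacing the "warning" state gives teams time to throttle or queue work gracefully rather than discovering the limit via provider-side rejections in production.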
Variability – the fail-one, fail-all design of monolithic on-premises applications means everything that can go wrong must be monitored, putting an incredible strain on operations teams. Modern, loosely-coupled microservices allow frequent software releases without disrupting overall system availability, but they also introduce other thorny monitoring issues. Now teams have to ensure performance within and across thousands of processes. Throw in the unpredictable nature of cloud-app demand and strong consideration should be given to using analytics rather than manually configured rule-based monitoring systems.
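One way analytics can replace a hand-tuned rule: compare each metric sample to its own recent history instead of a fixed threshold like "alert if latency exceeds 500 ms". The z-score sketch below is a stand-in for real analytics tooling, with the window and threshold chosen arbitrarily for illustration:

```python
import statistics

def detect_anomalies(series, window=20, threshold=3.0):
    """Flag points that deviate sharply from their rolling baseline.

    A static rule breaks when demand is unpredictable; comparing each
    sample to its own recent history adapts automatically.
    """
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev and abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# Simulated latency samples with normal jitter and one spike.
latencies = [100.0 + (i % 5) for i in range(30)]
latencies[25] = 500.0  # the spike an adaptive baseline should catch
spikes = detect_anomalies(latencies)
```

The normal jitter between 100 and 104 ms never trips the detector, while the one genuine spike does – without anyone having configured a threshold per service.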
Recoverability – if there’s one cloud guarantee, it’s failure. That’s not such a bad thing if architects, developers and sysadmins design for it – leveraging good monitoring solutions together with the cloud’s inherent flexibility to better predict anomalies and initiate recovery. To this end, organizations should never rely on cloud provider capabilities alone. Taking advantage of high-availability architectures that fail over to other clouds is great in theory, but difficult when there are data sovereignty issues and the provider has all of its proverbial eggs in one disaster-affected basket.
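Designing for failure might look something like the following sketch – retry with exponential backoff and jitter, then fail over to an alternate region. The endpoint names are hypothetical placeholders, not real provider URLs:

```python
import random
import time

# Hypothetical endpoints: try the primary region first, then fail over.
ENDPOINTS = ["primary-sydney", "secondary-melbourne"]

def call_with_failover(request_fn, max_retries=3, base_delay=0.5):
    """Try each endpoint in turn, backing off between attempts."""
    last_error = None
    for endpoint in ENDPOINTS:
        for attempt in range(max_retries):
            try:
                return request_fn(endpoint)
            except ConnectionError as exc:
                last_error = exc
                # Exponential backoff with jitter avoids a thundering
                # herd of synchronized retries hammering a sick region.
                time.sleep(base_delay * (2 ** attempt) * random.random())
    raise RuntimeError("All endpoints failed") from last_error

# Simulated outage: the primary region is down, the secondary is fine.
calls = []
def flaky_request(endpoint):
    calls.append(endpoint)
    if endpoint == "primary-sydney":
        raise ConnectionError("region outage")
    return f"ok from {endpoint}"

result = call_with_failover(flaky_request, max_retries=2, base_delay=0.0)
```

The client exhausts its retries against the failed region, then recovers transparently via the secondary – exactly the behavior a wet Sydney evening rewards.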
Of course there’ll be many other considerations, not the least of which is API security – making sure there’s strong encryption and authentication that works across a variety of devices. Cloud management will also blend availability and security by monitoring APIs for performance and detecting threats from unusual traffic patterns.
Finally, when moving to the cloud, everything we’ve learned about availability, business continuity and risk tolerance doesn’t move with it – and neither does the blame when things go wrong. That’s why strong DevOps teams confront difficult cloud issues themselves – jointly sharing success and learning from inevitable failures.
And yes, that’s another important ‘ity’ word – accountability.