Gregor's Ramblings

If you never kill anything, you will live among zombies. And they will eat your brain.

Sept 27, 2015

ABOUT ME
Gregor Hohpe
Hi, I am Gregor Hohpe, co-author of the book Enterprise Integration Patterns. I like to work on and write about asynchronous messaging systems, service-oriented architectures, and all sorts of enterprise computing and architecture topics. I am also the Chief Architect at Allianz SE, one of the largest insurance companies in the world.

Corporate IT lives among zombies: old systems that are half alive and have everyone in fear of going anywhere near them. They are also tough to kill completely. Worse yet, they eat IT staff's brains. It's like Shaun of the Dead minus the funny parts.

Legacy

Legacy systems are systems that are built on outdated technology, are often poorly documented, but (ostensibly) still perform important business functions -- in many cases, the exact scope of the function they perform is not completely known. Systems often fall into the state of legacy because technology moves faster than the business: life insurance systems often must maintain data and functionality for many decades, rendering much of the technology used to build the system obsolete. With a bit of luck, the systems don't have to be updated anymore, so IT may be inclined to "simply let it run", following the popular advice to "never touch a running system". However, changing regulations or security vulnerabilities in old versions of the application or the underlying software stack make this a pretty poor approach.

When confronted with the problem, traditional IT often cites the need to support the business. They also often claim that digital companies don't have such problems because they have no legacy. Interestingly, when Michael Feathers came to Google to give a talk about Working Effectively with Legacy Code, almost 150 developers attended, making this a questionable assumption. Because Google's systems evolve rapidly, they also accumulate legacy more quickly. So it's not that they have been blessed with not having legacy; they have simply found a way to deal with it.

Fear of Change

Systems become legacy zombies by not evolving with the technology. This happens in classic IT largely because change is seen as a risk. Once again: "never touch a running system". System releases are based on extensive test cycles that can last months, making updates or changes a costly endeavor. Worse yet, there is no "business case" for updating the system. This widespread logic is about as sound as considering changing the oil in your car a waste of money – after all the car still runs if you don't. And it even makes your quarterly P&L look a little better.

A team from Credit Suisse described how to counterbalance this trap in their aptly titled book Managed Evolution. The key driver for managed evolution is to maintain agility in a system. A system that no one wants to touch has no agility at all – it can't be changed. In a very static business and technology environment, this may not be all that terrible. Today's environment is anything but stable, though, turning the inability to change a system into a major liability for IT and the business.

Fear of Code

The fear of change in corporate IT is closely related to the fear of code. Corporate IT is largely driven by operational considerations. Code is considered the stuff that contains all the bugs and performance problems, often written by expensive external consultants who are difficult to hold liable as they'll have long moved to another project by the time the problems surface. Hence corporate IT is a sucker for configuration over coding (see Configure this for my view on this), much to the delight of enterprise software vendors. The most grotesque example of fear of code I have seen is corporate IT providing application servers that carry no operational support once you put code on them. It's like voiding a car's warranty after you start the engine – after all, the manufacturer has no idea what you will do to it!

There are two ironies in the fear of code: first, business value comes only from code, not from the network, hardware or operating system. Second, the fear of code leads enterprises to buy into large frameworks, which in turn make version upgrades tough and increase the chance of growing zombies. Anyone who has done an SAP upgrade can relate.

Lack of Feedback

Most things are the way they are for a reason. This is also true for the fear of change in corporate IT. These organizations typically lack the tools, processes, and skills to closely observe production metrics and to rapidly deploy fixes in case something goes awry. Hence they focus on trying to test for all scenarios before deploying and then run the application more or less "blind", hoping that nothing breaks. Jeff Sussna describes the necessity to break out of this conundrum very nicely in his great book Designing Delivery.

Version Upgrades

The zombie problem is not limited to systems written in PL/1 running on an IBM/360, though. Often, updating basic runtime infrastructure like application servers, JDK versions, browsers, or operating systems scares the living daylights out of IT, causing version updates to be deferred until the vendor ceases support. The natural reaction then is to pay the vendor to extend support because anything is less painful than having to migrate your software to a new version of the software stack.

Often the inability to migrate cascades across multiple layers of the software stack: one cannot upgrade to a newer JDK because it doesn't run on the current app server version, which can't be updated because the new version requires a new version of the operating system which lacks some library or feature the software depends on. I have seen IT shops that are stuck on Internet Explorer 6 because their software utilizes some proprietary feature not present in later versions. Looking at the user interfaces of most corporate applications one finds it difficult to imagine that they eked out every little bit of browser capability. They surely would have been better off not depending on such a peculiar feature and instead being able to benefit from browser evolution. Such a line of thought does require a "change is good" mindset to begin with, though.
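To make the cascade concrete, here is a minimal sketch that treats each upgrade as depending on the upgrades that must happen first and works out the order in which they have to be tackled. The component names mirror the example above; the REQUIRES table, the upgrade_order function, and the "replace_legacy_library" step are hypothetical names invented for this illustration, not part of any real tool.

```python
# Hypothetical model of the upgrade cascade described above.
# Each upgrade maps to the upgrades that must be completed before it.
REQUIRES = {
    "jdk": ["app_server"],                 # new JDK needs a newer app server
    "app_server": ["operating_system"],    # newer app server needs a newer OS
    "operating_system": ["replace_legacy_library"],  # new OS lacks a library the software uses
    "replace_legacy_library": [],          # assumed first step: remove that dependency
}


def upgrade_order(target, requires):
    """Return the upgrades needed to reach `target`, prerequisites first."""
    order = []
    done = set()

    def visit(item, path):
        # A cycle means no upgrade in the loop can ever go first.
        if item in path:
            raise ValueError("circular dependency: " + " -> ".join(path + [item]))
        if item in done:
            return
        for prerequisite in requires.get(item, []):
            visit(prerequisite, path + [item])
        done.add(item)
        order.append(item)

    visit(target, [])
    return order


if __name__ == "__main__":
    print(upgrade_order("jdk", REQUIRES))
```

The point of the sketch is that a seemingly small wish ("move to a newer JDK") expands into the whole chain, and the first item on the list is usually the one nobody budgeted for.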

Run vs. Change

The fear of change is even encoded in many IT organizations that separate "run" (operating) from "change" (development), making it clear that running software does not imply change. Rather, it is the opposite of change, which is done by application development – those guys who produce the flaky code IT is afraid of. Structuring IT teams this way will guarantee that systems age and become legacy because no change can be applied to them.

One may think that by not changing running systems, IT can keep the operational cost low. Ironically, the opposite is true: many IT departments spend more than half of their IT budget on "run" and "maintenance", leaving only a fraction of the budget for "change" that can support the evolving demands of the business. That's because running and supporting legacy applications is expensive: operational processes are often manual; the software may not be stable, necessitating constant attention; the software may not scale well, requiring the procurement of expensive hardware; lack of documentation means time-consuming trial-and-error troubleshooting in case of problems. For these reasons, legacy systems tie up valuable IT resources and skills, effectively devouring the brains of IT that could be applied to more useful tasks, for example delivering features to the business.

Planned Obsolescence

When selecting a product or conducting an RfP, classic IT tends to compile a list containing dozens or hundreds of features or capabilities that a candidate product has to offer. Often, these lists are compiled by external consultants who don't even know the real business need or the company's IT strategy. However, they tend to come up with longer lists, and longer appears better to classic IT staff. To stick with the car analogy, this is a bit like evaluating a car by having an endless list of more or less (ir)relevant features like "must have a 12V lighter outlet", "speedometer goes above 200 km/h", "can turn the front wheels" and then scoring a BMW vs. a Mercedes for these. How likely this is to steer (pun intended) you towards the car you will enjoy the most is questionable.

Worse, one item routinely missing from such "features" lists is planned obsolescence: how easy is it to replace the system? Can the data be exported in a well-defined format? Can business logic be extracted and re-used in a replacement system? How can vendor lock-in be avoided? To the folks in the new product selection honeymoon this may seem like discussing a pre-nup before the wedding – who likes to think about parting when you are just about to embark on a lifelong journey? In the case of an IT system you better hope the journey isn't lifelong – systems are meant to come and go. So better to have a pre-nup in place than to be held hostage by the system (or vendor) you are trying to part with.

If it hurts, do it more often

How to break out of the "change is bad" cycle? As mentioned before, without proper instrumentation and automation, making changes is not only scary but indeed risky. The reluctance to upgrade or migrate software is similar to the reluctance to build and test software often. Martin Fowler issued the best advice to break this cycle: "if it hurts, do it more often". No, this is not meant to appeal to masochistic IT staff, but it highlights that deferring a painful task generally makes it disproportionately more painful: if you have not built your source code in months, it's guaranteed not to go smoothly. Likewise, if you are three versions of an application server behind, you'll have the migration from hell.

Performing such tasks more frequently provides a forcing function to automate some of the processes, e.g. with automated builds or test suites. Dealing with migration problems will also become routine. This is the reason emergency workers train regularly – otherwise they'll freak out in case of an actual emergency and won't be effective. Of course, training takes time and energy. But what's the alternative?

Culture of Change

Digital companies also have to deal with change and obsolescence. The going joke at Google was that every API had two versions: the obsolete one and the not-yet-quite-ready one. Actually, it wasn't a joke, but pretty close to reality. Dealing with this was often painful – every piece of code you wrote could break at any time because of changes in the dependencies. But living this culture of change allows Google to keep the pace up – the most important of today's IT capabilities that sadly is rarely listed as a KPI for classic IT. Even Shaun knows that zombies can't run fast.

Brave enough to kill some zombies?

Our team is still growing! I am looking for one more senior Cloud Infrastructure Architect to join my team at Allianz in Munich.



Gregor is the Chief IT Architect of Allianz SE. He is a frequent speaker on asynchronous messaging and service-oriented architectures and co-authored Enterprise Integration Patterns (Addison-Wesley). His mission is to make integration and distributed system development easier by harvesting common patterns and best practices from many different technologies.
www.eaipatterns.com