Monday, September 17, 2012

From old to new: dealing with legacy software

A tricky problem is to deal with software legacy.
I think I have never been anywhere where I did not need to deal with legacy some way or another. In fact my first task on my first job was to improve the performance of "old" design rule checker (DRC) by upgrading its geometric store with a KD-tree. That was during a summer job, and later as employee, I was asked to write a new DRC to replace the old one. (A DRC is used to check geometric rules in layouts of integrated circuits). The new DRC was better than the old one, and part of an automated layout tool for analog circuit, which... ultimatly failed. Commercially the project did break even, but other systems worked better. Looking back, with the hindsight of knowing which system succeeded, this CAD sytem failed to approach the problem the right way, which was incrementaly. For example, the requirements for the DRC were: do what the old software does but better and faster, yet they really should have been "make it incremental": a small change in the input should generate only moderate of work to revalidate the rules.

Legacy is about dealing with code that exists already and that causes you trouble. Typically the trouble is that the legacy software is equivalent to a massive traffic jam where it is no longer economic to work on it. Changes to the software are simply too expensive: they take too long and they are risky. Unlike new software, legacy software is already in use, and this simple fact is the root of all problems: it makes you stupid!  Take the example of that DRC project, my boss had said “do what the old software does but better and faster”,  I wrote a great boolean operations on large geometric figures package; It was fast, it was better than the previous package, but… it was not what was needed!  This is what I call  the “dumbifying” effect of legacy software, and the subject of this post.

When a magician performs a trick, you may know the magician is tricking you but mostly unable to do anything than to “fall for it” and think it looks real. Legacy software is a little bit like a nasty magician, it makes things look more real than they deserve. So for example, if your team tells you they can rewrite a legacy system, your tendency will be to believe them, just because the legacy system exists already and this somehow makes you more confident. If you are given analysis of the legacy system and it is incomplete, you will judge it less harshly than analysis for new yet to be written system. If your team tells you they can only test 10% of the features of your legacy system, you might say that is better than nothing, while a proposal to test “only” 10% coverage of a new system would be treated as a joke.   Legacy software is a little bit like a con artist! It puts down your guards and corrupts your sense of proportion. Worse than that, legacy software makes you feel bad when you think of some of its “legacy properties”, the result is like a Jedi mind shield, legacy software actually blocks you from thinking clearly by making you suffer when you look at it "truthfully "!

In a former job, as a development manager, I was continuously saying “no”, “no”, “no”… to the same question which was “can we rewrite it”. The thing is that rewriting legacy software when you have not yet accepted to look at it “truthfully” is often not going to work. So here are my recommendation on how approach you legacy software:
  • Never believe you can rewrite legacy software from scratch without keeping something. Maybe you are keeping a protocol, maybe you are keeping the business functionalities, maybe you are keeping the algorithm, but you need to keep something.
  • Keep what is most precious in your legacy software. Again, it might be the protocol, the business functionalities, .. but as you need to keep something (see above) you want to make sure that you keep what is most valuable. The more the better!
  • Make sure to deeply analyze what you have identified as valuable and that you want to keep in the legacy software. Analyze it to death, until it hurts!  That is because by default, you will shy away from doing this analysis, and that will cost you!  This is the really the number one rule with legacy software: do not let “it” convince you that you do not need to analyze what you will keep! (See dumbifying and con artist remark above).
  • Also, bring in new resources to provide fresh views on your legacy analysis. Again, by default the legacy will bias you, bring in someone that can look you problem and solution without prior experience with the legacy software.
Sometimes what you want to keep is deeply imbedded into your legacy software, for example as an implied data model or API. Do not hesitate to use “tricks” to extract it out at the lowest cost. For example you can extend your types to produce a trace info with each operation, to then analyze the trace files extract an implied structure.
Once I wanted to extract an implied model in C++, so I wrote a “hand crafted” C++ parser. It took about three days, I started with an equivalent yacc and lex and the first C++ file to parse, and proceeded to write the parser to parse exactly what I needed in the first file, extending the parser by running the parser over and over, line after line. When I got to the second file, I was already moving by a few lines at a time with each iteration, and by the end of the effort I had parsed the 50 or so files that had the data I needed.
Finally, do not forget the business: if you are in it to make a profit, never write code unless it is supporting your business strategy. This is probably the hardest part with legacy software: your business and marketing people usually have no idea how to approach your legacy rework. If you not careful your development will effectively hijack your legacy strategy and possibly lead you into business oblivion.


Anonymous said...

Interesting, how about how best to go about writing something greenfield from scratch? Best practices and the way forward?

equational said...

Dear Anonymous, as you may know there is an optimal balance in everything. For example, an optimal portfolio is built with a ratio of assets to achieve best return at minimum risk. When working on projects, green field included, the team's goal is to maintain themselves near that optimal point of having the right ratio of the different components that make up your "development machine" super efficient while keeping your risk as low as possible. The first ingredient to make this happen is to make sure you have a more "good" optimal behavior in your team than bad behavior. You cannot expect to have the dream team, but you do need to have enough of "sound investment" mentality so that the team stays focused and performs.
Best practice is very simple: take seasoned pros and give them fresh out of university resources. For the first two years, the newbies will behave well by following others, and as the others are experienced, it all works out. Then after two years you keep the ones that have learned good behavior and replace those that are becoming deviant.

Anonymous said...

hah, were it that easy to have a team of such composition I would have a lot less problems!

I'm thinking more along the lines of I have a team, they are fairly inexperienced as developers, they have some successful projects under their belts, but mostly small scale ones. If we were to embark on a new large scale project, the first of such size for most of them, what are the pitfalls we should be aware of? I know that code complexity is something that can come back to bite at later stages too when looking at bug fixes and things of this nature. It would be good to have the opinion of a seasoned pro :)

Harry Korine said...

This should be required reading for anyone dealing with legacy software, software specialists and laypersons alike.