2008-05-26

Fighting with duplication - attack of the clones

Last week I carried on with cleanup tasks on a project I'm currently working on. It consists of a large number of JSP-generated reports. Some time ago I set up an initial environment for such reports, so that even analysts without special JSP or Java knowledge could write some reports. It was meant as a temporary solution before moving to a more enterprise-grade tool. The results were not bad - new reports were added very quickly. Everything worked and was usable, so the enterprise tool was simply forgotten.

Over two years the reports outgrew the rest of the code base, and many issues and problems were related to them. In the harder cases I had to dig into the code, and what I saw was massive duplication, in every possible form. So when we found a bug in one section of code, there was a high probability that the same problem was repeated elsewhere. It was even worse than that. Instead of new features being added to a base report, new versions of the same report were created with new parameters or features, and the divergence between those reports grew over time. So you can imagine how much time maintenance started to cost at some point.

I especially like to stick to one coding rule: do not create new duplication, and remove the duplication that already exists.

Duplication starts when some functional block of code appears in at least two places in the code. Sometimes it is just a simple one-liner - but even then you should weigh the pros and cons of wrapping that piece of code in a procedure of its own.
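As a minimal sketch of that trade-off, assume the repeated one-liner is the formatting of a monetary amount. The class name `ReportFormat`, the method `amount`, and the formatting rule itself are hypothetical and chosen only for illustration - they are not taken from the real reports.

```java
import java.util.Locale;

// Hypothetical helper: the name and the formatting rule are assumptions for illustration.
final class ReportFormat {

    private ReportFormat() {
        // Utility class, not meant to be instantiated.
    }

    // The one-liner that would otherwise be pasted into every report.
    // Formats a value with thousands separators and two decimal places.
    static String amount(double value) {
        return String.format(Locale.US, "%,.2f", value);
    }
}
```

Each report would then call `ReportFormat.amount(total)` (with `total` standing for whatever value the report prints) instead of repeating the format string. The cost is one more place to look when reading a report; the gain is that a change to the formatting rule is made exactly once.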

Let's get back to the vicious avalanche mechanism of duplicated code in a non-programmer environment - where people don't know good "coding" practices. One of the reports (let's call it A) was copied with some functional difference (as B). Now we have two almost identical files. Then comes another feature that seems to interfere with the previous one. The ticket for the new feature was assigned to only one of those reports (after some time everybody sees two reports and thinks they are two separate reports), the feature seems so "new", and the practice so far was to create a separate report for each function (as the "clean solution"). So a new report C is derived from A. After some time somebody realizes that report B has no such feature. Continuing the same process, we now have four reports, with version D added. Then comes a major change - a new module is added that shows new values in a similar manner, so the base reports A, B, C, D are copied as 2A, 2B, 2C, 2D, and then some values and the layout are changed. Almost pure copy'n'paste coding.

Instead of one simple report with the additional features parametrized from the beginning, there are now eight versions of duplicated code with less than 10% difference between them. The cost of fixing a bug or implementing a new feature is about eight times higher than for a single, more complicated file. And that does not include the effect of further "extension" by the same mechanism - the avalanche just gains new mass at an exponential rate.
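A rough sketch of what "one parametrized report" could look like, assuming the differences between A, B, C and D boil down to two optional features, and the 2A-2D copies to a choice of module. All names here (`Feature`, `Module`, `ParametrizedReport`, the section labels) are hypothetical; the real reports are JSP pages and their actual differences are not shown in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical: the two optional features that produced copies B, C and D.
enum Feature { B_FEATURE, C_FEATURE }

// Hypothetical: the base module and the later one that produced 2A..2D.
enum Module { BASE, SECOND }

final class ParametrizedReport {

    private ParametrizedReport() {
        // Static helpers only.
    }

    // Returns the names of the sections the report would render, in order.
    static List<String> sections(Module module, Set<Feature> features) {
        List<String> out = new ArrayList<>();
        out.add(module == Module.BASE ? "base values" : "second module values");
        if (features.contains(Feature.B_FEATURE)) {
            out.add("feature B");
        }
        if (features.contains(Feature.C_FEATURE)) {
            out.add("feature C");
        }
        return out;
    }

    // Number of variants covered by this single report:
    // every module combined with every on/off combination of the features.
    static int variantCount() {
        return Module.values().length * (1 << Feature.values().length);
    }
}
```

With two modules and two on/off features, `variantCount()` gives 2 × 2² = 8, the same eight variants as the copied files. Report 2D, for example, would be `ParametrizedReport.sections(Module.SECOND, EnumSet.allOf(Feature.class))`. A bug in the shared part is then fixed once rather than eight times.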

You might wonder what kind of procedures we have that allow such practices. OK - the team of report analysts was told to eliminate visible duplication, but it wasn't enough. Somebody who is concentrating on a complex analytical problem doesn't think much about removing duplication, even when the report is complex and the amount of code and SQL is enormous. It's just an additional burden that doesn't seem helpful for solving the analytical problem.

What could help? A quick course on coding practices and techniques in the context of that environment - the basics of how to write reusable pieces of code, avoid common pitfalls, and reduce the code base to ease development and maintenance. And more practical, explanatory examples. Sometimes the duplication problem is visible but there is no simple solution - then pairing with a more experienced programmer would help. It's just an organizational issue.

So now the team is fighting report bugs and feature requests. And the biggest impact we are getting now comes not from working on those new features or bugs, but from merging and removing duplicates. It's ironic, but the number of bugs is now falling at an exponential rate.
