I was looking through an SQL script written by an ex-colleague recently. This script collects and cleans-up data from a third party system. The last 912 lines are devoted to setting contact details blank if it is not a real person’s name. It looks something like this:
update sira_party_names set forename = '', surname = '' where upper(forename) = 'DAMAGED' and upper(surname) = 'RAILINGS'; update sira_party_names set forename = '', surname = '' where upper(forename) = '' and upper(surname) = 'NEIGHBOUR'; update sira_party_names set forename = '', surname = '' where upper(forename) = '' and upper(surname) like '%CLAIMANT%'; update -- … many, many hundreds more, with a small number of variants …
How ugly! Every time there is an additional case found, the code has to be changed, tested and released through all the environments. It is also always a huge rush to get the changes completed by a deadline. This code is a pain to maintain, and slow to run - clearly a candidate for making the whole process data driven.
After a few minutes of thought, I created a control table with (non-null) columns for forename_fix and surname_fix to contain the fix-up control data:
forename_fix | surname_fix |
---|---|
DAMAGED | RAILING |
NEIGHBOUR | |
%CLAIMANT% |
Then one line of code performs all the work. My code now looks like this:
update p set forename = '', surname = '' from sira_party_names p inner join party_fix f on p.forename like f.forename_fix and p.surname like f.forename_fix;
The upper() function has gone as well. Presumably the code or the author started on a case-sensitive platform, but that's not the case now. I then added a fix_type field to my control table to designate the different fix types and set up corresponding queries. After a bit of debugging I was producing the same results of the 912 lines previously – in 5 SQL statements. It ran much quicker as well. This is rather more like it!
But, hold on. It turns out that some data is being changed that shouldn’t be (in both old and new variants of the code). So, rather than just directly updating the fields, I introduced the columns zap (in reality a couple of variants of zap) and zapped_by. Now my code reads:
update p set zap = 1, zapped_by = f.fix_id from sira_party_names p inner join party_fix f on p.forename like f.forename_fix and p.surname like f.forename_fix;
Now I can see which entries are to be zapped (changed), and which fix entry did the zapping. With this information, It’s very easy to sort out what should be done. Updating nearly a thousand lines of inconsistent code would have been impractical.
Of course all the 'fix rules’ still exist, but now they are in data. The maintenance of the fix table is passed on to others outside IT. It’s very quick to alter, and it is a nasty little job that is now out of my hair. Another success for simple, elegant data-driven code.
The moral: Keep an eye out for repetitive code with lots of literal values. This may be a candidate for being made into data-driven code, and reaping unexpected benefits with little effort.