This editorial was originally published on Apr 8, 2009. It is being re-run as Steve is on vacation.
Often, the mistakes that are made in SQL code seem to be so perverse that one is left scratching ones’ head and wondering whether there is a basic incorrect assumption underlying all the errors.
I think one of the most insidious assumptions is that each SQL Statement takes the same length of time to execute. It is crazy, I know, but we are talking about unconscious, impulsive judgments here. When I’m taking to developers who are gridlocked with a complex routine, I always advise them to break the problem down into a number of simple testable steps; using temporary tables for intermediate results. They hate it. Often, the problems have happened because they’ve gone headlong into an attempt to get a complex result in one SQL Statement. ‘Why create so many statements when one will do? , they often say.
When I had less self-confidence as a SQL Programmer, I’d break problems down into simple steps, test each step for sanity, logic and performance, and then re-write them in as few statements as possible, using derived tables and subqueries. Nowadays I don’t always bother with the last step, because the contribution that this step makes to the performance of the entire process is usually negligible. It just looks better. I have my pride; well I used to. The advantage of a series of simple steps is that it is easier to test, to maintain, and to understand. The disadvantage is that many developers will whistle through their teeth and say ‘Cor, I bet that runs slow!’, and try to rewrite it in one statement.
We all approach our development work armed with poorly tested assumptions, and a lot of the excitement of the job is in testing them and finding them wrong. However, if you keep them for too long, you’ll soon find that they get in the way of the quality of your output.
So what other assumptions do we fall over? Bearing in mind General Sedgewick's fatal assumption, "Why, my man, I am ashamed of you, dodging that way, They couldn't hit an elephant at this distance.", here are a few assumptions that can cause a lot of subsequent problems
- 'My routine will scale in a linear fashion'
- 'If I test this in a single-user database, it is likely to work in a multi-user one.'
- 'It is always quicker and easier to do stuff in C#'
- 'I can always shave time by denormalising'
- 'Joins cause performance problems'
- 'get something up and running. We can always tidy it up later'
Come across any other fatal assumptions recently in the office? It would be great to hear them.