I got a massive satisfaction last week when I was able to bring down the execution time of a sql stored procedure from > 3 hours to mere 2 minutes. It all boiled down to a missing joining key that led to a cartesian product within the tables.
Here is how the basic structure of the query (there were plenty of joining columns, however for our convenience lets consider 2 columns only) looked like
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
left join table4 t4 on t1.k1 = t4.k1
And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I met the road block. For example:-
Step1 Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
was returning rows
Step 2
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
was returning rows
Step 3
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
left join table4 t4 on t1.k1 = t4.k1
ran forever
As you can clearly notice, this query was missing a joining condition with column k2. On including the joining condition for column k2, query executed 1000 times quickly.
In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a cartesian product.
Happy coding!