T-SQL Requires the Right Approach Some of the most problematic T-SQL code I’ve seen over the years originated because the developer approached T-SQL with techniques more suitable for developing applications. I often saw these issues in my consulting days and always took the opportunity to teach the customer instead of only fixing the problems. One reason is that SQL is a declarative language; you say what you want, and it’s up to the database engine to figure out how to do it. When writing code in an imperative or procedural language such as C#, you give instructions to the computer, and it follows those instructions, while databases benefit from a “set-based” approach to writing code. I ran into a good example of this early in my career. There was a process to update employee information in a database with a table from the HR system each night to keep changes such as title, department, name, etc., up to date. It was a long time ago, and hardware was much slower then, but it took about 10 minutes to keep 2000 employee records up to date because it updated the target table with a nested loop approach. Every single row was compared column by column. If there were changes to two columns (i.e., department and title), the row would be updated twice -- once for each column. From a programming perspective, updating in a piecemeal fashion made sense – break the problem down into small pieces and only update what needs to be updated. Intuitively it seems like the best way to approach the problem, however, it’s not the best way to work with SQL Server or any database platform. The solution was to update all the columns (except for the primary key) in those 2000 rows in one statement, which took a few seconds. Today, a MERGE statement might be the solution but wasn’t available in SQL Server 6.5. Another problem I’ve seen frequently is the overuse of user-defined functions (UDFs), especially nesting scalar UDFs. When writing software, it makes sense to use functions for code reuse and to hide complexity, but it’s often a bad idea with SQL Server. For example, a customer had a UDF with just one parameter. The parameter’s value was determined early in the script, something like the current date. The UDF called another UDF, which called another UDF, and there were a couple of queries in the UDFs as well. The top UDF was used in the SELECT list of a query returning millions of rows, so it was called millions of times. Since the parameter value was not dependent on the query, just running the UDF once and saving the returned value in a variable used in the query instead of the function call made a huge difference. After moving the UDF out of the SELECT list, we “unwound” the UDF into a few statements at the top of the script. The resulting script was not as pretty, but it performed so much better. Many other anti-patterns cause poor query performance, but these two examples happened because T-SQL was treated like application code. Kathi Kellenberger Join the debate, and respond to the editorial on the forums |