(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan – you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)
In today’s blog posting I want to talk about a very specific performance problem related to indexing in SQL Server.
The Problem
Imagine the following simple query, which you have already seen hundreds of times in your SQL Server life:
-- Results in an Index Scan SELECT * FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate) = 2005 AND MONTH(OrderDate) = 7 GO
With that simple query, we request sales information for a specific month in a specific year. Not very complicated. Unfortunately that query doesn’t perform very well – even with a Non-Clustered Index on the column OrderDate. When you look at the execution plan, you can see that the Query Optimizer has chosen the Non-Clustered Index that is defined on the column OrderDate, but unfortunately SQL Server performs a complete Scan of the index, instead of performing an efficient Seek operation.
This isn’t really a limitation of SQL Server, this is the way how relational databases are working and thinking . As soon as you are applying an expression (function calls, calculations) on an indexed column (a so-called Search Argument), the database engine HAS TO SCAN that index, instead of performing a seek operation.
The Solution
To get a scalable seek operation in the execution plan, you have to rewrite your query in a way to avoid the call of the function DATEPART:
-- Results in an Index Seek SELECT * FROM Sales.SalesOrderHeader WHERE OrderDate >= '20050701' AND OrderDate < '20050801' GO
As you can see from the rewritten query, the query returns the same result, but we have just eliminated the function call of DATEPART. When you look at the execution plan, you can see that SQL Server performs a seek operation – in that specific case it is a so-called Partial Range Scan: SQL Server seeks to the first value, and scan until he hits the last value of the requested range. If you have to call functions in the context of indexed columns, you should always make sure that these function calls are performed on the right hand side of your column in the query. Let’s have a look at a concrete example. The following query casts the indexed column CreditCardID to the data type CHAR(4):
-- Results in an Index Scan SELECT * FROM Sales.SalesOrderHeader WHERE CAST(CreditCardID AS CHAR(4)) = '1347' GO
When you have a more detailed look on the execution, you can see that SQL Server scans again the whole Non-Clustered Index. Not really scalable if your table gets larger and larger. If you are performing that conversation on the right hand side of your indexed column in the query, you can again eliminate the function call on the indexed column, and SQL Server is able to perform a seek operation:
-- Results in an Index Seek SELECT * FROM Sales.SalesOrderHeader WHERE CreditCardID = CAST('1347' AS INT) GO
Another nice example where you can run into the same problems is the use of the CASE expression in SQL Server. Let’s have a look at the following query, where the column PersonType is indexed through a Non-Clustered Index:
-- Results in an Index Scan SELECT CASE PersonType WHEN 'IN' THEN 1 WHEN 'EM' THEN 2 WHEN 'SP' THEN 3 WHEN 'SC' THEN 4 WHEN 'VC' THEN 5 WHEN 'GC' THEN 6 END AS ConvertedPersonType FROM Person.Person GO
You are calling here again a function on an indexed column, in our case the CASE expression. You can’t directly see that function call in the T-SQL query, but internally it’s nothing more than a function call. You are getting again a Scan of the Non-Clustered Index instead of a Seek operation. How can you avoid that problem? You can rewrite the query to get rid of the CASE expression. One example is a join against a Common Table Expression, which stores the needed lookup values. Let’s have a look on the rewritten query:
-- Results in an Index Seek WITH LookupCTE AS ( SELECT * FROM ( VALUES (N'IN', 1), (N'EM', 2), (N'SP', 3), (N'SC', 4), (N'VC', 5), (N'GC', 6) ) tbl (PersonType, Value) ) SELECT cte.Value FROM Person.Person p INNER JOIN LookupCTE cte ON cte.PersonType = p.PersonType GO
With that approach, SQL Server is able to perform a seek operation on the indexed column PersonType.
Summary
As you have seen in this blog posting, it is very important that you are NOT calling any function directly or indirectly on your indexed columns. Otherwise SQL Server has to scan your index, instead of performing an efficient seek operation. And scans will never ever scale, when your table gets more and more rows.
Please feel free to leave a comment, if you want to share other good examples where you have encountered this specific behavior.
Thanks for reading!
-Klaus