In case you missed the first post on this topic, you can read it here: SQL Server: SARGability. This post continues the series and today we’re going to discuss the dangers of functions.
Recently, I’ve seen several customers that have been placing functions around columns in the WHERE clause. This can significantly reduce performance of the query depending on what the function is.
Functions and SARGability
There are deterministic and nondeterministic functions. I’m not going to go into the detail of the difference but it’s important to understand that functions in general are not a good thing for performance if they are on a column. Let’s take a look at some examples:
Consider a program that passes the ProductNumber without a hyphen (BL2036) and the database stores a hyphen (BL-2036).
You’ve got three options:
1) Change the application to pass the data as it is in the database (preferred) Feel free to also give a stern grump cat look at the developer.
2) Add a hyphen to the product number.
3) Remove the hyphen from the product number. (not preferred)
Here are options 2 (Query 2) and 3 (Query 1):
SELECT Name, ProductNumber
FROM Production.Product
WHERE REPLACE(ProductNumber, ‘-‘, ”) = ‘BL2036’SELECT Name, ProductNumber
FROM Production.Product
WHERE ProductNumber = STUFF(‘BL2036’, 3, 0, ‘-‘)
Result:
Note that Query 1 uses a function around the column and this causes a scan which increases IO and CPU utilization. This is because every value in the column for the whole table must have the hyphen removed to see if it is a match.
In Query 2, we see a seek. This is because the value is modified instead of the column.
Here’s another example and one I’ve seen a few times in the past few months. The data in the table is stored as “BL-2036”.
SELECT Name, ProductNumber
FROM Production.Product
WHERE LOWER(ProductNumber) = ‘bl-2036’SELECT Name, ProductNumber
FROM Production.Product
WHERE ProductNumber = ‘bl-2036’
It’s a misconception that nothing will match unless it also matches the case.
Both of these queries actually return the same result (1 row).
Results:
Note the Scan and the Seek? Same goes here as above.
In some scenarios you may need to match on case. You’ll want to verify your database collation, which may be CS (Case Sensitive) or CI (Case Insensitive).
If your database is CS then you should consider using an UPPER or LOWER on the parameter value instead of the column.
Finally, we look at another scenario I’ve seen repeat over the years.
SELECT Name, ProductNumber
FROM Production.Product
WHERE RTRIM(ProductNumber) = ‘bl-2036’SELECT Name, ProductNumber
FROM Production.Product
WHERE ProductNumber = ‘bl-2036’
Results:
These also return the same row. It’s worth noting that the data doesn’t have a white space at the end so this test may be a bit of a bust. Let’s look at this one another way:
IF(‘ABC ‘ = ‘ABC‘)
SELECT 1
ELSESELECT 2
The result is 1. The values are equal and this should be true for any collation.
Summary
We looked at how functions can hurt performance in three different examples. Each of the examples are shared one common problem: the function modifies the column.
In short, don’t do that. Think of a better way that can lead you to a seek because table/index scans are very expensive and ultimately hurt performance.