The following queries look extremely innocent. However, they are all application killers once the target table grows to more than a couple thousand rows:
Select Count(*) From Person_Data Where Upper(Name_Last) = ‘SMITH’
Select Avg(Collar_Size) From Person_Data Where dbo.MyFunc(Person_Data_ID) > 500
For each of these statements, a table scan will be used to try and fetch data. A table scan is simply an operation where the database looks at each and every record in the table to see which ones meet its criteria in the Where clause. Since any unneeded I/O is wicked bad, forcing a table scan is wicked bad multiplied by the number of times it does it. A table scan, therefore, is equivalent to evil incarnate.
We would much rather the database use an index and only fetch the 20 records it needs rather than reading 200,000 records in a table scan fto locate the 20 records it needs. However, the use of the functions (Upper(), MyFunc(), MyOtherFunc(), etc) keep it from doing that.
Why do functions in the WHERE clause keep the database from using indexes? Simply put, it is because the database does not know the result of the function. To abuse my standard analogy, this would be like being the super-secret agent in the warehouse using a magic decoder digital watch. The agent would have to open each individual file, key in some information found in the file into the watch and see if the resulting answer meets the criteria.
Each time MyFunc() (a.k.a. your slick little digital watch) is used it could come back with a different result since the function could potentially be based on the input value, the time of day, or data that is being modified elsewhere in the database. The only way the database can know for certain the answer to the function call is to make it and see what the answer is. Since it doesn’t know ahead of time what the answer is, it cannot use any existing indexes (er, indices) to go to the corresponding entry.
This isn’t really a problem if the function call is in the Select, only when it’s in the WHERE or other key clauses (see below). The Select is safe since the function call is only made for records where the criteria is already met, hence it’s only called for the 20 matches and NOT for the 2.78 billion records that don’t match.
So, how do we get around using functions in a WHERE clause?
1) In the case of UPPER(), I suggest you don’t even bother. In SQL Server, by default, string handling is case-insensitive anyway. Thus, WHERE Name_Last = ‘Smith’ will match regardless of the casing in the database.
2) There are times where pre-calculating the result of the function call and placing it is a variable and the using the variable in the WHERE clause will save you time. This technique comes into play when you are using a function that SQL Server views as NONDETERMINISTIC, however, we being rather smarter know that it can be considered DETERMINISTIC (see below for a little discussion on Deterministic vs. Non-deterministic) A common example of this is a function call that accepts a Product Code and returns a Product ID. The same ID would always come back for a given code, but since there is a table call involved, SQL Server has to assume the function to be Non-deterministic. Pre-fetching this result into a variable solves the problem entirely.
3) If it’s a common request, or part of a “must have fast results” search screen, etc., then consider trading off some disk space to increase speed. You can store the raw data in the table and create a second column that contains the results of the function call. Simply index the column with the results of the function call. For an overly simplified example, if you have Price and Quantity in the table, and your function calculates Total Price, you could simply store the Total Proc in the record and index it. There are other techniques here, such as computed columns, indexed views, and so on, but you’ll definitely want to get a seasoned DBA in the loop before tackling such methods.
4) Tell the user to wait! If you just can’t get around using a function, then so be it. However, be courteous enough to let the user know that the search could take a while. This gives them the opportunity to flirt with co-workers, take a nap, or other productive uses of company time.
A couple of disclaimers:
1) There are some circumstances where using functions is OK. These have to do with whether or not a function is considered DETERMINISTIC. In other words, if it’s a guaranteed that a function will return the same answer given the same input, then the function is DETERMINITIC. In such a case, SQL Server can anticipate what the answer is without actually making the call, thus indexes can be used. For further discussion on DETERMINISTIC functions, see http://msdn.microsoft.com/en-us/library/aa214775(SQL.80).aspx.
2) So why have functions if they are so bad? Functions in a Select clause can be very useful! There is nothing evil about this! 3) It's not just the WHERE clause you have to watch out for. FROM (think Join clauses), GROUP BY, ORDER BY, and HAVING clauses are equally evil places to toss in a function. Buyer beware!
Want more? Please see my other articles in this Performance Tuning Series.