In this article, I discuss some possibilities for circumventing the
performance hit on dynamic queries. This problem crops up endlessly when
trying to build stored procedures that take multiple parameters in the where
clause. There are actually two parts to this problem: first, how do we
know which parameters to include in the where clause and second, how do we get
sql to cache a query plan for the stored procedure. Lets explore the first
problem with a simple example: Lets say I want to build an application
that will allow the user to search for an account based on some criteria.
Lets say our criteria will be Account Name, Contact first name, Contact last
name, city, state, and phone number. To do this we need to build a stored
procedure that takes six input parameters: @Aname, @Cfirst, @Clast, @city,
@state, @phone. The select statement we want will look something like
this: SELECT * from account A inner Join Contact C on a.primarykey=c.accountkey
inner join Address D on A.primarykey=D.foreignkey WHERE ...... The problem
is how do we build the WHERE clause since it is dynamic. In other words we
do not know which of the input parameters were actually passed values that need
to be in the where clause. Fortunately the rules are pretty simple, if a
parameter has a value then it must be contained in the where clause and if a
parameter does not have a value then it can not appear in the where clause.
So, how do we solve this problem? This article will present three different solutions:
Solution 1: Use the CASE or IF statement to test each
parameter and run an appropriate select statement
Example:
IF @Aname<>'' and @Cfirst='' and @Clast='' and @city='' and @state='' and
@phone=''
SELECT * from account A inner Join Contact C on a.primarykey=c.accountkey
inner join Address D on A.primarykey=D.foreignkey WHERE A.name=@Aname
ELSE
IF @Aname<>'' and @Cfirst<>'' and @Clast='' and @city='' and @state='' and
@phone=''
SELECT * from account A inner Join Contact C on a.primarykey=c.accountkey
inner join Address D on A.primarykey=D.foreignkey WHERE A.name=@Aname and
C.lastName=@Clast
ELSE........................
Advantages: Probably the most straight forward solution
Best SQL performance (low reads, fast sql execution)
Disadvantages: Complex code as the number of cases rises
exponentially with the number of parameters
CPU time increases based on the number of CASE or IF comparisons
Solution 2: Write your sql so that you can always include
every parameter in the where clause. How is this possible???
Example:
SELECT * from account A inner Join Contact C on a.primarykey=c.accountkey
inner join Address D on A.primarykey=D.foreignkey WHERE (A.name like @Aname or
A.name is NULL) and (C.firstName like @Cfirst or C.firstName is NULL) and (C.lastName
like @Clast or C.lastName is NULL) and (D.city like @city or D.city is NULL) and
(D.state like @state or D.state is NULL) and (D.phone like @phone or D.phone is
NULL)
Advantages: Easy code to write
Fast CPU time (code executes quickly)
Disadvantages: Need to decide what you are going to do with
Nulls
Slower SQL performance
Lets take a look at how this works. First
we have to make sure that our input parameters to the stored procedure are
correct. For this scheme to work every parameter will get represented in a
LIKE clause, thus the parameter must contain a % at the end. If parameters
are blank then they get converted to %. So what happens when we do
something like WHERE d.city LIKE '%' ? Well if you turn on show execution
plan in query analyzer you can see that it is represented as a SEEK(d.city Is
Not Null). In other words, all records will be returned unless they are
NULL. This is why we need to decide what to do with the Nulls. When
I created this solution, I decided that if the value was NULL then it is
conceivable that it could match, thus I added the criteria (or d.City is NULL)
to my where. I end up with the criteria of: WHERE (dcity LIKE '%' or
d.city Is NULL). This will return "all records where city is Not Null"
or "all records where city is NULL". In other words, it will return
all records (it is as if the criteria did not exist). Its magic, with this
technique we are able to always add the parameter to the select statement
without it actually limiting our return results (it is as if the criteria did
not exist).
Recap of the city example:
WHERE (d.city LIKE '%' or d.city Is NULL)
Returns all records (as if the criteria does not exits).
WHERE (d.city LIKE 'Orlando%' or d.city Is
NULL) Returns all records where city starts with Orlando or
city is Null
As I stated earlier, my implementation of this
solution returns records where the city is NULL. My reasoning for this was
that a NULL value by definition means that I do not have the information
available to me. If I do not know for sure that the city is Not Orlando
then it is very well possible that it could be Orlando. What would I do if
I wanted to get rid of the records where city is NULL? Well, your first
guess may be to remove the (or d.city Is NULL) clause, which would work fine in
cases where the city parameter has a value, but then what would happen if the
city parameter was blank? It would get converted to '%' and the clause
would be (d.city LIKE '%') which would limit my results to all records where the
city is not null. Take for instance the case where I want to return all
records where state is CA. My where clause would end up looking like this:
WHERE (A.name like '%') and (C.firstName like
'%') and (C.lastName
like '%') and (D.city like '%') and
(D.state like 'CA%') and (D.phone like '%')
My results would not contain any records where
a.name, c.firstName, c.lastname, d.city or d.phone were NULL even if the state
was CA.
The challenge is how could you
modify my solution so that all records that had a value of 'CA' for state will
be returned regardless of whether or not the values in the other fields are
Null? The solution must be generic so that it can work with any parameter
and any value passed in. Remember that the solution must also work for
cases where the parameter is blank '%'. When blank we do not want to
restrict any records returned, it should be as if the criteria was not there!
The only other problem I can find with this
solution is that you can not enforce operators (=, <>, > , <, IN) other than
Like as it stands. This may or may not be an issue depending on the
application. Another challenge would be to come up with a solution to
this!
Solution 3: Build a Cache table for every combination of
parameters that occur
Select @storedProc=storedProc from ProcCache where pAname=@Aname and pCfirst=@Cfirst
and pClast=@Clast and pCity=@city and pState=@state and pPhone=@phone
IF @@rowcount=1
exec (@storedProc)
ELSE
exec sp_CreateNewProc
Advantages: Don't waste code trying to handle situations
that never occur
Good performance (fast sql execution)
DisAdvantages: Need to do a lookup in the cache table every
time
Need to build a stored procedure on the fly the first time a combination occurs
This is an interesting solution to the problem
which I have not explored yet. Basically the way it would work would be
like this; The stored procedure is called with a group of parameters
passed in. The first thing I would do is a lookup in my ProcCache Table.
The ProcCache table would have a column for each parameter. A value of 1
in the column would indicate that the parameter is included in the where clause
and a value of 0 in the column would indicate that the parameter is not included
in the where clause. I would build a SELECT statement based on the
parameters passed into the stored procedure. For example assume I have a
stored procedure that accepted two input parameters @account, @name: My
SELECT would be created like this:
SELECT storedProcToRun fromProcCache where Account=CASE @account WHEN '' THEN
0 ELSE 1 END AND Name= case @name WHEN '' THEN 0 ELSE 1 END
The select statement would return the name of the stored procedure that was
created to handle this particular query. If there was no entry for the
query then I would insert a row into ProcCache and create a new stored procedure
on the fly. The next time the same query is run it will be in the
ProcCache. This solution means that I only have to create stored
procedures for the queries that are run (I do not have to figure out all
possible combinations). The stored procedure will be created automatically
since all of the information necessary to create it is known. As a new
query is attempted my collection of stored procedures grows automatically
without user intervention.
My thoughts are that this solution could offer
a good balance of performance and flexibility. I will always take a hit
doing the cache lookup but it should be very quick, and I will only take a hit
creating the stored procedure the first time. One thing I am considering
with this solution is to pass in both the parameter and the comparison operator.
That way I could do any kind of comparison (=, <>, >, <, Like, IN) I wanted and
use the one that makes the most sense in some situations. I
expect to play around with this idea shortly and write a follow up article on my
experiences.
SUMMARY
In this article, I have presented three
solutions to over come the problem of dynamic queries in stored procedures.
To be honest with you, I would not get too hung up on trying to avoid using
dynamic queries. Yes, sql server can not cache the query plan and you will
suffer some performance loss, but this is not any more performance loss than
passing the sql directly to sql server from an application. My opinion is
that stored procedures should not just be thought of as a method of increasing
sql performance, but also as a way to package your solutions and maybe even most
importantly a way to provide maximum security. By packaging your
solutions, I mean that I think of stored procedures as analogous to sub routines
in programming. My current strategy when programming is to do everything
through a stored procedure. Calling the stored procedures are just like
calling other sub routines in my program except that the stored procedures deal
directly with the database servers. You may have many situations where you
need to do something like return a list of valid users. Why not make a
stored procedure to do this and call it over and over. Making the stored
procedure generic enough to be used in any situation may require dynamic sql,
but I would not let that stop you. One of the greatest benefits of always
using a stored procedure for database access is that you can remove the users
permissions to the tables completely and place all of the permissions on the
stored procedures, thus limiting the users ability to do only those functions
that have been coded into stored procedures. Keep in mind when you are
creating dynamic queries in a stored procedure that there may be a higher
performance way to achieve the results especially if you are dealing with a
small number of input parameters. Those of you who are techno geeks, try
to solve the problem exposed in solution 2 and maybe some of you even have some
better solutions than I have presented here. Lets hear'em!