July 17, 2012 at 4:20 pm
The execution plans for those are substantially the same, with the exception that the "filter" step runs faster in my version (but the index scan runs faster in yours). I don't know what that means really, but I've always preferred that method over the CASE expression. When you have multiple search terms, it can get ugly with either method, and at that point I often turn to dynamic statements...
WHERE (@name IS NULL OR name = @name)
AND (@city IS NULL OR city = @city)
AND (@gender IS NULL OR gender = @gender)
That's about my limit for this pattern, three things, then I start building strings.
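In case anyone hasn't seen that done, here is a minimal sketch of the string-building approach using sp_executesql, written as the body of a procedure with the @name, @city, and @gender parameters from above (the table and column names are illustrative):
-- Hypothetical table dbo.People(name, city, gender) for illustration
DECLARE @sql nvarchar(max) = N'SELECT name, city, gender FROM dbo.People WHERE 1 = 1';
IF @name IS NOT NULL SET @sql += N' AND name = @name';
IF @city IS NOT NULL SET @sql += N' AND city = @city';
IF @gender IS NOT NULL SET @sql += N' AND gender = @gender';
-- sp_executesql keeps the values parameterized (injection-safe), and each
-- distinct predicate combination gets its own cached plan
EXEC sp_executesql @sql,
N'@name varchar(50), @city varchar(50), @gender char(1)',
@name = @name, @city = @city, @gender = @gender;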
Also I disagree... I have been programming for 30 years, so what I've seen more often does matter. I've seen both methods used quite a bit though, and I wasn't saying the second method was better, only that it was more common. I think that it would be impossible to prove the general case of one method being better from an efficiency standpoint, and the other important characteristics (readability, correctness, etc.) are a matter of personal taste.
My original reason for answering this was not to get into an argument over performance, merely to show the beginner programmer that there is an alternative, so they have to think instead of just copying the code.
July 17, 2012 at 4:57 pm
Jasmine D. Adamson (7/17/2012)
The execution plans for those are substantially the same, with the exception that the "filter" step runs faster in my version (but the index scan runs faster in yours). I don't know what that means really, but I've always preferred that method over the CASE expression. When you have multiple search terms, it can get ugly with either method, and at that point I often turn to dynamic statements...
WHERE (@name IS NULL OR name = @name)
AND (@city IS NULL OR city = @city)
AND (@gender IS NULL OR gender = @gender)
That's about my limit for this pattern, three things, then I start building strings.
Also I disagree... I have been programming for 30 years, so what I've seen more often does matter. I've seen both methods used quite a bit though, and I wasn't saying the second method was better, only that it was more common. I think that it would be impossible to prove the general case of one method being better from an efficiency standpoint, and the other important characteristics (readability, correctness, etc.) are a matter of personal taste.
My original reason for answering this was not to get into an argument over performance, merely to show the beginner programmer that there is an alternative, so they have to think instead of just copying the code.
No arguments, just a spirited discussion 🙂 Curious though... How do you know that the index scan runs faster in mine? I did not post any execution plan or statistics. What I found in the execution plan is that the estimated number of rows is significantly higher in the plan for the OR version than for the CASE version. I'm not sure exactly what that means, but I will look it up, or someone more experienced with execution plans can answer. My main goal was to make the CASE expression work as expected, on the assumption that the OP was more interested in how to properly use CASE in a WHERE clause than in other means.
To your point about what is more common... I don't know how long you have been programming in SQL versus other languages, but in my experience (not 30 years, but about 5) I have learned that there are more poor SQL programmers than good ones. That could be my minimal experience or just a difference in experience. I certainly don't take "more or less common" as a factor in what I should use, versus testing on my hardware and my data. Not that you do; I just wanted to share that with others who read this.
I think we agree that every situation is different, including data, hardware, and business rules. So it is always best to test in your environment with your data.
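For anyone who wants to reproduce that comparison themselves, here is a minimal harness against a hypothetical dbo.Employees table; SET STATISTICS IO and TIME produce the read/CPU figures of the kind quoted later in this thread:
-- Hypothetical table dbo.Employees(empname varchar(50), ...) for illustration
DECLARE @strempname varchar(50) = '';
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- OR form
SELECT * FROM dbo.Employees
WHERE @strempname = '' OR empname = @strempname;
-- CASE form
SELECT * FROM dbo.Employees
WHERE empname = CASE WHEN @strempname = '' THEN empname ELSE @strempname END;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;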
Jared
CE - Microsoft
July 17, 2012 at 5:31 pm
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/
And to toss fuel on the fire, several of the query patterns mentioned in this thread would cause all sorts of hell when the plans are reused with different parameter values (cached with '', called with a real value, or vice versa)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2012 at 12:42 am
SELECT *
FROM dbo.[table] -- 'table' is a reserved word, so it needs bracket-quoting
WHERE empname = CASE WHEN @strempname = '' THEN empname ELSE @strempname END
July 18, 2012 at 12:46 am
SELECT *
FROM dbo.tablename
WHERE empname = CASE WHEN @strempname = '' THEN empname ELSE @strempname END
July 18, 2012 at 9:33 am
SQLKnowItAll (7/17/2012)
No arguments, just a spirited discussion 🙂 Curious though... How do you know that the index scan runs faster in mine? I did not post any execution plan or statistics.
I ran the script you gave on my server. My database isn't large, but it has 50 tables or so.
To your point about what is more common... I don't know how long you have been programming in SQL versus other languages, but in my experience (not 30 years, but about 5) I have learned that there are more poor SQL programmers than there are good. That could be my minimal experience or just a difference in experience.
No that is a cold hard fact 🙂
Most of the people writing SQL have no formal training in it. They are typically coming from procedural or object-oriented languages, so they tend to write procedural or object-oriented SQL, which works, but doesn't work well - you get a lot of RBAR from those guys.
IMO, it is a management problem - the managers don't see the database as important software; they think of it like a disk drive, just a place the data goes to be stored. Management doesn't understand that the database is actually the core of your application, the foundation of it, and that its code should be written accordingly. Most places need to spend a whole lot more money on database work. I'm sitting at a large aerospace company right now that has been in business for over 100 years, and I'm their first DBA. And you wonder why space hammers cost $1000 🙂
I certainly don't take more or less common as a factor in what I should use versus testing on my hardware and my data. Not that you do, I just wanted to share that with others who read this. I think we agree that every situation is different including data, hardware, and business rules. So it is always best to test in your environment with your data.
Yeah definitely. I'm not even sure the OP will come back and read this, but yes they should try both methods and see which is better with their setup.
July 18, 2012 at 9:37 am
GilaMonster (7/17/2012)
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/
And to toss fuel on the fire, several of the query patterns mentioned in this thread would cause all sorts of hell when the plans are reused with different parameter values (cached with '', called with a real value, or vice versa)
The interesting thing in that article is that for some reason, they have written the query backwards. Possibly to make their point true.
It should be "Where @variable is null OR column = @variable" so we can take advantage of short-circuits. I think the point they are making would still be true, but it would be interesting to see if the query plans came out the same with the query written correctly. I am having trouble getting AdventureWorks up and running at this location, so I can't actually check. I could maybe put it on SQL Express.
July 18, 2012 at 9:43 am
Jasmine D. Adamson (7/18/2012)
GilaMonster (7/17/2012)
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/
And to toss fuel on the fire, several of the query patterns mentioned in this thread would cause all sorts of hell when the plans are reused with different parameter values (cached with '', called with a real value, or vice versa)
The interesting thing in that article is that for some reason, they have written the query backwards. Possibly to make their point true.
It should be "Where @variable is null OR column = @variable" so we can take advantage of short-circuits. I think the point they are making would still be true, but it would be interesting to see if the query plans came out the same with the query written correctly. I am having trouble getting AdventureWorks up and running at this location, so I can't actually check. I could maybe put it on SQL Express.
The poster of those links is also the author of those articles 🙂 Although Gail is human, I have not seen her make mistakes in her blog posts.
Jared
CE - Microsoft
July 18, 2012 at 10:00 am
Jasmine D. Adamson (7/18/2012)
It should be "Where @variable is null OR column = @variable" so we can take advantage of short-circuits.
SQL doesn't short-circuit evaluations. It only happens in some odd cases involving constants and variables alone, not columns.
I think the point they are making would still be true, but it would be interesting to see if the query plans came out the same with the query written correctly.
Yes, the point that I make in that blog post is valid regardless of which way around you write the where clause (there's no 'correct' or 'incorrect' way)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2012 at 10:21 am
WTH?! It doesn't short circuit?! How can that be possible? I've based my whole life on that principle! OMG, what will I do now? How can we prove this? 🙂
July 18, 2012 at 10:31 am
It's not that it doesn't; it's that you don't know when it will or won't, because it is up to the optimizer to choose the best execution plan. I couldn't find a lot on this, but here is a good link: http://rusanu.com/2009/09/13/on-sql-server-boolean-operator-short-circuit/ Gail may have a better explanation or link, as she is by far more knowledgeable on the internals than I am (my knowledge consists only of what I have read and retained from Kalen Delaney's book).
Jared
CE - Microsoft
July 18, 2012 at 10:47 am
I'm not sure how to take that comment, so...
No, SQL does not always short circuit, and it cannot be depended on like it can be in C#. The basic reason is twofold:
1) The optimiser is always free to reorder predicates as it sees fit, as long as the logic of the query remains the same. AND and OR are commutative (A AND B == B AND A, A OR B == B OR A), hence the optimiser can and does decide which to evaluate first and which second (usually based on statistics and estimations), and is in no way obligated to keep the order specified, or to make it at all clear which order it evaluated the expressions in. T-SQL is a declarative language, not a procedural one. (A short sketch of this trap follows below.)
2) Plans must be safe for reuse (the only exception here is if OPTION (RECOMPILE) is specified in versions above 2008 SP2). As such, any plan chosen by the optimiser must evaluate all predicates, regardless of how they evaluate the first time they are executed. Hence the optimiser cannot 'short circuit' the plan and omit a predicate, as the plan must produce correct results when reused, no matter what parameter values are passed.
The second reason is also behind why I said that many of the query forms discussed in this thread will cause havoc when they are reused with different parameter values.
When writing queries that 'handle' optional predicates, performance testing on a clear cache with only one variation on the parameter values is not enough. Testing how the query will perform when a plan compiled with one set of parameters is called with another is also crucial.
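To make point 1 concrete, here is the classic trap, sketched against a hypothetical dbo.Mixed table whose varchar column val holds a mix of numeric and non-numeric strings:
-- Hypothetical setup for illustration:
-- CREATE TABLE dbo.Mixed (val varchar(20));
-- INSERT dbo.Mixed VALUES ('10'), ('42'), ('abc');
SELECT val
FROM dbo.Mixed
WHERE ISNUMERIC(val) = 1 -- written first...
AND CAST(val AS int) > 5; -- ...but not guaranteed to be evaluated first
Because the optimiser may reorder the predicates, the CAST can be applied to 'abc' before ISNUMERIC has filtered it out, raising a conversion error. The written order gives no short-circuit guarantee.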
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2012 at 11:06 am
And to reinforce the point.
CREATE PROCEDURE SearchHistory
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
-- Column-first predicates
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = @Product OR @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID IS NULL)
AND (TransactionType = @TransactionType OR @TransactionType IS NULL)
AND (Quantity = @Qty OR @Qty IS NULL)
GO
CREATE PROCEDURE SearchHistory_optimised
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
-- Variable-first predicates (the supposedly 'short-circuiting' order)
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (@Product IS NULL OR ProductID = @Product)
AND (@OrderID IS NULL OR ReferenceOrderID = @OrderID)
AND (@TransactionType IS NULL OR TransactionType = @TransactionType)
AND (@Qty IS NULL OR Quantity = @Qty)
GO
EXEC SearchHistory @product = 978, @TransactionType = 'W'
EXEC SearchHistory_optimised @product = 978, @TransactionType = 'W'
The exec plans have identical costs and identical execution characteristics.
1st
Table 'TransactionHistory'. Scan count 1, logical reads 1823, physical reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 45 ms.
2nd
Table 'TransactionHistory'. Scan count 1, logical reads 1823.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 46 ms.
Now reuse the plans with a different parameter value
EXEC SearchHistory @Qty = 100
EXEC SearchHistory_optimised @Qty = 100
The plans are absolutely identical to those shown above, which is completely expected since it's a reused plan.
Execution characteristics
1st
Table 'TransactionHistory'. Scan count 1, logical reads 1390863, physical reads 0.
SQL Server Execution Times:
CPU time = 780 ms, elapsed time = 794 ms.
2nd
Table 'TransactionHistory'. Scan count 1, logical reads 1390863, physical reads 0
SQL Server Execution Times:
CPU time = 718 ms, elapsed time = 721 ms.
The CPU time is very slightly lower in the second case, so there appears to be some short-circuiting taking place, but that's less than 10% difference in CPU between the two and it's still a horridly inefficient query (700 ms and 1.4 million reads to return 1160 rows)
p.s. My copy of AW is padded out with extra data in many tables. You won't see the same degree of difference running on smaller tables. There will still be a difference (inefficient queries are still inefficient on smaller row counts), it just won't be as marked.
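For completeness, here is the exception mentioned earlier in the thread: on builds that support it (above 2008 SP2, per the earlier post), OPTION (RECOMPILE) lets the optimiser embed the actual parameter values at each execution, so predicates on NULL parameters can be pruned. A sketch based on the procedures above (the procedure name is new here):
CREATE PROCEDURE SearchHistory_recompile
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (@Product IS NULL OR ProductID = @Product)
AND (@OrderID IS NULL OR ReferenceOrderID = @OrderID)
AND (@TransactionType IS NULL OR TransactionType = @TransactionType)
AND (@Qty IS NULL OR Quantity = @Qty)
OPTION (RECOMPILE); -- a fresh plan per call: predicates on NULL parameters are
                    -- removed, at the cost of a compile on every execution
GO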
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability