December 29, 2010 at 8:26 pm
Comments posted to this topic are about the item Understanding T-SQL Expression Short-Circuiting
I would like to acknowledge and thank Gus Gwynne, Jeff Moden and Paul White for their review and the constructive feedback that they gave for this article.
For some reason my thanks did not make it to the article page. My humble apologies to all of you.
-- Gianluca Sartori
December 30, 2010 at 2:43 am
This is a great bit of research, with some timely warnings from Gianluca for anyone who makes assumptions about TSQL based on experience with a procedural language.
Best wishes,
Phil Factor
December 30, 2010 at 7:58 am
Any boolean expression is capable of being short-circuited, in the right circumstances.
So under what circumstances can you short-circuit an XOR (i.e., if either A or B but not both, then C)?
December 30, 2010 at 8:23 am
One easy way to make sure short circuiting works the way you want it is using case statements:
select
    *
from
    Person
where
    1 = 1
    and CreateDateTime > getdate() - 30
    and case
            when Age > 90 then 1
            when Age < 5 then 0
            when Gender = 'Male' then 1
            when LastName like 'SAM%' then 1
            else 0
        end = 1
This gets records for all people over the age of 90, males of age 5 or more and anyone with a last name that starts with the letters SAM. Notice that the integer checks are done first, as they are the cheapest to evaluate, and the expensive LIKE expression is last. The documentation for the CASE expression explicitly says:
Evaluates, in the order specified, Boolean_expression for each WHEN clause.
So this works as an explicit short circuit, if you like.
December 30, 2010 at 8:33 am
sknox (12/30/2010)
Any boolean expression is capable of being short-circuited, in the right circumstances.
So under what circumstances can you short-circuit an XOR (i.e., if either A or B but not both, then C)?
T-SQL lacks a XOR logical operator, but it can be implemented from its definition:
A XOR B = (A AND NOT B) OR (NOT A AND B)
Sorry for the stupid example, I can't think of a better one right now: to find all users with NULL first_name (expression A) or NULL middle_name (expression B) but not both you could write:
-- This is how you would do it if T-SQL had a XOR operator.
-- (user is a reserved word in T-SQL, so the table name is bracketed.)
SELECT *
FROM [user]
WHERE (first_name IS NULL) XOR (middle_name IS NULL)
-- This is how you have to code it with AND, OR and NOT operators
SELECT *
FROM [user]
WHERE (first_name IS NULL AND middle_name IS NOT NULL)
   OR (first_name IS NOT NULL AND middle_name IS NULL)
Any boolean operator can be rewritten using AND, OR and NOT.
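For anyone who wants to try the rewrite, here is a minimal sketch. The table variable @users and its four sample rows are made up for illustration; only the shape of the WHERE clause comes from the example above. It assumes SQL Server 2008 or later for the multi-row VALUES list, and it has to run as a single batch.

```sql
-- Hypothetical sample data: one row for each NULL / NOT NULL combination.
DECLARE @users TABLE (
    user_id     int         PRIMARY KEY,
    first_name  varchar(50) NULL,
    middle_name varchar(50) NULL
);

INSERT INTO @users (user_id, first_name, middle_name)
VALUES (1, 'Ann', 'Lee'),   -- neither name is NULL
       (2, NULL,  'Lee'),   -- only first_name is NULL
       (3, 'Ann', NULL),    -- only middle_name is NULL
       (4, NULL,  NULL);    -- both names are NULL

-- XOR spelled out with AND, OR and NOT.
SELECT user_id, first_name, middle_name
FROM @users
WHERE (first_name IS NULL AND middle_name IS NOT NULL)
   OR (first_name IS NOT NULL AND middle_name IS NULL)
ORDER BY user_id;
```

With these rows the query should return user_id 2 and 3, the two rows where exactly one of the names is NULL.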
-- Gianluca Sartori
December 30, 2010 at 8:33 am
Excellent article. Thanks for sharing!
December 30, 2010 at 8:55 am
Phil Factor (12/30/2010)
This is a great bit of research, with some timely warnings from Gianluca for anyone who makes assumptions about TSQL based on experience with a procedural language.
Thanks, Phil.
-- Gianluca Sartori
December 30, 2010 at 9:01 am
CELKO (12/30/2010)
Another short-circuit (or McCarthy) evaluation problem from Algol was functions with side effects: IF a = b OR Boolean_function_with_side_effect(x) THEN ..
If the function is skipped, then there was no side effect; if it was executed, then a or b might be changed. This is why functional programming disallows side effects.
Unfortunately, many programming languages don't disallow side-effects inside functions. It's up to the programmer to produce reliable code and avoid "dirty tricks".
As a side note, a CLR function can update data. 😉
-- Gianluca Sartori
December 30, 2010 at 9:17 am
Daniel Ruehle (12/30/2010)
One easy way to make sure short circuiting works the way you want it is using case statements:
select
    *
from
    Person
where
    1 = 1
    and CreateDateTime > getdate() - 30
    and case
            when Age > 90 then 1
            when Age < 5 then 0
            when Gender = 'Male' then 1
            when LastName like 'SAM%' then 1
            else 0
        end = 1
This gets records for all people over the age of 90, males of age 5 or more and anyone with a last name that starts with the letters SAM. Notice that the integer checks are done first, as they are the cheapest to evaluate, and the expensive LIKE expression is last. The documentation for the CASE expression explicitly says:
Evaluates, in the order specified, Boolean_expression for each WHEN clause.
So this works as an explicit short circuit, if you like.
You're right, Daniel. CASE is guaranteed to evaluate expressions in the exact order they appear.
What is questionable is the time you save by pushing "expensive tests" down. Unless you're working with billion-row tables, you wouldn't even notice the difference. It's the query plan that decides how fast the query will run, not the number of expressions to evaluate.
-- Gianluca Sartori
December 30, 2010 at 10:12 am
Excellent article. For more fun, check other DBMSes. I checked on Oracle, and "select 'A' from dual where 1=0 or 1/0 = 1;" gives a division-by-zero error (although on Oracle it may need to be in a procedure to try it with the IF statement). Anyone have DB2 or Teradata handy?
December 30, 2010 at 10:26 am
Gianluca Sartori (12/30/2010)
sknox (12/30/2010)
Any boolean expression is capable of being short-circuited, in the right circumstances.
So under what circumstances can you short-circuit an XOR (i.e., if either A or B but not both, then C)?
T-SQL lacks a XOR logical operator, but it can be implemented from its definition:
A XOR B = (A AND NOT B) OR (NOT A AND B)
Sorry for the stupid example, I can't think of a better one right now: to find all users with NULL first_name (expression A) or NULL middle_name (expression B) but not both you could write:
-- This is how you would do it if T-SQL had a XOR operator.
-- (user is a reserved word in T-SQL, so the table name is bracketed.)
SELECT *
FROM [user]
WHERE (first_name IS NULL) XOR (middle_name IS NULL)
-- This is how you have to code it with AND, OR and NOT operators
SELECT *
FROM [user]
WHERE (first_name IS NULL AND middle_name IS NOT NULL)
   OR (first_name IS NOT NULL AND middle_name IS NULL)
Any boolean operator can be rewritten using AND, OR and NOT.
I know how to write an XOR using AND/OR/NOT. But while you can write it, you can't short-circuit it:
(first_name IS NULL AND middle_name IS NOT NULL) OR (first_name IS NOT NULL AND middle_name IS NULL)
In that code, both first_name and middle_name have to be evaluated. First we must evaluate first_name. If it's not NULL, then we can, yes, ignore middle_name here and short-circuit the first AND. But then the first part of the OR is false, so we must evaluate the second part. Since first_name is not NULL, we know we must evaluate the second part of the second AND, which evaluates middle_name. The same thing happens, mirrored, when first_name is NULL: the first AND then needs middle_name. So you have to evaluate both sides of the XOR.
You can reorder the AND and OR operators, but since the two sides are mutually exclusive, you will always have to evaluate both of the original expressions. So not all boolean expressions can be short-circuited.
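A small truth table may make the difference visible. This is only a sketch: the derived table v and its columns a and b are invented for the example, with 1 and 0 standing in for TRUE and FALSE. It assumes SQL Server 2008 or later for the VALUES table constructor.

```sql
-- Enumerate every combination of two boolean inputs and compare AND, OR and XOR.
SELECT v.a,
       v.b,
       CASE WHEN v.a = 1 AND v.b = 1 THEN 1 ELSE 0 END AS a_and_b,
       CASE WHEN v.a = 1 OR  v.b = 1 THEN 1 ELSE 0 END AS a_or_b,
       CASE WHEN (v.a = 1 AND v.b = 0)
              OR (v.a = 0 AND v.b = 1) THEN 1 ELSE 0 END AS a_xor_b
FROM (VALUES (0, 0), (0, 1), (1, 0), (1, 1)) AS v(a, b)
ORDER BY v.a, v.b;
```

In the two rows where a = 0, a_and_b is 0 whatever b is, and in the two rows where a = 1, a_or_b is 1 whatever b is. Those are the cases a short circuit can exploit. The a_xor_b column changes with b for both values of a, so knowing a alone never settles the result.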
December 30, 2010 at 10:31 am
Gianluca, thanks for sharing an EXCELLENT article!!!
December 30, 2010 at 11:04 am
Gianluca Sartori (12/30/2010)
Daniel Ruehle (12/30/2010)
One easy way to make sure short circuiting works the way you want it is using case statements:
select
    *
from
    Person
where
    1 = 1
    and CreateDateTime > getdate() - 30
    and case
            when Age > 90 then 1
            when Age < 5 then 0
            when Gender = 'Male' then 1
            when LastName like 'SAM%' then 1
            else 0
        end = 1
This gets records for all people over the age of 90, males of age 5 or more and anyone with a last name that starts with the letters SAM. Notice that the integer checks are done first, as they are the cheapest to evaluate, and the expensive LIKE expression is last. The documentation for the CASE expression explicitly says:
Evaluates, in the order specified, Boolean_expression for each WHEN clause.
So this works as an explicit short circuit, if you like.
You're right, Daniel. CASE is guaranteed to evaluate expressions in the exact order they appear.
What is questionable is the time you save by pushing "expensive tests" down. Unless you're working with billion-row tables, you wouldn't even notice the difference. It's the query plan that decides how fast the query will run, not the number of expressions to evaluate.
Agreed, but if the query plan says it's going to scan the table, then it does come down to how long it takes to process each row. Since you aren't guaranteed the order in which SQL Server will evaluate the conditions when just using boolean logic, it can choose to do them in an inefficient manner, which I believe was the gist of the article. In scenarios where it might matter, this gives you absolute control over the order.
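One place where the order matters for correctness rather than speed is guarding an expression that can fail. The sketch below is illustrative only: the table variable @ratios and its rows are made up. It relies on the documented behaviour that WHEN conditions are evaluated in order for scalar expressions; the documentation warns that this should not be relied on for aggregate expressions.

```sql
-- Hypothetical sample data, including one row that would divide by zero.
DECLARE @ratios TABLE (
    id          int PRIMARY KEY,
    numerator   int NOT NULL,
    denominator int NOT NULL
);

INSERT INTO @ratios (id, numerator, denominator)
VALUES (1, 10, 2),
       (2,  7, 0),   -- the division must never be attempted for this row
       (3,  9, 3);

-- The zero check comes first, so the division is only reached
-- for rows where the denominator is not zero.
SELECT id, numerator, denominator
FROM @ratios
WHERE CASE
          WHEN denominator = 0             THEN 0
          WHEN numerator / denominator > 2 THEN 1
          ELSE 0
      END = 1
ORDER BY id;
```

With these rows the query should return ids 1 and 3 and skip id 2 without raising a divide-by-zero error. Written as plain "denominator <> 0 AND numerator / denominator > 2", nothing guarantees which test runs first.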
December 30, 2010 at 12:07 pm
Gianluca Sartori (12/30/2010)
As a side note, a CLR function can update data. 😉
Hello Gianluca. Thanks for a great and thorough article :).
This is slightly off topic, but how do you get a CLR Function to be able to alter the state of the DB? I have always seen this error:
System.Data.SqlClient.SqlException: Invalid use of a side-effecting operator 'INSERT' within a function.
I certainly don't think this is a good idea (to alter the state of the DB in a function), but you mention it can be done so I was curious.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
December 30, 2010 at 12:39 pm
Daniel Ruehle (12/30/2010)
... case
when Age > 90 then 1
when Age < 5 then 0
when Gender = 'Male' then 1
when LastName like 'SAM%' then 1
else 0
end = 1
This gets records for all people over the age of 90, males of age 5 or more and anyone with a last name that starts with the letters SAM.
Not quite. This gets records for all people over the age of 90, males of age 5 or more, and anyone age 5 or more with a last name that starts with the letters SAM. This will not retrieve a record for someone under age 5 with a last name starting with SAM.
Your point about using CASE for explicit short-circuiting is good, but your explanation is a perfect example of how careful you have to be when using CASE, for the same reason.
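A quick way to see this is to run the CASE expression over a few hand-picked rows. This is a minimal sketch: the table variable @Person and its rows are made up, the CreateDateTime filter from the original query is left out to keep it short, and last names are in upper case so the LIKE test does not depend on collation. It assumes SQL Server 2008 or later for the multi-row VALUES list.

```sql
-- Hypothetical sample data chosen to hit each WHEN branch.
DECLARE @Person TABLE (
    PersonId int         PRIMARY KEY,
    Age      int         NOT NULL,
    Gender   varchar(10) NOT NULL,
    LastName varchar(50) NOT NULL
);

INSERT INTO @Person (PersonId, Age, Gender, LastName)
VALUES (1, 95, 'Female', 'JONES'),    -- over 90
       (2,  3, 'Female', 'SAMSON'),   -- under 5, name starts with SAM
       (3, 40, 'Male',   'JONES'),    -- male, 5 or more
       (4, 40, 'Female', 'SAMUELS'),  -- 5 or more, name starts with SAM
       (5, 40, 'Female', 'JONES');    -- matches nothing

-- Matched shows what the CASE expression returns for each row.
SELECT PersonId, Age, Gender, LastName,
       CASE
           WHEN Age > 90             THEN 1
           WHEN Age < 5              THEN 0
           WHEN Gender = 'Male'      THEN 1
           WHEN LastName LIKE 'SAM%' THEN 1
           ELSE 0
       END AS Matched
FROM @Person
ORDER BY PersonId;
```

Matched should come out as 1, 0, 1, 1, 0. Row 2 is the interesting one: the Age < 5 branch fires first and returns 0, so the LIKE test is never reached even though the name starts with SAM.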