November 18, 2008 at 12:08 am
Comments posted to this topic are about the item Understanding NULL
November 18, 2008 at 3:19 am
A slightly easier way to explain might be the fully-written-out equivalents of the IN clauses.
I've taken the liberty of "paraphrasing" T-SQL Syntax a little here, as there is no way of representing boolean values (the results of boolean operations) in T-SQL (or any SQL?) that I know of.
IF(NULL IN (34, 35, NULL))PRINT 'TRUE' =>
IF(NULL = 34 OR NULL = 35 OR NULL = NULL) PRINT 'TRUE' =>
IF(UNKNOWN OR UNKNOWN OR UNKNOWN) PRINT 'TRUE' =>
IF(UNKNOWN) PRINT 'TRUE'
and
IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE' =>
IF(NOT (34 = 35 OR 34 = 36 OR 34 = NULL)) PRINT 'FALSE' =>
IF(NOT (FALSE OR FALSE OR UNKNOWN)) PRINT 'FALSE' =>
IF(NOT (UNKNOWN)) PRINT 'FALSE'
Because we all know(? 🙂 ) that "False OR UNKNOWN" evaluates to UNKNOWN, whereas "True OR UNKNOWN" evaluates to True...
There's some great articles on SQLServerCentral.com about Nulls, eg: http://www.sqlservercentral.com/articles/Advanced+Querying/2829/
Now if you REALLY want to have some fun, turn off the ANSI_NULLS option:
SET ANSI_NULLS OFF
All of a sudden three-valued logic becomes two-valued, and you have the equivalent of:
IF(NULL IN (34, 35, NULL))PRINT 'TRUE' =>
IF(NULL = 34 OR NULL = 35 OR NULL = NULL) PRINT 'TRUE' =>
IF(FALSE OR FALSE OR TRUE) PRINT 'TRUE' =>
IF(TRUE) PRINT 'TRUE'
and
IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE' =>
IF(NOT (34 = 35 OR 34 = 36 OR 34 = NULL)) PRINT 'FALSE' =>
IF(NOT (FALSE OR FALSE OR FALSE)) PRINT 'FALSE' =>
IF(NOT (FALSE)) PRINT 'FALSE'
And now everything prints as you might have expected if you came to SQL Server before 2000, which is when ANSI_NULLS became default...
(and incidentally, that's a small issue in the question, which assumed that ANSI_NULLS was off - a reasonable assumption in most environments, but still a rich source of horrible confusion when it is not)
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
November 18, 2008 at 4:33 am
Good question, and good comment to explain the result and the fact that it depends on ANSI_NULLS setting.
It got me thinking about the other IN / NOT IN syntax as well:
Suppose I want a list of all the employees in NORTHWIND database who have other people reporting to them. I coud do it like this:
select * from employees where employeeid IN (select reportsto from employees)
and I get 2 records: employee 2 and employee 5.
Suppose I want a list of all the employees in NORTHWIND database who DON'T have anyone reporting to them. If I try it using the apparently equivalent code ...
select * from employees where employeeid NOT IN (select reportsto from employees)
... I get no records returned.
The reason is that there is one employee with a NULL entry in "reportsto", which makes the NOT IN comparison translate to (employeeid <>2 AND employeeid <>5 AND employeeid <> NULL)
And as mentioned above, TRUE AND TRUE AND UNKNOWN equals UNKNOWN.
To make it work I need to remove the NULLS:
select * from employees where employeeid NOT IN (select reportsto from employees where reportsto is not null)
Now I just need to check my actual code to make sure I haven't fallen for this one in real life.
November 18, 2008 at 4:53 am
Hmm, nice one, I've never seen that issue described.
That provides another reason why the following syntax is (in my opinion) preferable:
select *
from employees
left join employees as subordinates on employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null
The main other reason I know of is performance on complex queries, where the query optimizer may sometimes evaluate the subquery (potentially passing a lot of data to the outer query) instead of doing an efficient join (I don't know the exact circumstances governing this, but I have definitely seen it happen)
Unfortunately the above syntax/approach has the disadvantage that it is less intuitive to many people 🙁
I make it a rule that subqueries (but not derived tables) are to be avoided at all costs; does anyone have a more flexible rule of thumb that can allow the use of subqueries without risk of perfromance or logic issues?
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
November 18, 2008 at 5:58 am
This question got me!
Well done muzahm
The set ansi-nulls off could off course be debated, but still, good question.
The first if I had no problem with, but the second one got me...
I learned a lot about nulls through Tao's explanation and examples. Thanks, clear, precise and to the point.
The matter on nulls can so easily lead to queries not returning expected results and if you are unaware of the implications don't even realize your query is not returning what you expect and the "bug" is left in the code.....
November 18, 2008 at 8:41 am
Tao Klerks (11/18/2008)
Hmm, nice one, I've never seen that issue described.That provides another reason why the following syntax is (in my opinion) preferable:
select *
from employees
left join employees as subordinates on employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null
The main other reason I know of is performance on complex queries, where the query optimizer may sometimes evaluate the subquery (potentially passing a lot of data to the outer query) instead of doing an efficient join (I don't know the exact circumstances governing this, but I have definitely seen it happen)
Unfortunately the above syntax/approach has the disadvantage that it is less intuitive to many people 🙁
I make it a rule that subqueries (but not derived tables) are to be avoided at all costs; does anyone have a more flexible rule of thumb that can allow the use of subqueries without risk of perfromance or logic issues?
This example is really more of a find unmatched query. It would return an employee record if the employeeid was deleted but another employee record still had that ID as the reportsto in addition to those that are truly unassigned. Let's say we had EmployeeID 10 and an employee with ReportsTo as 10 (let's say ID 45), then we delete Employees where employeeid = 10. In the above query we would see employeeid 45 because the join would fail and leave the subordinates derived table as null. 🙂
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
November 18, 2008 at 9:02 am
Hi slange,
I think you're confusing a couple of possibilities.
The query above reports employees who have no "subordinates" - there are no employees that have their employeeid in the reportsto field.
The query that you are thinking of is almost exactly the opposite:
select *
from employees
left join employees as supervisors on employees.reportsto = supervisors.employeeid
where supervisors.employeeid Is Null
and employees.reportsto Is Not Null
It's actually slightly different because it also has an extra WHERE criterion - making sure that the employees returned actually have a reportsto value (for a supervisor that no longer exists, or never existed).
Does this help?
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
November 18, 2008 at 9:13 am
Tao Klerks (11/18/2008)
Hi slange,I think you're confusing a couple of possibilities.
The query above reports employees who have no "subordinates" - there are no employees that have their employeeid in the reportsto field.
The query that you are thinking of is almost exactly the opposite:
select *
from employees
left join employees as supervisors on employees.reportsto = supervisors.employeeid
where supervisors.employeeid Is Null
and employees.reportsto Is Not Null
It's actually slightly different because it also has an extra WHERE criterion - making sure that the employees returned actually have a reportsto value (for a supervisor that no longer exists, or never existed).
Does this help?
Try this and you will see what I mean. 🙂
create table #employees (employeeid int, reportsto int)
insert #employees select 10, null
insert #employees select 45, 10
--here is the original query
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null
--now delete the reportsto record
delete #employees where employeeid = 10
--they still show up
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null
drop table #employees
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
November 18, 2008 at 10:08 am
Yes, employee 45 SHOULD show up in this list, because nobody reports to them.
Add another record where someone reports to 45, then 45 will not appear - even if you delete its parent record:
create table #employees (employeeid int, reportsto int)
insert #employees select 10, null
insert #employees select 45, 10
insert #employees select 50, 45
--here is the original query - we find employee 50 as the only one who does not have anyone reporting to them
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null
--now delete the reportsto record
delete #employees where employeeid = 10
-- we still get the same answer, 45 still does not appear, even though its parent is deleted
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null
drop table #employees
November 18, 2008 at 11:55 am
As rightly mentioned by Tao, try to run it with ANSI_NULLS OFF:
set ansi_nulls off
IF(NULL IN (34, 35, NULL))PRINT 'TRUE'
IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE'
and you will get a completely different answer
Regards,Yelena Varsha
November 19, 2008 at 12:44 am
Great one. I got confused whether there is something wrong. It was a clever question. But with Set Ansi_Nulls off we are getting different result.
November 19, 2008 at 10:34 am
Thanks to Tao Klerks and Archie Flockhart for a very edifying discussion of nulls. I've saved your comments into my sql server reference folder.
-- Save the planet. It's where I keep all my stuff.
November 19, 2008 at 2:40 pm
with ansi nulls on: NULL is like a black hole that converts anything it comes into contact with to NULL as well. Kind of like a vampire, NULL bites other values on the neck then what do you know, there vampries (NULL) to.
This is of course if you do not use ISNULL/IS NULL which is like garlic for NULL.
Carlton..
January 16, 2013 at 2:32 am
nice question and excellent explanation from you guys
_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply