December 2, 2003 at 12:10 pm
Table1 def is
IDint(Identity)
Field1char
Table2 def is
ID int (Identity)
Field1 char
ID2 int (not identity)
Table1 has ID values 1, 2, 3, 7, 8, 9
Table2 has ID2 values 1, 2, 3, null
If I use select * from table1 where id not in (select id2 from table2) I get no results; I would expect to return 7,8,9.
If I remove the null entry, that is what I get; also, if I add 'where id2 is not null' to the subquery, I also get my expected result.
Why does it behave this way?
December 2, 2003 at 1:05 pm
The value NULL means the data value for the column is unknown. In your query, SQL compares each value in the Table1 with the subset returned from the Table2. Since Table2 includes a record with the value NULL it is undefined if the numbers (7,8,9) exists or not in the subset.
By adding 'where id2 is not null' you are eliminating the record NULL from the subset, so now it can be determined that the numbers (7,8,9) are not in the subset.
December 2, 2003 at 1:48 pm
I'm not sure I quite understand. Why wouldn't it ignore the null value or at least not stop there but continue the next records in the table?
December 2, 2003 at 1:59 pm
The NULL value cannot be ignored because it's a record with a value NULL.
SQL checks all (4) records in Table2.
December 2, 2003 at 2:08 pm
Still not getting it. I don't understand how one entry with a "value" of null is preventing SQL from reporting the other entries that are not in the table.
December 2, 2003 at 2:25 pm
quote:
From BOL:A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.
When SQL compares the value 7 with NULL it returns NULL, then the expression NOT IN is undefined (returns NULL). It has to return TRUE for the record to be included in the result set.
December 2, 2003 at 2:26 pm
Yeah - null handling in these types of queries is wierd. For example...
select * from table1 where id not in ( null )
will produce no results. The problem is in the internal handling of IN. Since all comparisons with null fail, it is a little like saying <> null (instead of IS NOT null) - which produces unexpected results unless you change the default settings.
I would recommend you either leave IS NOT NULL in the subquery, or use a left join... i.e.
SELECT t1.* FROM Table1 t1
LEFT JOIN Table2 T2 ON T2.ID2 = T1.ID
WHERE T2.ID2 IS NULL
my $0.02
Guarddata-
December 2, 2003 at 2:32 pm
I got around it ok. I just don't recall ever encountering this circumstance before. I usually, for whatever reason, find it quicker to compose 'not in' sytax queries on the fly rather than using outer joins. From now on I'll have to be a little more careful. Thanks for the info.
December 2, 2003 at 2:37 pm
Sure - I understand that. From a performance perspective, I have found the OUTER JOIN to be much better when there are lots of rows. The NOT IN syntax causes table scans...for what it's worth.
Guarddata-
December 2, 2003 at 11:35 pm
I normally use isnull(id2,-1)=-1 ( when it is a numeric) or isnull(id4,'---')='---' (where id4 is a string and '---' is an impossible value) instead of where id2 is null. Is the method i use not advisable when it comes to performance ..
Harsha
HArsha
HArsha
December 3, 2003 at 7:38 am
Try the following
select * from table1 where id not in (select id2 from table2 where isnull(id2,'') <> '' )
December 3, 2003 at 7:58 am
Might I suggest that you use a NOT EXISTS? It should perform better than a NOT IN. It's also curious that I couldn't reproduce your issue. It may be a database setting difference. Here's my test and example of NOT EXISTS:
CREATE TABLE Table1 (
ID INT,
Field1 char(8))
CREATE TABLE Table2 (
ID INT,
Field1 char(8),
ID2 INT NULL)
INSERT Table1 VALUES (1, 'a')
INSERT Table1 VALUES (2, 'b')
INSERT Table1 VALUES (3, 'c')
INSERT Table1 VALUES (7, 'g')
INSERT Table1 VALUES (8, 'h')
INSERT Table1 VALUES (9, 'i')
INSERT Table2 VALUES (1, 'a', 1)
INSERT Table2 VALUES (2, 'b', 2)
INSERT Table2 VALUES (3, 'c', 3)
INSERT Table2 VALUES (4, 'd', NULL)
SELECT * FROM Table1 T1 WHERE NOT EXISTS (SELECT 1 FROM Table2 T2 WHERE T2.ID2 = T1.ID)
IDField1
7g
8h
9i
SELECT * FROM Table1 T1 WHERE ID NOT IN (SELECT ID2 FROM Table2 T2)
IDField1
7g
8h
9i
Jeff
December 3, 2003 at 12:33 pm
Yes, "not exists" with a correlated subquery is way faster than "not in".
I've tested all these methods, and using a left join and looking for a null is fastest (beats out "not exists" by a little bit). But if I'm dealing with a small dataset I'll use "not exists", as it's easier to read, and the performance diff is neglible when under 1000 rows or so. Anything more and you better use the Left Join method.
cl
cl
Signature is NULL
December 5, 2003 at 8:11 am
It's probably so obvious that you forgot to mention it, but you have to be careful to only use the left join method when at least one side of the comparison is a unique field, so you don't get duplicate rows in the result.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply