Select Statement Using Not In and Null Values

Question

Select Statement Using Not In and Null Values

pnewhart

Ten Centuries

Points: 1033
More actions
December 2, 2003 at 12:10 pm

#101783

Table1 def is
IDint(Identity)
Field1char
Table2 def is
ID int (Identity)
Field1 char
ID2 int (not identity)
Table1 has ID values 1, 2, 3, 7, 8, 9
Table2 has ID2 values 1, 2, 3, null
If I use select * from table1 where id not in (select id2 from table2) I get no results; I would expect to return 7,8,9.
If I remove the null entry, that is what I get; also, if I add 'where id2 is not null' to the subquery, I also get my expected result.
Why does it behave this way?

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

mdaniel SSCarpal Tunnel Points: 4616 More actions · Answer 1

The value NULL means the data value for the column is unknown. In your query, SQL compares each value in the Table1 with the subset returned from the Table2. Since Table2 includes a record with the value NULL it is undefined if the numbers (7,8,9) exists or not in the subset.

By adding 'where id2 is not null' you are eliminating the record NULL from the subset, so now it can be determined that the numbers (7,8,9) are not in the subset.

pnewhart Ten Centuries Points: 1033 More actions · Answer 2

I'm not sure I quite understand. Why wouldn't it ignore the null value or at least not stop there but continue the next records in the table?

mdaniel SSCarpal Tunnel Points: 4616 More actions · Answer 3

The NULL value cannot be ignored because it's a record with a value NULL.

SQL checks all (4) records in Table2.

pnewhart Ten Centuries Points: 1033 More actions · Answer 4

Still not getting it. I don't understand how one entry with a "value" of null is preventing SQL from reporting the other entries that are not in the table.

mdaniel SSCarpal Tunnel Points: 4616 More actions · Answer 5

quote:
From BOL:
A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.

When SQL compares the value 7 with NULL it returns NULL, then the expression NOT IN is undefined (returns NULL). It has to return TRUE for the record to be included in the result set.

guarddata SSCertifiable Points: 6583 More actions · Answer 6

Yeah - null handling in these types of queries is wierd. For example...

select * from table1 where id not in ( null )

will produce no results. The problem is in the internal handling of IN. Since all comparisons with null fail, it is a little like saying <> null (instead of IS NOT null) - which produces unexpected results unless you change the default settings.

I would recommend you either leave IS NOT NULL in the subquery, or use a left join... i.e.

SELECT t1.* FROM Table1 t1

LEFT JOIN Table2 T2 ON T2.ID2 = T1.ID

WHERE T2.ID2 IS NULL

my $0.02

Guarddata-

pnewhart Ten Centuries Points: 1033 More actions · Answer 7

I got around it ok. I just don't recall ever encountering this circumstance before. I usually, for whatever reason, find it quicker to compose 'not in' sytax queries on the fly rather than using outer joins. From now on I'll have to be a little more careful. Thanks for the info.

guarddata SSCertifiable Points: 6583 More actions · Answer 8

Sure - I understand that. From a performance perspective, I have found the OUTER JOIN to be much better when there are lots of rows. The NOT IN syntax causes table scans...for what it's worth.

Guarddata-

Harsha_ASP SSC Enthusiast Points: 117 More actions · Answer 9

I normally use isnull(id2,-1)=-1 ( when it is a numeric) or isnull(id4,'---')='---' (where id4 is a string and '---' is an impossible value) instead of where id2 is null. Is the method i use not advisable when it comes to performance ..

Harsha

HArsha

DimitrisAgelidis Ten Centuries Points: 1166 More actions · Answer 10

Try the following

select * from table1 where id not in (select id2 from table2 where isnull(id2,'') <> '' )

lambje SSCrazy Points: 2085 More actions · Answer 11

Might I suggest that you use a NOT EXISTS? It should perform better than a NOT IN. It's also curious that I couldn't reproduce your issue. It may be a database setting difference. Here's my test and example of NOT EXISTS:

CREATE TABLE Table1 (

ID INT,

Field1 char(8))

CREATE TABLE Table2 (

ID INT,

Field1 char(8),

ID2 INT NULL)

INSERT Table1 VALUES (1, 'a')

INSERT Table1 VALUES (2, 'b')

INSERT Table1 VALUES (3, 'c')

INSERT Table1 VALUES (7, 'g')

INSERT Table1 VALUES (8, 'h')

INSERT Table1 VALUES (9, 'i')

INSERT Table2 VALUES (1, 'a', 1)

INSERT Table2 VALUES (2, 'b', 2)

INSERT Table2 VALUES (3, 'c', 3)

INSERT Table2 VALUES (4, 'd', NULL)

SELECT * FROM Table1 T1 WHERE NOT EXISTS (SELECT 1 FROM Table2 T2 WHERE T2.ID2 = T1.ID)

IDField1

7g

8h

9i

SELECT * FROM Table1 T1 WHERE ID NOT IN (SELECT ID2 FROM Table2 T2)

IDField1

7g

8h

9i

Jeff

Calvin Lawson SSChampion Points: 11030 More actions · Answer 12

Yes, "not exists" with a correlated subquery is way faster than "not in".

I've tested all these methods, and using a left join and looking for a null is fastest (beats out "not exists" by a little bit). But if I'm dealing with a small dataset I'll use "not exists", as it's easier to read, and the performance diff is neglible when under 1000 rows or so. Anything more and you better use the Left Join method.

cl

Signature is NULL

Scott Coleman One Orange Chip Points: 27452 More actions · Answer 13

It's probably so obvious that you forgot to mention it, but you have to be careful to only use the left join method when at least one side of the comparison is a unique field, so you don't get duplicate rows in the result.