WHERE col NOT IN (SELECT...) not working

Question

WHERE col NOT IN (SELECT...) not working

Dennis D. Allen

Right there with Babe

Points: 719
More actions
January 21, 2009 at 1:25 pm

#71230

I have a parent child relation between to tables. Not all parents have children, some parents have 1 or more children. I want all of the Parent rows that do not have any Child rows.
There are about 1 million Parent rows, and about 500 thousand Child rows, so I am sure there are at least 500 thousand Parent rows (there are several more because some Parents have multiple children). I am sure that my query is correct, it is in the form;
SELECT *
FROM Parent
WHERE ParentID NOT IN (SELECT ParentID FROM Child)
But, I am getting 0 results. My query plan follows as best I can visualize it here;
SELECT <-- HASH Match (Right Anti Semi Join) <-- Index Scan (Child non-clustered Index)
^-- Nested Loops (Left Anti Semi Join [Warning: No Join Predicate]) <-- Table Scan (Parent)
^-- Row Count Spool (Lazy Spool) <-- Clustered Index Seek (Child)
My query plan seems to show a broken query plan, but I have no clue what is causing it to choose this plan nor how to fix the plan.
Any suggestions would be greatly appreciated!
Dennis

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 1

Dennis D. Allen (1/21/2009)
I have a parent child relation between to tables. Not all parents have children, some parents have 1 or more children. I want all of the Parent rows that do not have any Child rows.
There are about 1 million Parent rows, and about 500 thousand Child rows, so I am sure there are at least 500 thousand Parent rows (there are several more because some Parents have multiple children). I am sure that my query is correct, it is in the form;
SELECT *
FROM Parent
WHERE ParentID NOT IN (SELECT ParentID FROM Child)
But, I am getting 0 results. My query plan follows as best I can visualize it here;
SELECT <-- HASH Match (Right Anti Semi Join) <-- Index Scan (Child non-clustered Index)
^-- Nested Loops (Left Anti Semi Join [Warning: No Join Predicate]) <-- Table Scan (Parent)
^-- Row Count Spool (Lazy Spool) <-- Clustered Index Seek (Child)
My query plan seems to show a broken query plan, but I have no clue what is causing it to choose this plan nor how to fix the plan.
Any suggestions would be greatly appreciated!
Dennis

Try this:

select

p.*

from

Parent p

left outer join Child c

on (p.ParentID = c.ParentID)

where

c.ParentID is null

Let us know if this works.

Luke L SSC-Dedicated Points: 35547 More actions · Answer 2

Depending on your indexes and such this could be rather a bit on the long side but why not something like the following???

SELECT Parent.*

FROM Parent Left Join Child ON Parent.ParentID = Child.ParentID

WHERE child.ParentID IS NULL

To help us help you read this[/url]For better help with performance problems please read this[/url]

Luke L SSC-Dedicated Points: 35547 More actions · Answer 3

One of these days I'll learn to recheck the original post before I post the same answer that's already been suggested a minute or two ago...

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 4

Use the Force Luke...

Actually, nothing wrong with posting an answer that's already been posted. One thing you'll see is a difference in formating styles.

Luke L SSC-Dedicated Points: 35547 More actions · Answer 5

Yeah, I used to run all my posts through a SQL formatter, but My HIPS software doesn't like any of the online ones anymore (keeps thinking I'm trying to run a SQL injection attack against them) so I just went with a copy and paste from the OP and added the appropriate syntax. Typically I'd end up having mine look eerily similar to yours, at least in my production code.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 6

Lynn Pettis

SSC Guru

Points: 442470

January 21, 2009 at 1:55 pm

#930696

Format by hand do I.

Dennis D. Allen Right there with Babe Points: 719 More actions · Answer 7

Greetings Lynn,

Yes that does work and is a fine workaround.

I want to understand why this does not work as it should work correctly and the fact that it is not might point to an error in my system that I would like to find and fix. Also, I am trying to learn to read the execution plan better.

Any ideas?

Luke L SSC-Dedicated Points: 35547 More actions · Answer 8

Format by SQL Prompt I do, but what can I say I used to like being able to post in color... makes it much easier to read while answering posts...

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 9

Dennis D. Allen (1/21/2009)
Greetings Lynn,
Yes that does work and is a fine workaround.
I want to understand why this does not work as it should work correctly and the fact that it is not might point to an error in my system that I would like to find and fix. Also, I am trying to learn to read the execution plan better.
Any ideas?

Hi Dennis, the reason is really simple. When you run the query:

SELECT ParentID FROM Child;

You are returning at least one row where the ParentID is NULL. That forces the IN statement to evaluate to UNKNOWN which is not TRUE and you get no rows.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 10

Dennis D. Allen (1/21/2009)
Greetings Lynn,
Yes that does work and is a fine workaround.
I want to understand why this does not work as it should work correctly and the fact that it is not might point to an error in my system that I would like to find and fix. Also, I am trying to learn to read the execution plan better.
Any ideas?

Not that what I gave you is a workaround. I actually think that it is more efficient than using NOT IN. The execution plan showed a nested loop. Think about this for a moment. For each row in Parent, you have to loop through 1,000,000 values returned from Child before knowing if it does not exist in Child. Not very efficient.

Now, the left outer join returns all records from Parent regardless of a matching record in Child, and you then filter on the Child ParentID, which will be null if there was no match to the ParentID from Parent.

Dennis D. Allen Right there with Babe Points: 719 More actions · Answer 11

Jeffrey Williams (1/21/2009)
Hi Dennis, the reason is really simple. When you run the query:
SELECT ParentID FROM Child;
You are returning at least one row where the ParentID is NULL. That forces the IN statement to evaluate to UNKNOWN which is not TRUE and you get no rows.

That was indeed the issue. Since the child table should not allow NULLs in the ParentID column, I never thought to look for that.

Thank you so much!

Dennis

Dennis D. Allen Right there with Babe Points: 719 More actions · Answer 12

Lynn Pettis (1/21/2009)
Not that what I gave you is a workaround. I actually think that it is more efficient than using NOT IN. The execution plan showed a nested loop. Think about this for a moment. For each row in Parent, you have to loop through 1,000,000 values returned from Child before knowing if it does not exist in Child. Not very efficient.
Now, the left outer join returns all records from Parent regardless of a matching record in Child, and you then filter on the Child ParentID, which will be null if there was no match to the ParentID from Parent.

I intended workaround in the sense that the technique I was employing was causing unexpected results, and using a different technique was avoiding finding the cause of the unexpected behavior.

Thank you for pointing out the efficiency gain in your technique. In comparing the executing plan and both the IO and TIME statistics, it is evident you are correct. Your query produces a 50% reduction in table scans on the Child and 15% reduction in logical reads on the child and a 15% gain in execution time.

Thank you Lynn 🙂

RBarryYoung SSC Guru Points: 143329 More actions · Answer 13

Lynn Pettis (1/21/2009)
Use the Force Luke...

Doggone It! I've been waiting for weeks for an excuse to use that line on Luke!

😎

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Luke L SSC-Dedicated Points: 35547 More actions · Answer 14

Ya know, I'd say that "I'd never join you" in my whiniest text possible, however who knows what the future will bring...

And I've been wondering for weeks why we have 2 Vaders running around here, or are you and Perry one in the same?

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]