Any way to use "not in" with multiple columns?

Question

Post reply

Any way to use "not in" with multiple columns?

Greg J

SSCarpal Tunnel

Points: 4288
More actions
June 21, 2006 at 12:01 pm

#114560

I currently have:
INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2 t2
LEFT OUTER JOIN table1 t1
ON t1.colC = t2.colC
AND t1.colD = t2.colD
WHERE t1.colX IS NULL

I'd like to get a performance comparison between that and something like:
INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2
WHERE colC, colD NOT IN (SELECT colC, colD FROM table1)

I'm dealing with record sets around 5 million for both t1 and t2. The columns being compared are CHAR(3)'s. ColX in the first query is an arbitrary column. Anyone? Anyone? Other ideas entirely? Thanks...

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

JeffB Hall of Fame Points: 3541 More actions · Answer 1

OUTER JOINs always perform better than NOT INs.

I don't believe you can do what you are trying to do.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 2

For NOT IN with multiple join columns, you need a correlated sub-query, which means you need NOT EXISTS:

INSERT INTO table1

SELECT colA, colB, colC, colD, colE

FROM table2 As t2

WHERE NOT EXISTS (

SELECT *

FROM Table1 As t1

WHERE t1.ColC = t2.ColC

AND t1.ColD = t2.ColD

)

stax68 SSChampion Points: 11761 More actions · Answer 3

The only way to do it as a single NOT IN subquery would be to concatenate the columns. But unless you use indexed calc columns, and probably even then, this would perform worse than an outer join.

So you should use an outer join, specifying in the where clause that a join column on the outer table IS NULL:

select t1.c1, t1.c2
from t1 left join t2 on t1.c3 = t2.c3 and t1.c4 = t2.c4
where t2.c3 is null

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

I agree with JeffB... outer join, search for the nulls... haven't tested for speed but seems that it would be faster than either the correlated subquery or the concatenate method... could be wrong, though.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

PW-201837 SSC-Insane Points: 20805 More actions · Answer 5

>>haven't tested for speed but seems that it would be faster than either the correlated subquery

I generally get the same execution plan using NOT EXISTS or LEFT JOIN check for NULL. I tend to use NOT EXISTS so that I know the intention of the query a year from now when I have to maintain/modify it.

stax68 SSChampion Points: 11761 More actions · Answer 6

Bold assertion: left anti-join never performs worse than the equivalent NOT EXISTS...

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Greg J SSCarpal Tunnel Points: 4288 More actions · Answer 7

thanks fellas. I'm actually getting the quickest times using the concatenate and NOT IN clause. Could be a fluke, but I made sure to run freeproccache and dropcleanbuffers first.

I'll revisit the perf testing tomorrow and post the times and table defs.

stax68 SSChampion Points: 11761 More actions · Answer 8

is that using indexed calc columns?

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

kalyan_ch8 Grasshopper Points: 11 More actions · Answer 9

kalyan_ch8

Grasshopper

Points: 11

June 15, 2009 at 6:34 pm

#1009982

This is working perfect for me.

Thanks 🙂

Carl Federl One Orange Chip Points: 25384 More actions · Answer 10

There are some performance comparisons of various techniques at http://weblogs.sqlteam.com/peterl/archive/2009/06/12/Timings-of-different-techniques-for-finding-missing-records.aspx

"NOT EXISTS" was the fastest for those specific test conditions.

SQL = Scarcely Qualifies as a Language