June 21, 2006 at 12:01 pm
I currently have:
INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2 t2
LEFT OUTER JOIN table1 t1
ON t1.colC = t2.colC
AND t1.colD = t2.colD
WHERE t1.colX IS NULL
I'd like to get a performance comparison between that and something like:
INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2
WHERE colC, colD NOT IN (SELECT colC, colD FROM table1)
I'm dealing with record sets around 5 million for both t1 and t2. The columns being compared are CHAR(3)'s. ColX in the first query is an arbitrary column. Anyone? Anyone? Other ideas entirely? Thanks...
June 21, 2006 at 1:10 pm
OUTER JOINs always perform better than NOT INs.
I don't believe you can do what you are trying to do.
June 21, 2006 at 1:15 pm
For NOT IN with multiple join columns, you need a correlated sub-query, which means you need NOT EXISTS:
INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2 As t2
WHERE NOT EXISTS (
SELECT *
FROM Table1 As t1
WHERE t1.ColC = t2.ColC
AND t1.ColD = t2.ColD
)
June 21, 2006 at 1:21 pm
The only way to do it as a single NOT IN subquery would be to concatenate the columns. But unless you use indexed calc columns, and probably even then, this would perform worse than an outer join.
So you should use an outer join, specifying in the where clause that a join column on the outer table IS NULL:
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 21, 2006 at 8:33 pm
I agree with JeffB... outer join, search for the nulls... haven't tested for speed but seems that it would be faster than either the correlated subquery or the concatenate method... could be wrong, though.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 22, 2006 at 8:30 am
>>haven't tested for speed but seems that it would be faster than either the correlated subquery
I generally get the same execution plan using NOT EXISTS or LEFT JOIN check for NULL. I tend to use NOT EXISTS so that I know the intention of the query a year from now when I have to maintain/modify it.
June 22, 2006 at 10:36 am
Bold assertion: left anti-join never performs worse than the equivalent NOT EXISTS...
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 22, 2006 at 2:07 pm
thanks fellas. I'm actually getting the quickest times using the concatenate and NOT IN clause. Could be a fluke, but I made sure to run freeproccache and dropcleanbuffers first.
I'll revisit the perf testing tomorrow and post the times and table defs.
June 22, 2006 at 2:29 pm
is that using indexed calc columns?
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 15, 2009 at 6:34 pm
This is working perfect for me.
Thanks 🙂
June 15, 2009 at 7:53 pm
There are some performance comparisons of various techniques at http://weblogs.sqlteam.com/peterl/archive/2009/06/12/Timings-of-different-techniques-for-finding-missing-records.aspx
"NOT EXISTS" was the fastest for those specific test conditions.
SQL = Scarcely Qualifies as a Language
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply