August 10, 2007 at 5:32 am
I agree with all of you about IN is bad and NOT IN is worse. I've got alternative for IN i.e. using INNER JOIN or EXISTS but what is the alternative to NOT IN. I found EXCEPT clause in SQL 2005 but that requires equal number of fields and same order, moreover you cannot really stuck with same number/types of fields all the times. I'm sure I'm missing something but what is it, 'm stuck
August 10, 2007 at 7:59 am
You can use a left outer join to do the same.
Show me everything in table 1 where not in table 2.
So long as your columns are correctly indexed, it'll be a fast query and be SARGable.
CREATE
TABLE #abc (
Col1
int
)
CREATE
TABLE #def (
Col1
int
)
INSERT
INTO #abc VALUES (1)
INSERT
INTO #abc VALUES (2)
INSERT
INTO #abc VALUES (3)
INSERT
INTO #def VALUES (1)
INSERT
INTO #def VALUES (2)
INSERT
INTO #def VALUES (5)
SELECT
*
FROM
#abc A
LEFT OUTer JOIN #def b ON a.col1 = b.col1
WHERE
b
.col1 IS null
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
August 10, 2007 at 11:56 am
I've seen the article before and I had the same question then.
Take the following query:
select c1 from t1 where c2 = @value
There is an index defined on c2. So this query is perfectly sargable. Now place the query as a subquery in an IN statement.
...IN (select c1 from t1 where c2 = @value)
While a join would be (in most cases) more efficient, isn't the query, as a subquery, just as sargable as when it was a query? And if we then add NOT:
... NOT IN (select c1 from t1 where c2 = @value)
how does this effect the sargability of the subquery?
According to the article you cite, just adding NOT contributes to poor performance "because the SQL Server optimizer has to use a nested table scan to perform this activity". Why?
Is it because there is a problem in the SS optimizer where the addition of NOT throws it for a loop and it generates table scans no matter how the subquery is constructed? If so, then fine, that's good to know. But let's not talk about it as general relational behavior.
Unless, of course, it is general relational behavior. I must admit that, except for simple queries where I'm just looking for a few known values (...IN ('this', 'that', 'other')...), it's probably been more than 10 years since I've used a subquery in an IN. I learned why not to do this so long ago the specifics have faded in my rapid aging memory and I have no recollection that NOT IN was any worse than just IN. But I am ready to be corrected.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 11, 2007 at 7:12 am
I was about to add when asking my question that please don't advise using LEFT OUTER JOIN as I found this sometimes taking same time as NOT IN but EXCEPT. Is there any other suggested method as an alternative to NOT IN other than LEFT OUTER JOIN. Just bit curious!
August 11, 2007 at 8:19 am
Not exists.
Right join .
Change the db design
August 11, 2007 at 7:10 pm
very funny!!!
August 11, 2007 at 7:14 pm
Thanks.
But that's actually pretty all you can do. You can always do select * from both tables and manually do the joins... but there comes a point where you just got to use the tools available for hte job .
August 11, 2007 at 8:48 pm
Jim,
What is the work assignment that says you must use IN? The reason I ask is that it's a strange assignment and, as others have mentioned, is generally a bad idea.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 11, 2007 at 9:08 pm
Actually, NOT IN and NOT EXISTS create identical execution plans and both are a touch more effecient than an outer join with null detection... as always, here's the proof...
--===== Create a decent size test table
-- Takes about 42 seconds to execute.
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
SomeCSV = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',
SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a primary key
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Create another table (bit of a heap, really)
SELECT TOP 1000
RowNum = ISNULL(CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),0),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
INTO dbo.JBMTest2
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN
--===== Add an index
CREATE INDEX IX_JBMTest2_RowNum ON dbo.JBMTest2 (RowNum)
--===== These 2 queries form identical execution plans
-- and take the same time to execute
SELECT t1.RowNum
FROM dbo.JBMTest t1
WHERE t1.RowNum NOT IN (SELECT RowNum FROM JBMTest2)
SELECT t1.RowNum
FROM dbo.JBMTest t1
WHERE NOT EXISTS (SELECT 1 FROM JBMTest2 WHERE RowNum = t1.RowNum)
--===== This query takes just slightly LONGER
SELECT t1.RowNum
FROM dbo.JBMTest t1
LEFT OUTER JOIN
dbo.JBMTest2 t2
ON t1.RowNum = t2.RowNum
WHERE t2.RowNum IS NULL
--Jeff Moden
Change is inevitable... Change for the better is not.
August 11, 2007 at 9:14 pm
Results please?
August 11, 2007 at 11:54 pm
Heh... gettin' lazy in your old age, Remi All times in milliseconds to return 999015 rows in the grid mode of Query Analyzer...
5941 NOT IN
5994 NOT EXISTS
6175 OUTER JOIN WITH NULL DETECT
--Jeff Moden
Change is inevitable... Change for the better is not.
August 12, 2007 at 2:32 am
9096 NOT IN
9865 NOT EXISTS
8954 OUTER JOIN WITH NULL DETECT
9816 EXCEPT in SQL 2005
Just in case,
SELECT RowNum
FROM dbo.JBMTest
EXCEPT
SELECT RowNUM
from dbo.JBMTest2
And I always thought EXCEPT is better than NOT IN or NOT EXISTS or LEFT OUTER JOIN (Profiler is your best friend) everyday is a new day at this forum.
August 12, 2007 at 2:33 am
I understand what you mean, but as you also know it's not possible to change DB design very often.
August 12, 2007 at 5:46 am
You know that results change from PC to PC in that kind of test... and since you already have them printed on your screen, I think it's a farely easy step to just copy/paste the results to us as well as the test .
As for the laziness... let's just say I'm busy these days (which is a great thing btw) .
August 12, 2007 at 9:43 am
Heh... yeah, I know... thanks for the reminder to post the results (it's true, I forget a lot)... you're lazy and I'm forgetful... we make a hell of a team
"Jack Sprat could eat no fat,
his wife could eat no lean,
so between the two, when they ate,
they licked the platter clean."
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 34 total)
You must be logged in to reply to this topic. Login to reply