March 3, 2005 at 11:52 am
This will in many cases be the situation. However there can be various factors that can effect the performance. Sometimes you have to try it several ways to get the best option and as the system grows this can even change.
March 3, 2005 at 12:13 pm
1.First and foremost INDEXES DO MATTER!!!
2.Second, the number of rows must not be trivial
3.And Last but not least the HW in some cases plays a major role.
In summary you could forget about HW for this one but point 1 and 2 can not be taken lightly.
HTH
* Noel
March 3, 2005 at 12:21 pm
Yes, I agree.
Regarding point 1: I understand that indexes will affect the access plan and execution time... I intentionally left off any indexes because I wanted both to table scan. Essentially, I was trying to illustrate for myself the true value of the boolean return on "EXISTS" versus the set/subquery of "IN". I was primarly surprised that EXISTS took longer during several query runs.
Point 2: I'm not sure what you mean. Are you saying I need a larger table to see the performance return on EXISTS?
Point 3: I have heard that the optimizer will sometimes choose different access plans for same structure databases that reside on different servers. I haven't had much time to play with that though...
March 3, 2005 at 12:37 pm
>>Point 2: I'm not sure what you mean. Are you saying I need a larger table to see the performance return on EXISTS<<
ABSOLUTELY! if the number of rows is small enough so that a table scan is always the only choice probably IN will be better
NOW, Set up an index and populate the table with a decent amount of rows 10k+ for low end HW 100K+ for mid 100mil + for High end
Now compare: join, in and exists
BTW - This numbers change every day with technology improvements -
HTH
* Noel
March 3, 2005 at 12:49 pm
OK, I'm seeing a small improvement for "EXISTS" now. FYI, the distinct keyword was used below because too much time was lost to the network in transporting 500,000 records to my laptop.
CREATE TABLE test (ID integer, SomeChar varchar(10), SomeOtherChar varchar(10), SomeInteger integer)
GO
CREATE CLUSTERED INDEX [IX_test_ID_Clustered] ON [dbo].[test]([ID]) WITH FILLFACTOR = 90
GO
DECLARE @intCount integer
set @intCount = 0
WHILE @intCount < 100000
BEGIN
INSERT INTO TEST VALUES (100, 'CharString', 'Whatever' ,3200)
INSERT INTO TEST VALUES (200, 'SomeString', 'Blah', 3600)
INSERT INTO TEST VALUES (300, 'AnotherStr', 'BlahBlah', 3800)
INSERT INTO TEST VALUES (400, 'Whatever', 'character', 4000)
INSERT INTO TEST VALUES (500, 'finstring', 'final', 4200)
set @intCount=@intCount+1
END
SELECT distinct ID FROM test t1 where ID IN ( Select ID FROM test )
--SQL:StmtCompleted 59 49 170 0
-- Cumulative client processing time 508 520.333
-- Cumulative wait time on server replies 27492 7.42505e+006
SELECT distinct ID FROM test t1 where EXISTS ( SELECT * FROM test t2 WHERE t2.ID = t1.ID )
--SQL:StmtCompleted 49 39 170 0
--Cumulative client processing time 559 530
--Cumulative wait time on server replies 48045 5.5808e+006
March 3, 2005 at 1:06 pm
OK,
There is a problem here also in the data used to simulate this. If you use what you posted, the cardinality of the index will be so low that I doubt that that index is even usable I don't have a sql machine now but I am positive that is barely usefull. If you want to simulate this think in terms of Orders and OrderDetails and that's a typical scenario for this cases. Remember knowing about your data is also a valuable tool
HTH
* Noel
Viewing 6 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply