Just a thought: Which statement is faster?

  • This will in many cases be the situation. However there can be various factors that can effect the performance. Sometimes you have to try it several ways to get the best option and as the system grows this can even change.

  • 1.First and foremost INDEXES DO MATTER!!!

    2.Second, the number of rows must not be trivial

    3.And Last but not least the HW in some cases plays a major role.

    In summary you could forget about HW for this one but point 1 and 2 can not be taken lightly.

    HTH


    * Noel

  • Yes, I agree. 

    Regarding point 1:  I understand that indexes will affect the access plan and execution time...  I intentionally left off any indexes because I wanted both to table scan.  Essentially, I was trying to illustrate for myself the true value of the boolean return on "EXISTS" versus the set/subquery of "IN".  I was primarly surprised that EXISTS took longer during several query runs.

    Point 2:  I'm not sure what you mean.  Are you saying I need a larger table to see the performance return on EXISTS?

    Point 3:  I have heard that the optimizer will sometimes choose different access plans for same structure databases that reside on different servers.  I haven't had much time to play with that though...

  •  

    >>Point 2:  I'm not sure what you mean.  Are you saying I need a larger table to see the performance return on EXISTS<<

    ABSOLUTELY! if the number of rows is small enough so that a table scan is always the only choice probably IN will be better

    NOW, Set up an index and populate the table with a decent amount of rows 10k+ for low end HW  100K+ for mid 100mil + for High end

    Now compare: join, in and exists

    BTW - This numbers change every day with technology improvements -

     

    HTH

     


    * Noel

  • OK, I'm seeing a small improvement for "EXISTS" now.  FYI, the distinct keyword was used below because too much time was lost to the network in transporting 500,000 records to my laptop.

    CREATE TABLE test (ID integer, SomeChar varchar(10), SomeOtherChar varchar(10), SomeInteger integer)

    GO

    CREATE  CLUSTERED  INDEX [IX_test_ID_Clustered] ON [dbo].[test]([ID]) WITH  FILLFACTOR = 90

    GO

    DECLARE @intCount integer

    set @intCount = 0

    WHILE @intCount < 100000

     BEGIN

      INSERT INTO TEST VALUES (100, 'CharString', 'Whatever' ,3200)

      INSERT INTO TEST VALUES (200, 'SomeString', 'Blah', 3600)

      INSERT INTO TEST VALUES (300, 'AnotherStr', 'BlahBlah', 3800)

      INSERT INTO TEST VALUES (400, 'Whatever',   'character', 4000)

      INSERT INTO TEST VALUES (500, 'finstring',  'final', 4200)

     

      set @intCount=@intCount+1

     END

    SELECT distinct ID FROM test t1 where ID IN ( Select ID FROM test )

    --SQL:StmtCompleted 59 49 170 0

    -- Cumulative client processing time 508 520.333

    -- Cumulative wait time on server replies 27492 7.42505e+006

    SELECT distinct ID FROM test t1 where EXISTS ( SELECT * FROM test t2 WHERE t2.ID = t1.ID )

    --SQL:StmtCompleted 49 39 170 0

    --Cumulative client processing time 559 530

    --Cumulative wait time on server replies 48045 5.5808e+006 

  • OK,

    There is a problem here also in the data used to simulate this. If you use what you posted, the cardinality of the index will be so low that I doubt that that index is even usable I don't have a sql machine now but I am positive that is barely usefull. If you want to simulate this think in terms of Orders and OrderDetails and that's a typical scenario for this cases. Remember knowing about your data is also a valuable tool

    HTH


    * Noel

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply