EXISTS vs IN vs .... COUNT?!

  • Hi,

    I made the following test since some of our developers where using the COUNT similar to EXISTS:

    -- criar a tabela e inserir 10000 linhas

    CREATE TABLE seats (Id INT IDENTITY(1,1), Seat INT NOT NULL, Reserved TINYINT NOT NULL DEFAULT 0, CONSTRAINT pk_seats PRIMARY KEY (Id))

    CREATE TABLE seats2 (ID2 INT IDENTITY(1,1), SeatH INT NOT NULL, CONSTRAINT pk_seats2 PRIMARY KEY (Id2))

    DECLARE @i INT = 1

    WHILE @i <= 10000

    BEGIN

    INSERT INTO seats (seat) VALUES (@i)

    SET @i = @i+1

    END

    SET @i = 0

    WHILE @i <= 10000

    BEGIN

    INSERT INTO seats2 (seath) VALUES (@i)

    SET @i = @i+1

    END

    SELECT * FROM Seats s WHERE (SELECT COUNT(1) FROM Seats2 s2 WHERE s.id = s2.id2) > 0

    SELECT * FROM Seats s WHERE EXISTS (SELECT 1 FROM Seats2 s2 WHERE s.id = s2.id2)

    SELECT * FROM Seats s WHERE s.ID IN (SELECT Id2 FROM Seats2 s2)

    They all have the same execution plans, statistics IO, ....

    IN can not be used if you what to compare more than one column, EXISTS does, but also does COUNT(...).

    Is it Ok to use the COUNT as a "replacement" for EXISTS in these cases or should EXISTS always be used?!

    Thanks,

    Pedro



    If you need to work better, try working less...

  • Personally I would stick with the Exists solution as IMHO its far easiernd understand compared with the Count(*) method.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (1/8/2013)


    Personally I would stick with the Exists solution as IMHO its far easiernd understand compared with the Count(*) method.

    yep.. it's more ANSI Standard but never the less a surprise for me knowing this...

    I've read lots of post comparing IN vs EXISTS vs INNER JOIN but none mentioned the COUNT(1) trick..

    Pedro



    If you need to work better, try working less...

  • Since you are using a correlated subquery why bother with any of those approaches?

    This returns the same thing.

    select *

    from seats s

    join seats2 s2 on s.ID = s2.id2

    And FWIW, you should start using a tally table for generating data like that. It will make your life a lot easier if you learn how to use it. Using a tally table to generate the insert is insanely fast compared to the looping method. The loops you posted took nearly 14 seconds on my machine. Using the queries below it didn't even register a time in the status bar in SSMS.

    insert seats (seat)

    select * from Tally where N <= 10000

    insert seats2 (seath)

    select * from Tally where N <= 10000

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (1/8/2013)


    Since you are using a correlated subquery why bother with any of those approaches?

    This returns the same thing.

    select *

    from seats s

    join seats2 s2 on s.ID = s2.id2

    Cause, as it says in http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/:

    If all you need is to check for matching rows in the other table but don’t need any columns from that table, use IN. If you do need columns from the second table, use Inner Join.

    And if you need to compare more than one columns use EXISTS instead of IN.

    JOIN should only be used if any field from the 2nd table is needed... and if the 2nd table has more than one row per each from the 1st table the row will repeat, with EXISTS or IN it doesn't..

    Pedro



    If you need to work better, try working less...

  • PiMané (1/8/2013)


    Sean Lange (1/8/2013)


    Since you are using a correlated subquery why bother with any of those approaches?

    This returns the same thing.

    select *

    from seats s

    join seats2 s2 on s.ID = s2.id2

    Cause, as it says in http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/:

    If all you need is to check for matching rows in the other table but don’t need any columns from that table, use IN. If you do need columns from the second table, use Inner Join.

    And if you need to compare more than one columns use EXISTS instead of IN.

    JOIN should only be used if any field from the 2nd table is needed... and if the 2nd table has more than one row per each from the 1st table the row will repeat, with EXISTS or IN it doesn't..

    Pedro

    Good point, didn't realize you weren't using them. Back to get another cup of coffee now. :blush:

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I've seen Count() kill a server where Exists() was fine.

    A code-push was done on a Tuesday night. Everything went fine. Monday morning, however, the websites wouldn't come up, and the database servers were all completely slammed, to the point of crashing to the passive node in one case.

    Followed the bug down, and it was a Count() > 0 check that was being run hundreds of times per second by the new code. (Turns out, QA tested the pages, but didn't load-test any of it.) I converted that to an Exists() check, on the fly, in production (don't do this), and immediately the database servers went down to about 20% load, with no other change in anything.

    That was on big, heavy-traffic tables. Tens of millions of rows, that kind of thing.

    Count() has to do more work, and requires more aggressive locking.

    Think of it this way:

    How fast can you find out if there's anyone named George in the phonebook?

    vs

    How long would it take you to count how many people named George there are in the exact same phonebook?

    One, you do a quick scan till you find a George. The other, you need to sit down and keep a tally of every row. Plus you need to make sure there are no new editions (locking) while you're doing your count.

    I believe SQL Server will sometimes shortcut a "Where (select count(*) blah blah blah) > 1" into an Exists() check. But don't count on the optimizer being smarter than you in that kind of way.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Sean Lange (1/8/2013)


    Good point, didn't realize you weren't using them. Back to get another cup of coffee now. :blush:

    LOL 🙂

    I'm on my 4th coffee today.... and still need some more caffeine.. probably a cappuccino 🙂



    If you need to work better, try working less...

  • GSquared (1/8/2013)


    How fast can you find out if there's anyone named George in the phonebook?

    vs

    How long would it take you to count how many people named George there are in the exact same phonebook?

    One, you do a quick scan till you find a George. The other, you need to sit down and keep a tally of every row. Plus you need to make sure there are no new editions (locking) while you're doing your count.

    I believe SQL Server will sometimes shortcut a "Where (select count(*) blah blah blah) > 1" into an Exists() check. But don't count on the optimizer being smarter than you in that kind of way.

    Exactly... then the DEV department will ear from me 🙂

    Just like the IF EXISTS (SELECT TOP 1 ... FROM)... useless 5 chars (TOP 1) 🙂

    Thanks,

    Pedro



    If you need to work better, try working less...

  • GSquared (1/8/2013)


    I believe SQL Server will sometimes shortcut a "Where (select count(*) blah blah blah) > 1" into an Exists() check. But don't count on the optimizer being smarter than you in that kind of way.

    It seems to have an optimisation for COUNT(*) ... > 0, but that's a special case and just for 0, and doesn't work always.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • So, just to be on the safe side AVOID COUNT() and use EXISTS...

    Thanks,

    Pedro



    If you need to work better, try working less...

  • GilaMonster (1/8/2013)


    GSquared (1/8/2013)


    I believe SQL Server will sometimes shortcut a "Where (select count(*) blah blah blah) > 1" into an Exists() check. But don't count on the optimizer being smarter than you in that kind of way.

    It seems to have an optimisation for COUNT(*) ... > 0, but that's a special case and just for 0, and doesn't work always.

    Typo on my part. Started to type ">= 1", changed my mind to "> 0", ended up with half of each, which is just plain wrong. 🙂

    Yeah, > 0 sometimes gets shortcut. Sometimes. Not reliably.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply