Where exists vs Where in

  • If I understand it correctly, "exists" stops evaluating once it hits the first true whereas "in" keeps going. To me, this makes "exists" the obvious choice between the 2 but I still see queries using both and I'm not sure why.

    Should there be a fast and simple rule for using one over the other or does it depend on the data in the table or the query being run?

    Thanks,

    Mark

  • This blog post illustrates one big difference: SQL Server: JOIN vs IN vs EXISTS - the logical difference



    Rick Krueger

    Follow @dataogre

  • Mark Eckeard (7/1/2012)


    If I understand it correctly, "exists" stops evaluating once it hits the first true whereas "in" keeps going. To me, this makes "exists" the obvious choice between the 2 but I still see queries using both and I'm not sure why.

    No this is not always true for every case.

    Please have a look on below article by Gilla Monster.

    http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/">

    http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

    --rhythmk
    ------------------------------------------------------------------
    To post your question use below link

    https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help
    🙂

  • Also have a look on below article.

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/">

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    --rhythmk
    ------------------------------------------------------------------
    To post your question use below link

    https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help
    🙂

  • That link appears to be broken or incomplete.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Mark Eckeard (7/1/2012)


    If I understand it correctly, "exists" stops evaluating once it hits the first true whereas "in" keeps going. To me, this makes "exists" the obvious choice between the 2 but I still see queries using both and I'm not sure why.

    Should there be a fast and simple rule for using one over the other or does it depend on the data in the table or the query being run?

    Thanks,

    Mark

    Keeping in mind that INNER JOIN will allow dupes to occur and the other two will not (and a couple of other differences highlighted in Gail's article), if you're looking for a speed advantage of WHERE EXISTS compared to WHERE IN, there really isn't one. Here's a test...

    -----------------------------------------------------------------------------------------------------------------------

    IF OBJECT_ID('tempdb.dbo.JBMTest','U') IS NOT NULL DROP TABLE dbo.JBMTest;

    IF OBJECT_ID('tempdb..#MyInTable','U') IS NOT NULL DROP TABLE #MyInTable;

    GO

    RAISERROR('Creating and populating the test table...',0,1) WITH NOWAIT;

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM sys.all_columns t1

    CROSS JOIN sys.all_columns t2

    RAISERROR('Adding the PK...',0,1) WITH NOWAIT;

    --===== A table is not properly formed unless a unique clustered index has been assigned

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    RAISERROR('Creating the WHERE IN table...',0,1) WITH NOWAIT;

    --===== Create a table with some "WHERE IN" values

    SELECT TOP 1000

    IDENTITY(INT,1,1) AS RowNum,

    ABS(CHECKSUM(NEWID()))%50000+1 AS SomeInt

    INTO #MyInTable

    FROM sys.all_columns t1

    RAISERROR('Adding another index to the test table...',0,1) WITH NOWAIT;

    ----===== Add an index to the larger table

    CREATE INDEX IX_JBMTest_SomeInt

    ON dbo.JBMTest (SomeInt)

    ----===== A table is not properly formed unless a Primary Key has been assigned

    -- ALTER TABLE #MyInTable

    -- ADD PRIMARY KEY CLUSTERED (RowNum)

    ----===== Add an index to it

    -- CREATE INDEX IX_MyInTable_SomeInt

    -- ON #MyInTable (SomeInt)

    -----------------------------------------------------------------------------------------------------------------------

    IF OBJECT_ID('tempdb..#MyHead1','U') IS NOT NULL DROP TABLE #MyHead1;

    IF OBJECT_ID('tempdb..#MyHead2','U') IS NOT NULL DROP TABLE #MyHead2;

    IF OBJECT_ID('tempdb..#MyHead3','U') IS NOT NULL DROP TABLE #MyHead3;

    --===== THE TEST CODE

    RAISERROR('========== INNER JOIN ==================================================',0,1) WITH NOWAIT;

    SET STATISTICS TIME ON

    SELECT DISTINCT t.*

    INTO #MyHead1

    FROM dbo.JBMTest t

    INNER JOIN #MyInTable i

    ON t.SomeInt = i.SomeInt

    SET STATISTICS TIME OFF

    GO

    RAISERROR('========== WHERE IN ==================================================',0,1) WITH NOWAIT;

    SET STATISTICS TIME ON

    SELECT t.*

    INTO #MyHead2

    FROM dbo.JBMTest t

    WHERE t.SomeInt IN (SELECT SomeInt FROM #MyInTable)

    SET STATISTICS TIME OFF

    GO

    RAISERROR('========== WHERE EXISTS ==================================================',0,1) WITH NOWAIT;

    SET STATISTICS TIME ON

    SELECT t.*

    INTO #MyHead3

    FROM dbo.JBMTest t

    WHERE EXISTS (SELECT 1 FROM #MyInTable i WHERE t.SomeInt = i.SomeInt)

    SET STATISTICS TIME OFF

    GO

    -----------------------------------------------------------------------------------------------------------------------

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for all the links. I'll read them later this evening when I get home.

    Thanks,

    Mark

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply