'EXISTS' vs 'IN'

  • Hi all,

    :).I am confused with the usuage of 'EXISTS' and 'IN' in my queries.can anyone help me in this regard.

    Thanks

    Kiran

  • Yes... don't use either... use an INNER JOIN instead.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Consider i have two tables University(parent table) and Branch(child table) The primarty key for university table is id ,the foriegn key for the branch table is uid

    i have a query like this

    A:select b.name from branch b inner join university u on u.id = b.uid

    rewriting the above query as

    B:select b.name from branch where b.uid exists( select u.id from university where u.id = b.uid)

    If i run both queries in toad and observe the explain plan the second query B is performing better than query A.

    so how can i decide where to use joins and where to use exists.

    Kiran

  • Hmm, I just posted something like this yesterday...:


    It is important to realize that there are several types of Subquery:

    ...

    2. IN subqueries: These return any number of rows, each with a single column and they look like this:WHERE value IN(Select col1 From ... ) They are typically used in WHERE clauses to test if a particular value is in the list of values returned.

    3. EXISTS subqueries: These return any number of rows with any number of columns and they look like this:WHERE EXISTS(Select * From ... ) They are typically used in WHERE clauses to test if corresponding rows in another table exist. Because the column values returned are not actually used, typicaly you use a "*" instead of bothering to name any columns.

    ...


    And as far as I'm concerned, IN and EXISTS OK to use. They are just as fast as Joins when used correctly and greatly aid code clarity.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (4/1/2008)


    And as far as I'm concerned, IN and EXISTS OK to use. They are just as fast as Joins when used correctly and greatly aid code clarity.

    WHERE IN is actually converted to an INNER JOIN by the optimizer and is treated exactly the same by the optimizer. Left to their own devices, I've found that Developers will improperly write triple and quadruple nested joins if allowed to use WHERE IN on a regular basis because it IS easy for them to read along with the resulting severe performance problems.

    WHERE EXISTS is not... WHERE EXISTS creates a form of RBAR known as a CORRELATED SUB-QUERY. If written correctly, it's also quite fast but I don't allow people to use it in my shop because it makes the Execution Plan look especially effective (displays plan for first row only) when it's not (there are the rare exceptions in 2k but never needed in 2k5). Developers see that and start using it everywhere for every inner join.

    I agree with Barry that WHERE IN is an easy read... but for the reasons above, I don't allow them in my shop (Yes, we code review everything).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes... don't use either... use an INNER JOIN instead

    Not always. The matching values resulting from an INNER JOIN can be more than one (1). This will result in as many rows as there are matches. With an IN or EXISTS, only a single row will be returned per the match regardless of the number of values in the IN list (explicit or via a query) or that are present in the sub-table (EXISTS).

    So there are three (3) different ways to perform matches -- INNER JOIN, IN, and EXISTS. Each query needs to be carefully examined for the proper use. I.e., the proper tool for the job...


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • I agree with Jeff on this one. I've had to go through 2 systems the used the WHERE IN syntax and replace them with an INNER JOIN. These were for reports that had to go through lots and lots of data. The performance increase was dramatic when using the INNER JOIN style.

    When there were possibilities of multiple join conditions I used an INNER JOIN to a derived table that was either an aggregate or I used a DISTINCT in the SELECT for the derived table. Both of these were faster than WHERE IN in actual practice.

    Todd Fifield

  • Jeff Moden (3/31/2008)


    Yes... don't use either... use an INNER JOIN instead.

    hi jeff

    your point is very usefull

    as per my experience IN clause is usefull whene check Listed values

    and EXISTS clause is for checking record exist in the table

    INNER JOIN is very handy and powerfull

    thanks

    shamsudheen

  • Here's another good reason why not to use IN:

    Run this on the AdventureWorks database in SQL 2005:

    begin tran

    delete Sales.SalesOrderDetail where SalesOrderId in (select SalesOrderId from Doctor)

    rollback tran

    Whoops! You've just deleted 120,000 SalesOrders. Hopefully you included the "rollback tran" in that statement you just ran. Now consider the fact that the table Doctor has no column named SalesOrderId. If you run the inner query by itself it generates an error, but if you run the statement as a whole it deletes every row in SalesOrderDetail.

    All it takes is for a developer to accidentally type in the wrong table name in the inner query and you could end up in serious trouble.

  • nice example

    shamsudheen

  • JohnG (4/2/2008)


    Yes... don't use either... use an INNER JOIN instead

    Not always. The matching values resulting from an INNER JOIN can be more than one (1). This will result in as many rows as there are matches. With an IN or EXISTS, only a single row will be returned per the match regardless of the number of values in the IN list (explicit or via a query) or that are present in the sub-table (EXISTS).

    So there are three (3) different ways to perform matches -- INNER JOIN, IN, and EXISTS. Each query needs to be carefully examined for the proper use. I.e., the proper tool for the job...

    Thanks for the reminder (I forgot :blush:), John... you're correct... WHERE IN and WHERE EXISTS acts like a SELECT DISTINCT on the joined table.

    The WHERE EXISTS has some benefit of usage like finding the TOP 2 of every group... but it does have the disadvantage of being RBAR in nature. Not many ways around that in 2k for things like the TOP 2 of each group but plenty of set based solutions in 2k5 for the same thing that are setbased. Both should be evaluated for performance in 2k5 when trying to decide which to use.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's another good reason why not to use IN:

    Run this on the AdventureWorks database in SQL 2005:

    begin tran

    delete Sales.SalesOrderDetail where SalesOrderId in (select SalesOrderId from Doctor)

    rollback tran

    Whoops! You've just deleted 120,000 SalesOrders. Hopefully you included the "rollback tran" in that statement you just ran. Now consider the fact that the table Doctor has no column named SalesOrderId. If you run the inner query by itself it generates an error, but if you run the statement as a whole it deletes every row in SalesOrderDetail.

    I got caught with this in a trigger referencing the deleted table. It was a syntax error that SQL Server should have caught and thrown an error. This is just a BAD BUG in SQL Server. Shame on you Microsoft!

    That is why you should always qualify every column with the table name (or an alias).


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • JohnG (4/3/2008)


    I got caught with this in a trigger referencing the deleted table. It was a syntax error that SQL Server should have caught and thrown an error. This is just a BAD BUG in SQL Server. Shame on you Microsoft!

    That is why you should always qualify every column with the table name (or an alias).

    Actually this isn't a SQL Server issue rather than a SQL issue. I can't remember where I saw this but I remember reading about it and how it was a "feature" of SQL and how you'd get this same behaviour on Oracle or other RDMBS's.

    I never went out to verify this so I might be wrong.

    Can anyone confirm whether this is the case?

  • Actually this isn't a SQL Server issue rather than a SQL issue. I can't remember where I saw this but I remember reading about it and how it was a "feature" of SQL and how you'd get this same behaviour on Oracle or other RDMBS's.

    Confirmed.

    Oracle exhibits the exact same behavior as per the example below. Note that the "users" table does not have a column named "content_id". However, if the column is properly qualified, it does throw an error.

    SQL*Plus: Release 9.2.0.7.0 - Production on Thu Apr 3 09:13:08 2008

    Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

    SQL> SELECT content_id FROM users;

    SELECT content_id FROM users

    *

    ERROR at line 1:

    ORA-00904: "CONTENT_ID": invalid identifier

    SQL> DELETE FROM content WHERE content_id IN (SELECT content_id FROM users);

    0 rows deleted.

    SQL> DELETE FROM content WHERE content_id IN (SELECT x.content_id FROM users x);

    DELETE FROM content WHERE content_id IN (SELECT x.content_id FROM users x)

    *

    ERROR at line 1:

    ORA-00904: "X"."CONTENT_ID": invalid identifier

    SQL>

    However, notice that the following DOES throw an error:

    SQL> DELETE FROM content WHERE content_id IN (SELECT foobar FROM users);

    DELETE FROM content WHERE content_id IN (SELECT foobar FROM users)

    *

    ERROR at line 1:

    ORA-00904: "FOOBAR": invalid identifier

    SQL>

    So the issue is not the IN syntax per se, but one of "scope" as it pertains to the column names being referenced.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply