Which way is better? To join or not to join or does it matter?

  • I'm writing a script that will probably join a bit more then 20 tables.

    It's my understanding that joins are costly.

    Would it cost less to write certain parts like my sample2 below?

    I'm only returning that 1 field from Table2.

    All ID fields are unique. I read about this a while back but can't remember anything about it.

    Sample1

    select t2.MyName PersonName

    , t1.*

    from Table1 t1

    inner join Table2 t2

    on t1.SalesPersonID = t2.ID

    where t1.ID= 12345

    Sample2

    select (select t2.MyName from Table2 t2 where t2.ID = t1.SalesPersonID) PersonName

    , t1.*

    from Table1 t1

    where frqr.tril_gid = 12345

    Thanks a bunch.

    ---------------------------------------------------------------
    Mike Hahn - MCSomething someday:-)
    Right way to ask for help!!
    http://www.sqlservercentral.com/articles/Best+Practices/61537/
    I post so I can see my avatar :hehe:
    I want a personal webpage 😎
    I want to win the lotto 😀
    I want a gf like Tiffa :w00t: Oh wait I'm married!:-D

  • My $.02.

    If your tables are modeled and indexed correctly then joins are a necessary evil.

    Your sample 2 uses a correlated subquery, and I believe this is more costly than a join.

  • Want a cool sig (10/25/2012)


    I'm writing a script that will probably join a bit more then 20 tables.

    It's my understanding that joins are costly.

    Would it cost less to write certain parts like my sample2 below?

    I'm only returning that 1 field from Table2.

    All ID fields are unique. I read about this a while back but can't remember anything about it.

    Sample1

    select t2.MyName PersonName

    , t1.*

    from Table1 t1

    inner join Table2 t2

    on t1.SalesPersonID = t2.ID

    where t1.ID= 12345

    Sample2

    select (select t2.MyName from Table2 t2 where t2.ID = t1.SalesPersonID) PersonName

    , t1.*

    from Table1 t1

    where frqr.tril_gid = 12345

    Thanks a bunch.

    its worth testing but in my experience joins are faster than correlated sub queries (however not all correlated sub queries can be rewritten as joins). reason being the sub query has to be run for every row instead of being able to use a join to get the same results in one pass. in our production code we try to eliminate sub queries because of this added cost. if we can join we join. if its a big long complicated query we test both ways in our dev environment and use which ever is faster. but the key there is we test test and test again.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Thanks a bunch. Correlated sub queries. Forgot what it was called. Forgot what was said about it. Been forgetting a lot of things. Must be getting old. :doze:

    The DB is not designed properly. The app is OO VB6 that allowed the users to create DB objects (table/column) etc from the app interface (pretty much any user can add an object--column/table).

    Basically forcing an OO model on a Relational DB. It's a nightmare. 200+ columns when only 25 related objects exists. Not to mention that the same data can potentially be written across 30+ different tables. Makes reporting fun. 🙂

    ---------------------------------------------------------------
    Mike Hahn - MCSomething someday:-)
    Right way to ask for help!!
    http://www.sqlservercentral.com/articles/Best+Practices/61537/
    I post so I can see my avatar :hehe:
    I want a personal webpage 😎
    I want to win the lotto 😀
    I want a gf like Tiffa :w00t: Oh wait I'm married!:-D

  • Correlated subqueries do not run per row of the outer query (unless triangular joins or top 1 ... order by).

    Correlated subqueries are not usually more expensive than joins, they usually get compiled to exactly the same query plan.

    Joins are not expensive operations to be avoided

    Unless joins have been tested and are unacceptably slow, a more complex alternative should not be considered.

    Write queries the simplest way, test performance, if unacceptable then consider complex rewrites

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/25/2012)


    Correlated subqueries do not run per row of the outer query (unless triangular joins or top 1 ... order by).

    Correlated subqueries are not usually more expensive than joins, they usually get compiled to exactly the same query plan.

    Joins are not expensive operations to be avoided

    Unless joins have been tested and are unacceptably slow, a more complex alternative should not be considered.

    Write queries the simplest way, test performance, if unacceptable then consider complex rewrites

    looks like im off to hit the books on correlated subqueries now. then go yell at the guy i heard it from.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • looks like im off to hit the books on correlated subqueries now. then go yell at the guy i heard it from.

    +1 to that...

    Could've swore I've read in the official MCTS exam book that correlated subqueries are performed on a row by row basis if used in the SELECT, and perform exactly like a join if used in the EXISTS part of a WHERE clause.

    Update: Iv actually looked it up and tested it, the book wrongly says:

    A correlated subquery is typically equal in performance compared to using a JOIN when used in combination with the EXISTS operator to filter the outer query... While this type of correlated subquery is typically a good implementation, the use of correlated subqueries in the SELECT clause often has a negative effect on performance compared to JOINs. Of course, this depends on the number of rows returned by the outer query. If a large number of rows are returned, each query in the SELECT clause would be executed for each row

    Damn, can't even trust books anymore. Thanks for clearing things up Gail

    -----------------
    ... Then again, I could be totally wrong! Check the answer.
    Check out posting guidelines here for faster more precise answers[/url].

    I believe in Codd
    ... and Thinknook is my Chamber of Understanding

  • That's doubly wrong, because EXISTS with correlated subqueries does not perform equally with joins, it's usually very, very slightly better (and actually they're not even equivalent query forms)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/26/2012)


    That's doubly wrong, because EXISTS with correlated subqueries does not perform equally with joins, it's usually very, very slightly better (and actually they're not even equivalent query forms)

    Agree with Gail on the fact that EXISTS with correlated subqueries is not equivalent to JOIN's. However in terms of performance - it depends. From my experience I wouldn't say that usually one is better than another. I've seen all three permutations of EXISTS vs OUTER JOIN's .

    I do usually test both where appropriate.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • GilaMonster (10/26/2012)


    That's doubly wrong, because EXISTS with correlated subqueries does not perform equally with joins, it's usually very, very slightly better (and actually they're not even equivalent query forms)

    Good learning for me.. glad i spent one hour after my hectic office hours at Home ..thanks gail :-). can you please put some light on its explanation

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • GilaMonster (10/25/2012)


    Correlated subqueries do not run per row of the outer query (unless triangular joins or top 1 ... order by).

    One more case when optimizer is not able to decorrelate subquery is an assert impact.

    use tempdb;

    go

    create table t1 (a int);

    create table t2 (a int);

    go

    insert t1 values (1),(2),(3),(4),(5);

    insert t2 values (1),(2),(3),(4),(5);

    go

    set statistics profile on

    go

    select

    a,

    other_a = (select a from t2 where t1.a = t2.a)

    from

    t1

    ;

    go

    set statistics profile off

    go

    drop table t1,t2;

    51select a, other_a = (select a from t2 where t1.a = t2.a) from t1 ;

    00 |--Compute Scalar(DEFINE:([Expr1009]=[Expr1011]))

    51 |--Nested Loops(Left Outer Join, OUTER REFERENCES:([tempdb].[dbo].[t1].[a]))

    51 |--Table Scan(OBJECT:([tempdb].[dbo].[t1]))

    55 |--Assert(WHERE:(CASE WHEN [Expr1010]>(1) THEN (0) ELSE NULL END))

    55 |--Stream Aggregate(DEFINE:([Expr1010]=Count(*), [Expr1011]=ANY([tempdb].[dbo].[t2].[a])))

    55 |--Table Scan(OBJECT:([tempdb].[dbo].[t2]), WHERE:([tempdb].[dbo].[t1].[a]=[tempdb].[dbo].[t2].[a]))

    upd. fixed formatting.


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

  • It still showing sad smiles after fixing and putting plan text into |code| tag instead of |quote|.

    Yes, that is sad query =))


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

  • One more case when optimizer is not able to decorrelate subquery is an assert impact.

    Interesting! so what is a scenario when the optimizer compiles both a correlated subquery query and a join query to the same query plan?, Am trying to come up with one but not managing, even if the query plan looks "almost" the same, SQL is using different join operators (for example Nested Loop for correlated subquery vs Hash Match for join).

    />L

    -----------------
    ... Then again, I could be totally wrong! Check the answer.
    Check out posting guidelines here for faster more precise answers[/url].

    I believe in Codd
    ... and Thinknook is my Chamber of Understanding

  • SELECT sod.SalesOrderID ,

    SalesOrderDetailID ,

    AccountNumber = (SELECT AccountNumber FROM Sales.SalesOrderHeader soh WHERE soh.SalesOrderID = sod.SalesOrderID)

    FROM Sales.SalesOrderDetail AS sod

    SELECT sod.SalesOrderID ,

    SalesOrderDetailID ,

    AccountNumber

    FROM Sales.SalesOrderDetail AS sod LEFT OUTER JOIN Sales.SalesOrderHeader soh ON soh.SalesOrderID = sod.SalesOrderID

    Near-identical plans (an extra compute scalar in the first)

    Or take SomewhereSomehow's example and add a unique constraint (or primary key) to the second table. If SQL knows the column is unique, it doesn't need an assert and hence doesn't need the nested loop join but can use other joins as it likes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply