SQL Execution Order with multiple Joins

  • Hi,

    Following FROM cluase is giving different results. Developers say it should not. I am trying gether experts opinion on this.

    *************** Query 1 *************

    FROM [a]

    LEFT JOIN

    ON [a].x = .x

    LEFT JOIN [c]

    ON [a].x = [c].x

    LEFT JOIN [d]

    ON [a].x = [d].x

    [AND [d].y = [c].y

    **************** Query 2 *************

    FROM [a]

    LEFT JOIN

    ON [a].x = .x

    LEFT JOIN [d]

    ON [a].x = [d].x

    LEFT JOIN [c]

    ON [a].x = [c].x

    AND [c].y = [d].y

    You can notice that 2nd and 3rd Join condition is different. Not only their order is different but ON condition is also different. I strongly believe that the result will be different as I assume that result of first Left Join is passed on to the next Join condition and so on. Changing the order will change the result.

    Am I correct in my assumption?

    Thanks

  • i think your developers are right, and that the order of joins has no meaning; as I understand it, no matter the order of the joins, the Query Engine will arrange the order in the order it believes will get the data the fastest.

    instead of pseudo code, do you have a Real example? simply look a the execution plans of both queries and see if they are equivilent or not.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Jack Corbett posted up a blog link recently which covered this, and the fact that the join sequence and the position of the ON clauses can have a profound effect on how the query works. I'll try to find it, but I'm timed out for today.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for the replies guys.

    While waiting on the reply, I created this small script and it proves what I assumed. The results are different:

    create table #a (x int, y int)

    create table #b (x int, y int)

    create table #c (x int, y int)

    create table #d (x int, y int)

    insert into #a values(1,2)

    insert into #a values(3,4)

    insert into #a values(5,6)

    insert into #a values(7,8)

    insert into #a values(9,10)

    insert into #b values(1,2)

    insert into #b values(3,4)

    insert into #b values(5,6)

    insert into #b values(11,12)

    insert into #c values(1,2)

    insert into #c values(3,4)

    insert into #c values(15,16)

    insert into #d values(13,14)

    insert into #d values(5,6)

    --Query 1

    select #a.*, #b.*, #c.*, #d.*

    FROM #a

    LEFT JOIN #b

    ON #a.x = #b.x

    LEFT JOIN #c

    ON #a.x = #c.x

    LEFT JOIN #d

    ON (#a.x = #d.x

    AND #d.y = #c.y)

    --Query 2

    select #a.*, #b.*, #c.*, #d.*

    FROM #a

    LEFT JOIN #b

    ON #a.x = #b.x

    LEFT JOIN #d

    ON #a.x = #d.x

    LEFT JOIN #c

    ON (#a.x = #c.x

    AND #c.y = #d.y)

    drop table #a

    drop table #b

    drop table #c

    drop table #d

    But still would like to know how does the engine determine order the processing Joins?

    Thanks

  • While you can hint for some types of joins, or indexes, I'm not sure you can control join order. The optimizer looks at statistics, data sizes, and guesses what would be most efficient.

    Some links:

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2010/04/23/conor-vs-left-outer-join-reordering.aspx

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2009/12/10/conor-vs-does-join-order-matter.aspx

  • Here's the link to Jack's post:

    http://www.sqlservercentral.com/Forums/FindPost964392.aspx

    The post points out that it's the order of the ON clauses which changes the effect of the query, rather than the order in which the JOINs appear in the FROM list. The ON clause order affects the order in which the joins are evaluated.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris...let me go through it.

    Thanks

  • You're welcome.

    Jack's post and invaluable explanation has already found a use here 😎

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Steve Jones - Editor (8/17/2010)


    While you can hint for some types of joins, or indexes, I'm not sure you can control join order. The optimizer looks at statistics, data sizes, and guesses what would be most efficient.

    Some links:

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2010/04/23/conor-vs-left-outer-join-reordering.aspx

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2009/12/10/conor-vs-does-join-order-matter.aspx

    Technically the optimizer does a bunch of math to calculate what would be most efficient Steve - it doesn't really "guess"! I will admit that sometimes to the uninformed it sure SEEMS like it is guessing - and poorly at that!! 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Finally we had meeting with our developers. They showed us how the users actually select options from the front end. They have limited control over forcing users to select elements in certain order (out thousands of the elements). Therefore they were looking for DBAs advise.

    If you run the example above as is, you will see different results with same data. I am still not able to understand why the results are different if JOIN order does not matter.

    Thanks

  • The optimizer examines a bunch of plans, tries to pick the best one. You might get different results at different times depending on data changes, statistics, etc.

    I think the point of Conor's post is that changing join order doesn't cause changes. There may be a correlation of changes, but it's not causality.

  • Steve Jones - Editor (8/19/2010)


    The optimizer examines a bunch of plans, tries to pick the best one. You might get different results at different times depending on data changes, statistics, etc..

    If this is how SQL works, to me it is a big loophole...I can clearly see a mess for reporting systems. Different reports at different times.

    If data does not change, why does result change if JOIN order is changed? I still trying.....:w00t:

  • It rarely happens, and it has been a big problem. In 2008, you can save off a "good" plan and force it to be re-used. I might do that in some cases, perhaps periodically evaluating if it's still "good"

    We had this happen to a large production system while TechEd was going on. A stored proc "broke" and stopped working. Running the same code in QA (SQL 2000) worked. Different plan. We had an MS developer RDP in and said that it sometimes happens, plan changes for strange reasons. Part of the issue in having a complex optimizer that tries to "quickly" find the "best" plan.

  • The problem is that the two queries are not equivalent. It has nothing to do with the order of the joins. If you notice, the records for a and b are the same in both results because we do not change the requirements. It is only with tables c and d that we are seeing a difference.

    Table c:

    In the first query you are performing a left outer join on table c where a.x = c.x only. This returns all rows where x is equivalent for table c, meaning x=1(,2) and x=3(,4). In the second query you have a left outer join on table c where a.x = c.x and c.y = d.y. For a.x = c.x we have 1,2 and 3,4. But neither 2 or 4 are in y for table d, so table c returns no records in the second query.

    Table d:

    In the first query you are performing a left outer join on table d where a.x = d.x and d.y = c.y. The a.x = d.x will get record 5,6 but there is no y value of 6 in table c so it returns nothing for table d. In the second query you are left joining table d on a.x = d.x which will return the 5,6 record.

  • +1 for Arthur Teter. This is nothing to do with the optimizer; the two queries are different. The developers (and far too many people on this thread) are wrong.

    The optimizer does not make changes to a query that would produce different sets of rows*. The only thing that can change due to optimization is the row order (and you should always use an explicit ORDER BY if you care about the order).

    * - if it does, there is a bug in the optimizer.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply