JOIN ON vs. WHERE

  • Best to start with an example.

    Query I:

    SELECT * FROM table1 JOIN table2

    ON table1.id = table2.id

    WHERE (table1.column2 = table2.column3 AND table2.column4 = 'whatever')

    Query II:

    SELECT * FROM table1 JOIN table2

    ON (table1.id = table2.id AND table1.column2 = table2.column3)

    WHERE table2.column4 = 'whatever'

    As expected, both queries give the same result set and the same execution plan.

    Then one of our developers experimented with some redundancy in the code:

    Query III:

    SELECT * FROM table1 JOIN table2

    ON (table1.id = table2.id AND table1.column2 = table2.column3)

    WHERE (table1.column2 = table2.column3 AND table2.column4 = 'whatever')

    Again it gave the expected result set, but now with a different execution plan and (even more disturbing) a better performance!

    Is this a bug? Our version is 10.0.2531.0 standard edition.

  • Interesting. Must have something to do with your distribution of data. I have tried this on my PC using AdventureWorks (2005) and can't duplicate your results.

    It would be interesting to see the execution plans. Perhaps in them there is an explanation.

    On a side note the first query is not ANSI-compliant so you should use the JOIN syntax.

  • keep in mind that getting the same results stops being true when those inner joins turn into outer joins.

    Jack - what's non-ANSI compliant in Query 1? I'm running a bit slow tody, but I 'm just not seeing it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (1/29/2010)


    keep in mind that getting the same results stops being true when those inner joins turn into outer joins.

    Jack - what's non-ANSI compliant in Query 1? I'm running a bit slow tody, but I 'm just not seeing it.

    The first query isn't ANSI-92 compliant. It is compliant to older ANSI standards. I agree with Jack, using the INNER JOIN in the ON clause helps clarify what the join criteria between tables is and filter conditions in the WHERE clause.

  • Matt Miller (#4) (1/29/2010)


    keep in mind that getting the same results stops being true when those inner joins turn into outer joins.

    Jack - what's non-ANSI compliant in Query 1? I'm running a bit slow tody, but I 'm just not seeing it.

    Duh, I didn't notice that it was a 2 column join. I read the queries as

    SELECT * FROM table1, table2

    WHERE (table1.column2 = table2.column3 AND table2.column4 = 'whatever')

    And

    SELECT * FROM table1 JOIN table2

    ON (table1.column2 = table2.column3)

    WHERE table2.column4 = 'whatever'

    So I basically missed the JOIN statement in Query 1 which also means my tests in Adventureworks didn't exactly duplicate it.

  • Lynn Pettis (1/29/2010)


    Matt Miller (#4) (1/29/2010)


    keep in mind that getting the same results stops being true when those inner joins turn into outer joins.

    Jack - what's non-ANSI compliant in Query 1? I'm running a bit slow tody, but I 'm just not seeing it.

    The first query isn't ANSI-92 compliant. It is compliant to older ANSI standards. I agree with Jack, using the INNER JOIN in the ON clause helps clarify what the join criteria between tables is and filter conditions in the WHERE clause.

    I'm sorry if I'm being dense, but there IS an ON clause, and a JOIN predicate.

    Just because one of your filtering criteria happens to compare values from the tables being JOINED doesn't make it a JOIN criterion vs a filtering criterion, does it? Again - make those join left outer joins, and the two queries' would be equivalent (but both would be ANSI-92 compliant I think).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • With INNER JOINs I don't think it really matter where you put the filter condition. Where it matters is when you are converting old style outer joins to ANSI-92 outer joins. You could easily get different results in that situation as what you think is a filter condition actually is a join condition between two tables.

  • F. van Ruyven (1/29/2010)


    ...again it gave the expected result set, but now with a different execution plan and (even more disturbing) a better performance! Is this a bug? Our version is 10.0.2531.0 standard edition.

    No, not a bug. The textual query you write goes through a number of transformations before the engine produces an execution plan. Adding redundancy like this can mean that one or more possible transformations or optimizations no longer fits an optimization rule, so the query is optimized differently.

    SQL Server does not guarantee that all possible representations of a logical query in the T-SQL syntax, will produce an identical execution plan. The broad goal is to produce a probably very good plan in a short time. By skipping one or more of the possible transformations, you might have wound up with a better plan (in terms of your particular data, hardware, or server configuration) than otherwise...but it's just a quirk of the process, and with a different, but syntactically similar query, the results might well be different.

    There are such a broad range of inputs to the optimization process that it is impossible to be precise about exactly what happened in your case, but I hope the general explanation above helps a little.

  • Thank you all !!

  • I think it was touched on with the distribution of data. But, to add to that, if you have a query plan in cache it'll use that until it thinks it needs to generate a new one. So, it is possible that statistics/data changed enough that a new plan was faster, but not enough change that it caused the cahced plan to be flushed. You could try again with a RECOMPILE option and see what happens.

  • I have a more challenging scenario. I am doing left joins where I am trying to find values that are in the left table but not in the right table, like a NOT IN. Simple enough except I am also looking for specific values besides the joining key. In other words, I am looking for Lookup keys for which there is certain value in the LEFT but not certain values in the RIGHT. Whether I put the predicates on "ON" or "WHERE" made a huge difference.

    To prepare the DB:

    create table #A (Lookup int, Val varchar(10))

    create table #B (Lookup int, Val varchar(10))

    insert into #A values (1, 'A')

    insert into #A values (2, 'B')

    insert into #A values (3, 'C')

    insert into #B values (2, 'B')

    insert into #B values (3, 'C')

    insert into #B values (4, 'D')

    insert into #B values (1, 'E')

    Then these three SELECTS trying to find Lookup values for which there is a "B" in the LEFT

    but no "E" in the right.

    select * --this is correct

    from #A

    LEFT join #B

    on #A.Lookup=#B.Lookup

    and #B.Val in ('E')

    WHERE #B.Lookup is null

    and #A.Val = 'B'

    select * --this returns nothing

    from #A

    LEFT join #B

    on #A.Lookup=#B.Lookup

    WHERE #B.Lookup is null

    and #A.Val = 'B'

    and #B.Val in ('E')

    select * --this returns everything

    from #A

    LEFT join #B

    on #A.Lookup=#B.Lookup

    and #A.Val = 'B'

    and #B.Val in ('E')

    WHERE #B.Lookup is null

    Syntax-wise they look very similar but the results are very different and only the 1st one is correct. The explain plan is also very different. I research this and it seems it has something to do with the order with which SQL Server parse the ON vs WHERE clause.

    It seems like to get what I want, I need to have all the predicates for the RIGHT table in the "ON" clause and have the RightTable.ID is null and all the predicates for the LEFT table in the "WHERE" clause.

    Can someone explain this behavior, using my data, for all three queries included?

    Thanks

  • You should post this new question in a thread of your own.

    That said, the answer is simple enough. See Join Fundamentals in Books Online. The key part is:


      Inner joins can be specified in either the FROM or WHERE clauses. Outer joins can be specified in the FROM clause only. The join conditions combine with the WHERE and HAVING search conditions to control the rows that are selected from the base tables referenced in the FROM clause.


  • Thank you for the reply. I will move this to a new topic. I am not sure your answer addresses the question per se because all 3 of my examples are "valid" as far as syntax goes but only 1 is correct for results and I am looking for an explanation why there are different explain plan and results.

    Regards

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply