A question on Where vs Join filtration methodology and order of filtration

  • Greetings all.

    This question has been forming in my mind for a while so now I will present it here!

    This mainly is on Inner Joins but Left and Right joins work as well in certain situations.

    I learned within the last 6 months or so you can use many Where clause filters within the Join clause when Joining the tables. So my first question, what is the best practice? What is the performance difference? Should the Where clause filters be left only for the main table being joined, and every other filter be within the Join clauses?

    Also, I have been playing around with this lately but haven't had the time to benchmark results; Does the order in the Where clause affect the performance? For instance, lets say you have a million rows, the first column has 2 possible values, 0 and 1 and they are split evenly. Then the second column has 20 possible values. I am thinking the where clause should filter the second column first since the second filter is applied against 5% of the rows instead of all of them...does that make sense? Or does the Query engine figure that out anyways?

    Link to my blog http://notyelf.com/

  • The order in the Where clause won't matter, since the engine will make its own decisions on that subject.

    The deciding factor on whether to put something in the Join clause or the Where clause is what will get you the results you need. This mainly applies to Outer joins.

    Beyond that, it's a question of readability, maintainability, and standards.

    Personally, if it has something to do with the join, I keep it in the Join clause. Makes it easier to read and to modify if needed. If it's part of the overall filter for the query, I put it in the Where clause, because again that makes the query easier to understand and debug/refactor.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Inner Joins:

    If i'm writting production level stored procs that must be optimized then i would put everything that exist in the ON statement in the WHERE as well when using an inner join just to be safe. I have run into situation where it helps. Now, i know the order of the constraints in the ON statement are imparative to the optimizer picking the correct index however i'm not sure about the order in the where. Personally i put the one on top who will omit the most records just in case however dont really know if it helps.

    Outter joins:

    You obviously cant do the above here cause then it will turn a left or right join into an inner join. I would still put the constraints you can in the where if they apply.

    Again i would only do the above in situations where it is imparative to have completely optimized stored procedures. i wouldn't do this in development because the redundance may cause more porblems than it is worth. It might actually screw u up and u put somehing somewhere u shouldnt or forget to for that matter.

    I'm not an expert but that's my take.

  • I'm interested to find out if the order of the constraints in the where matters as well. I would have to thik it does.

  • Very cool and good to know.

    I think you just saved my plenty of time in my OCD of ordering joins and where clauses logically.

    I agree with you on the joins and where clause on the readability. Well, except those queries where you have 30+ tables all with their own little logic added. At that point I like it in the the join clause to figure that out lol.

    Link to my blog http://notyelf.com/

  • BaldingLoopMan (12/17/2009)


    Inner Joins:

    If i'm writting production level stored procs that must be optimized then i would put everything that exist in the ON statement in the WHERE as well when using an inner join just to be safe. I have run into situation where it helps. Now, i know the order of the constraints in the ON statement are imparative to the optimizer picking the correct index however i'm not sure about the order in the where. Personally i put the one on top who will omit the most records just in case however dont really know if it helps.

    SQL Server will ignore the order you write them in most of the time. It decides which sequence of actions makes it do the least work based on complex algoriths.

    The way to see this in action is to write a moderately complex query, something that joins three tables for example and has multiple Where evaluations, and look at the execution plan. Then change the sequence of the Where evaluations and look at the execution plan again. Then move something from the Join clause to the Where clause or vice versa and again look at the execution plan.

    Try this on a variety of queries, with and without input parameters, on large tables, small tables, a mixture of large and small, on tables with and without appropriate indexes.

    In the vast majority of cases, you'll find that fiddling with the sequence of the Join and Where tests doesn't change a single thing in the execution plan, unless it would actually change the data that's returned.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • When writing queries I put the join criteria in the FROM clause and the query filter criteria in the WHERE clause. IMHO, this makes the queries easier to understand and maintain. Part of this comes from the way I also try to write most of my queries. I first put together the joins between tables. This is where I normal determine the type of joins (INNER or OUTER) required to anwser a question. From there I start working on the filter criteria. Sometimes, especially when using outer joins, these filter criteria become part of the join criteria.

  • The order in which you place these things and whether or not you place the criteria in the ON or WHERE cluase (at least for INNER JOINS) does not matter. The optimizer will rearrange to suit itself. That's a fact.

    As a rule, I put the JOIN criteria in the ON clause and filtering criteria into the WHERE clause.

    However, there is a little bit of an exception to this. When your queries get very complex, and you start seeing that the execution plans generated are due to a timeout, it means that the optimizer was unable to try every permutation of placement of the order of the operations. In this case, you can, sometimes, get better performance by moving the criteria into the ON clause, helping the optimizer make better choices quicker. It doesn't always work, so I wouldn't suggest it as a common practice, but it's worth experimenting with if you've tried everything else to tune a query.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Placement in the WHERE or ON clause could be important, also, when dealing with OUTER joins. It could make the difference between getting the data you need or not.

  • ------------------------------------------------------------------------------------------------

    The order in which you place these things and whether or not you place the criteria in the ON or WHERE cluase (at least for INNER JOINS) does not matter. The optimizer will rearrange to suit itself. That's a fact.

    ------------------------------------------------------------------------------------------------

    i always thought the order would affect the index selected for some reason. you're right though it doesn't. I've been reading up on it. That's good to know.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply