May 10, 2011 at 1:59 am
Hi
I know that SQL joins tables from left to right, but in what order should we join tables for the code to be most efficient?
Say I wanted a list of all customers orders. Would I start by querying the customers table and joining to the customers details table and then the orders table and then joining to the order details table? Or would I choose that the orders table is the central table to my database and start from there? Or would I start from the Order details table as this table has the most rows?
Assuming that all indexes are correct (clustered indexes on primary keys and non clustered on foreign keys) would it be better to start with the smaller tables and then join to the larger tables? I'm thinking that this method would be the most likely to use a index scan rather than a full table scan.
To make it even faster I would move the filters from the WHERE clause into the JOINS, and for adhoc queries I would include variables to improve readability.
I know there are some serious SQL optimizers out there. Please help me guys. 🙂
May 10, 2011 at 4:37 am
I'm no expert by any means but I posted a similar question once before. Generally speaking, just from what I was told, it really doesn't matter in basic joins. The compiler will decide what is best.
May 10, 2011 at 5:30 am
I've heard the same as tyson price, that for SQL server, the order of the joins don't matter, because the optimizer will change the order based in the execution plan based on it's best estimate of getting the data quickly based on the execution plan.
I've also heard (but cannot point to a source) that other DBMS's are affected by the order, so it's best to create the joins in an order that restricts the most data quickly.
Lowell
May 10, 2011 at 5:58 am
terrykzncs (5/10/2011)
I know that SQL joins tables from left to right, but in what order should we join tables for the code to be most efficient?
Any order you like. Unless you have a hint like FORCE ORDER, the optimiser's going to ignore the order you specify them in and search around for an optimal plan anyway. SQL does not join left-to-right unless left-to-right happens to also be the most optimal order as calculated by the query optimiser
Assuming that all indexes are correct (clustered indexes on primary keys and non clustered on foreign keys) would it be better to start with the smaller tables and then join to the larger tables? I'm thinking that this method would be the most likely to use a index scan rather than a full table scan.
Table vs index scan depends on available indexes and what the query is doing, not order the tables or predicates are specified. If there's an available index that can be seeked on, the optimiser will likely use it in a seek. If there isn't, but there's an index that can be scanned, it will scan. If there's no useful index at all, that's when it will table scan/clustered index scan.
p.s. The primary key is not always the best place for the clustered index. It's simply the default place.
To make it even faster I would move the filters from the WHERE clause into the JOINS
For Inner joins that will make no difference at all. For outer joins it will change the logic of the query and likely the results.
and for adhoc queries I would include variables to improve readability.
Which may or may not help, as variables (as opposed to parameters can't have their values sniffed and hence may result in a less optimal plan.
In short, don't try to tell the optimiser what to do, give it useful information (statistics, constraints), good indexes and simple code and it should be able to figure things out itself.
p.s. On really, really complex queries sometimes changing the join order changes the way the optimiser evaluates plans and results in a better plan. It's not that it's taking your join order, it's that the optimiser is timing out before finding a plan it considers good enough and changing the join order results in it evaluating plans in a different order and finding something better in the time it is allowed.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 10, 2011 at 8:24 am
Thanks for that Guys.
Its been worrying me since I read in a book that sql will join tables from left to right. I guess they were simply explaining how the join would work. From that point on, I've been trying to get my queries to affect as few of the records in the table as possible.
To make it even faster I would move the filters from the WHERE clause into the JOINS
For Inner joins that will make no difference at all. For outer joins it will change the logic of the query and likely the results.
Correct me if I am wrong, but I have read that SQL will process the FROM clause followed by the SELECT clause and then the WHERE clause, so having the filters in the FROM clause would mean fewer records would need to be handled in the next step.
May 10, 2011 at 11:02 am
terrykzncs (5/10/2011)
Correct me if I am wrong, but I have read that SQL will process the FROM clause followed by the SELECT clause and then the WHERE clause, so having the filters in the FROM clause would mean fewer records would need to be handled in the next step.
You're wrong. 🙂
What I said is easily testable. Run some queries, look at the execution plan. I don't know where you read that, but I would consider that source to be very questionable.
SQL retrieves the minimum number of rows it needs, based on the FROM and WHERE clauses. It would be absolutely ludicrous if SQL first processed the from clause, retrieved every single row from every single table and then filtered them. If it did, indexes would be useless and the only thing it would do would be a table scan.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 11, 2011 at 12:01 pm
Sorry for taking so long to reply; I'm replying while at work.
Please see below links:
http://www.sql.co.il/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf
http://blogs.x2line.com/al/archive/2007/06/30/3187.aspx
Please could you give your views on this as there are probably more of us that have read the book and would appreciate your advice.
May 11, 2011 at 12:36 pm
terrykzncs (5/11/2011)
Please see below links:
http://www.sql.co.il/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf%5B/quote%5D
Emphasis Logical processing order. That's not the actual, physical operators that SQL uses, it's the logical flow, based on some ANSI requirement.
The second is not clearly stated but is also talking about the logical processing order.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 12, 2011 at 11:35 am
While it is true that T-SQL is not supposed to consider the order in which you type tables on a JOIN, I ran into an odd problem trying to troubleshoot a query that ran for over a minute (100 seconds, actually).
When I reversed the order of the conditions in the WHERE clause, execution time dropped to milliseconds. Nothing else changed, and I could reproduce the results easily. See post at http://www.sqlservercentral.com/Forums/Topic1089463-360-1.aspx#bm1089981
No one who responded to my post could explain this...
It's the only time I've seen this happen.
You do want to be careful of JOIN order when you are using OUTER joins, though. You can wind up including rows/data you didn't mean to.
Good luck,
Rich
May 12, 2011 at 9:53 pm
Thanks Gail for the explanation. Much appreciated.
To Rich, the only comment I can offer is that in Itzik Ben-Gan's book 'Inside Microsoft SQL Sever 2008:TSQL Querying' (see link in earlier post), he does say that we should consider the order of conditions in the WHERE clause to make the best use of indexes.
Obviously you need to test, test and test some more, but it does help to be aware that these little things do make a difference sometimes.
May 12, 2011 at 11:34 pm
terrykzncs (5/12/2011)
Obviously you need to test, test and test some more, but it does help to be aware that these little things do make a difference sometimes.
If it makes a difference then you've either pretty much found a bug in the optimiser or it's an optimiser timeout (as I described above) and the different order allowed plans to be searched in a different order.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 13, 2011 at 6:00 am
GilaMonster (5/12/2011)
terrykzncs (5/12/2011)
Obviously you need to test, test and test some more, but it does help to be aware that these little things do make a difference sometimes.If it makes a difference then you've either pretty much found a bug in the optimiser or it's an optimiser timeout (as I described above) and the different order allowed plans to be searched in a different order.
Gail, do you have a link or two with more info. on query optimizer timeouts? My friend Google didn't help much, but I may have been searching the wrong terms.
Thanks,
Rich
May 13, 2011 at 6:21 am
GilaMonster (5/10/2011)
terrykzncs (5/10/2011)
I know that SQL joins tables from left to right, but in what order should we join tables for the code to be most efficient?Any order you like. Unless you have a hint like FORCE ORDER, the optimiser's going to ignore the order you specify them in and search around for an optimal plan anyway. SQL does not join left-to-right unless left-to-right happens to also be the most optimal order as calculated by the query optimiser
...
A good example of this is provided by Paul White here.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
May 13, 2011 at 10:34 am
GilaMonster (5/10/2011)
terrykzncs (5/10/2011)
Correct me if I am wrong, but I have read that SQL will process the FROM clause followed by the SELECT clause and then the WHERE clause, so having the filters in the FROM clause would mean fewer records would need to be handled in the next step.You're wrong. 🙂
What I said is easily testable. Run some queries, look at the execution plan. I don't know where you read that, but I would consider that source to be very questionable.
SQL retrieves the minimum number of rows it needs, based on the FROM and WHERE clauses. It would be absolutely ludicrous if SQL first processed the from clause, retrieved every single row from every single table and then filtered them. If it did, indexes would be useless and the only thing it would do would be a table scan.
The correct sequence that SQL uses internally to process the clauses in an SQL query:
FROM, [JOIN CONDITION, JOIN TABLE...], WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, TOP
May 13, 2011 at 1:31 pm
Terry Low-350996 (5/13/2011)
The correct sequence that SQL uses internally to process the clauses in an SQL query:FROM, [JOIN CONDITION, JOIN TABLE...], WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, TOP
SQL typically performs the where and the from together when it can (retrieve only the rows that satisfy the condition from the table if the indexes allow). It may do some of the where clause predicates while it's retrieving the data from the tables, others during a join and some later as a filter.
It can sometimes perform the TOP at the same point (no need to retrieve the entire table if you asked for TOP 1 with no predicates. A sort can sometimes not be performed at all if there's an index that can be used to provide the data sorted.
Any order of execution that's given in terms of the clauses of the query is a logical order. The physical order is shown by an execution plan and can be radically different.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply