May 7, 2010 at 11:58 am
Hugo Kornelis (5/7/2010)
CirquedeSQLeil (5/7/2010)
Paul White NZ (5/7/2010)
CirquedeSQLeil (5/6/2010)
Though I got this correct, it appears there is a typo in the correct answer. My version had the last two flip-flopped based on the source document provided in the answer.What are you referring to here Jason?
It must be hard to spot - I can't see anything wrong...:unsure:
Your list has it correct, Itzik's list has it correct. The correct Answer (and explanation) has inverted the order by and top (order by then top as opposed to the document which says top then order by).
Hi Jason,
It's not my habit to disagree with Itzik, but in this case I have to make an exception.
Lemme ask you a simple question - in a query that has a TOP and an ORDER BY clause, how exactly should SQL Server process the TOP without sorting the rows first? *IF* Microsoft had decided to use a seperate clause to define the order for the TOP instead of overloading the ORDER BY clause (for instance TOP ... OVER (ORDER BY column)), then I would agree with TOP first (which includes ordering by the OVER (ORDER BY) specificion of the TOP), ORDER BY (for presentation) last. But as it stands, it simply does not make sense to do an ORDER BY after a TOP that has already included the same sort operation.
I disagree with the list and Itzik, and I agree with the order as presented in the question.
Note that, for most clauses, the logical processing order is defined by the ANSI standards. However, the TOP clause is proprietary; ANSI does not support it. And Microsoft never officially announced the logical order, so this is all speculation. Both on Itzik's and on my part.
I agree with the order presented in the question. A simple evaluation where one does something like
Select top 10 COLUMN1,column2
From SomeTable
Where Column2 is null
Order By COLUMN1 Desc
would demonstrate that. The Order By is done prior to the Top clause. I brought it up just due to the nuance of order being different.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 7, 2010 at 1:30 pm
Here is MSDN article that shows processing order for SELECT:
http://msdn.microsoft.com/en-us/library/ms189499.aspx
--Vadim R.
May 7, 2010 at 3:02 pm
rVadim (5/7/2010)
Here is MSDN article that shows processing order for SELECT:
Thanks for finding that article.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 7, 2010 at 11:09 pm
Hugo Kornelis (5/7/2010)
It's not my habit to disagree with Itzik, but in this case I have to make an exception.
Me either 😉 However, I think there is a subtlety here that makes both interpretations correct, but Itzik's more so:
The last two items on Itzik's list are TOP then Presentation ORDER BY. The inference here is that presentation ORDER BY is separate from TOP. My take is that TOP includes an implicit ORDER BY, and the presentation ORDER BY logically comes after that.
The important distinction is that SQL Server does guarantee which rows will be qualified by TOP, given its associated ORDER BY (if any) but it does not guarantee that the same order will be reflected in the final output, even if the TOP is the last iterator in the query plan. In general, the order used to qualify rows for a TOP iterator does not say anything about the order of rows presented to the next stage of the operation.
But as it stands, it simply does not make sense to do an ORDER BY after a TOP that has already included the same sort operation.
This is exactly what I think you are wrong about here. There is no guarantee of the 'row qualification' sort order being preserved. For anyone not prepared to take my word for it (and that should be all of you) please see Conor Cunningham's blog entry on the subject. If I were feeling mischievous, I might mention 'defensive programming' at this point 😛
I disagree with the list and Itzik, and I agree with the order as presented in the question.
I would say that both are correct, but the question is less so - since it is missing the final presentation ORDER BY step (assuming that the ORDER BY in the question is the one associated with the TOP).
FROM
Cartesian Product
ON clause filter
Add Outer Rows
WHERE
GROUP BY
HAVING
SELECT
Evaluate expressions
DISTINCT
TOP (including qualification ORDER BY if present)
Presentation ORDER BY
May 8, 2010 at 3:39 am
Paul White NZ (5/7/2010)
Hugo Kornelis (5/7/2010)
It's not my habit to disagree with Itzik, but in this case I have to make an exception.Me either 😉 However, I think there is a subtlety here that makes both interpretations correct, but Itzik's more so:
The last two items on Itzik's list are TOP then Presentation ORDER BY. The inference here is that presentation ORDER BY is separate from TOP. My take is that TOP includes an implicit ORDER BY, and the presentation ORDER BY logically comes after that.
Ahhh, even more disagreement. Nice! Good training for the grey cells. 🙂
First: My assumption so far was that this whole discussion is about queries without any subqueries, in any of the clauses (and that includes "hidden" subqueries caused by CTE's, views, inline table-valued functions, or the APPLY operator). I think we all agree that the parentheses around subqueries imply what they imply in mathematics: that (logically) the subqeries are evaluated first. I always think of it as a virtual temporary table that is created and populated by evaluating the saubquery, and then the subquery in the original query is replaced by this virtual temporarty table. Again, all logical - this is where the actual evaluation order usually is hugely different from the logical order.
If we disagree on this assumption, if you take this discussion to be about all queries including subqueries at any nesting level, than I am prepared to agree with you on the order of the ORDER BY and TOP clauses, but I'll have to start disagreeing on almost everything else 😛
Second: Given that we now agree that this discussion is about queries with no subqueries, there is in the SQL Sever syntax no way to specify a presentation ORDER BY that differs from the ORDER BY for the TOP. There is one ORDER BY, and it always applies to both the presentation order and the TOP clause.
Third: rVadim has managed to find what I could not - the official word from Microsoft on the logical order of evaluation, as included in Books Online. In that list, ORDER BY goes before TOP.
The important distinction is that SQL Server does guarantee which rows will be qualified by TOP, given its associated ORDER BY (if any) but it does not guarantee that the same order will be reflected in the final output, even if the TOP is the last iterator in the query plan. In general, the order used to qualify rows for a TOP iterator does not say anything about the order of rows presented to the next stage of the operation.
Where in the execution plan an operator stands is indeed totally irrelevant for what Microsoft does or does not guarantee. Where in a query a clause stands is not. And Microsoft does guarantee returning results in the order of the ORDER BY of the outermost query.
So in any subquery, ORDER BY can only apply to TOP, not to presentation order, in those cases the logical evaluation order ends with ORDER BY (for the TOP only), then TOP.
In the outer query, (either a query with no subqueries, or a query with subqueries, after logically replacing all subqueries with virtual temporary tables), ORDER BY applies to both TOP and presentation order; the logical evaluation order still ends with ORDER BY (for both TOP and presentation), then TOP (which doesn't affect row ordering).
There is no guarantee of the 'row qualification' sort order being preserved. For anyone not prepared to take my word for it (and that should be all of you) please see Conor Cunningham's blog entry on the subject. If I were feeling mischievous, I might mention 'defensive programming' at this point 😛
In that very blog entry, Conor also wriites: "You’d need to put an ORDER BY at the top of the query to guarantee the output order returned to the client". In other words, presentation order is guaranteed if the ORDER BY is on the outermost query. And Conor writes: "If an ORDER BY is used in the same scope, it qualifies rows based on the ORDER BY" - so a TOP on the same outermost query will use the ORDER BY that also guarantees presentation order.
I would say that both are correct, but the question is less so - since it is missing the final presentation ORDER BY step (assuming that the ORDER BY in the question is the one associated with the TOP).
FROM
Cartesian Product
ON clause filter
Add Outer Rows
WHERE
GROUP BY
HAVING
SELECT
Evaluate expressions
DISTINCT
TOP (including qualification ORDER BY if present)
Presentation ORDER BY
The presentation ORDER BY is only relevant for the outermost scope, and if a TOP applies to that scope, it can only use the same ORDER BY. So the ORDER BY after the TOP does nothing.
May 8, 2010 at 9:23 pm
Hugo,
I don't disagree with any of that really; as I said before, I think both answers are correct, I just happen to prefer one over the other. My preference is based on:
We are talking about the logical order of processing here, so it seems wrong to me to rely on current implementation details to argue the point. Yes, presentation ORDER BY and TOP ORDER BY will be the same as the product stands - but that seems to be a consequence of the current restrictions of the language, not a logical restriction.
Let us imagine that SQL Server supports the TOP OVER() construction in the next release -which logical description would you prefer then?
Fascinating stuff 🙂
Paul
May 9, 2010 at 4:04 am
Paul White NZ (5/8/2010)
(...)Let us imagine that SQL Server supports the TOP OVER() construction in the next release -which logical description would you prefer then?
Hi Paul,
Good question!
Thinking about it made me realize that the order as described so far does not support windowing functions at all. So instead of adding just the fictional TOP OVER(), I'll add the OVER() clause in general.
Let's start with the evaluation order in Boooks Online:
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
Since windowing functions can not be used in WHERE or HAVING clauses, but only in the SELECT (and in the fictional TOP OVER() we just added), the most logical place to evaluate them would be between HAVING and SELECT. The result would then be:
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. ordering and grouping for windowing functions used in query
9. SELECT
10. DISTINCT
11. ORDER BY
12. TOP
What makes this interesting is how TOP and DISTINCT start to interrelate. Imagine a table like this:
CREATE TABLE Demo
(KeyCol int NOT NULLL PRIMARY KEY,
OtherCol int NOT NULL);
INSERT INTO Demo (KeyCol, OtherCol)
SELECT 1, 1 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3 UNION ALL
SELECT 7, 3 UNION ALL
SELECT 8, 4 UNION ALL
SELECT 9, 1;
Before reading on, consider what output you would expect from this query:
SELECT DISTINCT TOP(2) OVER (ORDER BY KeyCol) OtherCol
FROM Demo;
If we first evaluate the TOP(2) OVER (ORDER BY KeyCol), we have an intermediate result of two rows, with value-pairs (1,1) and (2,1); only the second column is selected, and the DISTINCT than reduces it to a single column - which is weird, since we asked for TOP(2).
On the other hand, if we first evaluate the column list and the DISTINCT, we'd get an intermediate result of four single-columns rows, sporting the values 1, 2, 3, and 4; the TOP now becomes very hard to evaluate, because the KeyCol column no longer exists in this result set, and even if SQL Server under the covers would still connect the OtherCol values with their source rows, we'd still not know which of the associated KeyCol values to use.
Hmmm. Maybe this is why Microsoft has not implemented TOP OVER()? :unsure:
May 9, 2010 at 8:54 am
Heh. Awesome stuff, going to read that properly in a bit and reply - but I can't help mentioning that window functions can appear in the ORDER BY clause too 😀
May 10, 2010 at 7:35 am
Just when I thought the "question of the day" is going down hill with non-practical (things that you would never do in real life), or too mechanical (what button do you click to do x) craps, your question came through. Thanks. BTW. Are you Tom Brown from San Diego/Seatle?
- Hoang Lam
May 10, 2010 at 8:50 am
Glad you liked my question.
hlam-936978 (5/10/2010)
BTW. Are you Tom Brown from San Diego/Seatle?- Hoang Lam
Not me - wrong continent - I'm in the UK.
May 11, 2010 at 10:31 am
Thanks for the Q - I must admit I hadn't considxered this
June 3, 2010 at 9:40 am
calvo (5/7/2010)
So, now knowing how a query is processed internally, how can we use this knowledge to improve query performance or perhaps troubleshooting unexpected results?
I think there are two standout things to learn from this:
1. Since JOIN happens before WHERE a more complete ON clause may be more efficient as there will be fewer rows to filter out in the WHERE clause.
2. Since GROUP BY happens before SELECT and DISTINCT happens after SELECT. These:
a) SELECT DISTINCT A,B,C FROM TABLE
b) SELECT A,B,C FROM TABLE GROUP BY A,B,C
Are not necessarily going to have the same efficiency. This surprises me a little as my naive assumption would be that the optimizer would convert the DISTINCT into a GROUP BY.
--
JimFive
June 3, 2010 at 9:47 am
James Goodwin (6/3/2010)
calvo (5/7/2010)
So, now knowing how a query is processed internally, how can we use this knowledge to improve query performance or perhaps troubleshooting unexpected results?I think there are two standout things to learn from this:
1. Since JOIN happens before WHERE a more complete ON clause may be more efficient as there will be fewer rows to filter out in the WHERE clause.
This is not true for an INNER JOIN. The placement of the predicate has no influence on the generated query plan and the efficiency. (Exception - for very complicated many-table joins with little rows, the optimizer might stop searching for a better plan if it estimates that the cost of finding a better plan exceeds the cost saved by that plan; in that case the placement might affect performance, and that might happen in both ways. This is rare, though).
2. Since GROUP BY happens before SELECT and DISTINCT happens after SELECT. These:
a) SELECT DISTINCT A,B,C FROM TABLE
b) SELECT A,B,C FROM TABLE GROUP BY A,B,C
Are not necessarily going to have the same efficiency. This surprises me a little as my naive assumption would be that the optimizer would convert the DISTINCT into a GROUP BY.
Again, incorrect. These will have the same plan, as that conversion is exactly what the optimizer does.
Viewing 13 posts - 31 through 42 (of 42 total)
You must be logged in to reply to this topic. Login to reply