October 31, 2011 at 6:43 am
L' Eomot Inversé (10/27/2011)
tilew-948340 (10/27/2011)
I answered anything because I know nothing about the execution plan, but I wanted to see the answer and read the discussion now, so I tried something... and got no luck: my answer was wrong and I still don't understand... So, here I am, with my question: is it important to understand this if you are (well, trying to be) a developer on a very small database? <Good Stuff Omitted for Clarity>
Alternatively, think of it as a very simple programme paradigm: the thing that produces the result calls the things that provide it with data, and those things all call other things that give them the data they need to produce their results, and so on. Apart from the top level, nothing produces any data or does anything at all unless it is asked to by something at the level above.
Tom,
Thanks for your explanation. I think I understood, but your explanation certainly makes it clearer.
Ernie
November 2, 2011 at 4:15 am
Another V Good QoTD from Kiwi!!!
November 7, 2011 at 6:48 am
Have to agree with others, this was a great QotD. I'm happy to say I got it wrong, but I would argue that I was one of those who got it wrong from misunderstanding the term used, "executes".
My understanding of how data is pulled by the root operator from its child operator(s) was correct; I simply thought of operator initialisation and execution as two distinct phases.
Good stuff, hope there's more to come!
January 12, 2012 at 2:03 pm
Simple question:
what's being aggregated if the data in the table have not been scanned yet?
_____________
Code for TallyGenerator
January 12, 2012 at 3:13 pm
Sergiy (1/12/2012)
Simple question: what's being aggregated if the data in the table have not been scanned yet?
Most operators don't process all data at once; they are called repeatedly with the request to return a row. Leaving out the "initialize yourself" calls and focusing on the "give me a row" calls, here is the sequence of operators calling each other for this specific plan.
1. Stream Aggregate is called with a "give me a row" request.
2. Stream Aggregate doesn't have any data yet, so it calls the Nested Loops operator to get a row.
3. Nested Loops has no data yet on either input, so it calls the Table Scan operator to get a row.
4. Table Scan reads the first row and returns it to Nested Loops.
5. Nested Loops now calls Index Seek to get a row for its second input.
6. Index Seek reads the first index row and returns it to Nested Loops.
7. Nested Loops combines the inputs and returns the row to Stream Aggregate.
8. Stream Aggregate increases internal results (row counter), then calls Nested Loops again for the next row.
9. Steps 5-8 are repeated until the Index Seek returns a "no more matching rows" condition.
10. Steps 3-9 are repeated until the Table Scan returns a "no more matching rows" condition.
11. At this point, Nested Loops returns "no more matching rows" to its caller.
12. Stream Aggregate receives "no more matching rows" and returns the internal results (row counter) to the caller.
If there had been a GROUP BY, this all would have been a bit more complex - in that case, Stream Aggregate would return a row and reset its internal counter after each change in the GROUP BY values.
So the first operator to execute is the Stream Aggregate (step 1), but the first operator to actually touch any real data is the Table Scan (step 4).
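To make the calling sequence concrete, here is a minimal sketch in Python (illustrative only; the operator functions, names and data are all made up, this is not SQL Server's actual code). Note that nothing reads the table until the aggregate starts pulling rows:
def table_scan(rows):
    # Hands back one table row each time it is asked for the next one.
    for row in rows:
        yield row

def index_seek(index, key):
    # Hands back the index rows matching one key, one at a time.
    for row in index.get(key, []):
        yield row

def nested_loops(outer, index):
    # Pulls a row from the outer input, then pulls matching rows from the index seek.
    for outer_row in outer:
        for inner_row in index_seek(index, outer_row):
            yield (outer_row, inner_row)

def stream_aggregate(child):
    # Called first, but it only touches data when it pulls rows from its child.
    count = 0
    for _ in child:       # steps 2-7: ask the child for the next row
        count += 1        # step 8: update the internal row counter
    return count          # step 12: the child reported "no more rows"

table = ['a', 'b', 'c']
index = {'a': ['a'], 'b': ['b'], 'c': ['c']}
print(stream_aggregate(nested_loops(table_scan(table), index)))   # prints 3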
January 12, 2012 at 3:31 pm
Hugo Kornelis (1/12/2012)
Sergiy (1/12/2012)
Simple question: what's being aggregated if the data in the table have not been scanned yet?
Most operators don't process all data at once; they are called repeatedly with the request to return a row. Leaving out the "initialize yourself" calls and focusing on the "give me a row" calls, here is the sequence of operators calling each other for this specific plan.
1. Stream Aggregate is called with a "give me a row" request.
2. Stream Aggregate doesn't have any data yet, so it calls the Nested Loops operator to get a row.
3. Nested Loops has no data yet on either input, so it calls the Table Scan operator to get a row.
4. Table Scan reads the first row and returns it to Nested Loops.
5. Nested Loops now calls Index Seek to get a row for its second input.
6. Index Seek reads the first index row and returns it to Nested Loops.
7. Nested Loops combines the inputs and returns the row to Stream Aggregate.
8. Stream Aggregate increases internal results (row counter), then calls Nested Loops again for the next row.
9. Steps 5-8 are repeated until the Index Seek returns a "no more matching rows" condition.
10. Steps 3-9 are repeated until the Table Scan returns a "no more matching rows" condition.
11. At this point, Nested Loops returns "no more matching rows" to its caller.
12. Stream Aggregate receives "no more matching rows" and returns the internal results (row counter) to the caller.
If there had been a GROUP BY, this all would have been a bit more complex - in that case, Stream Aggregate would return a row and reset its internal counter after each change in the GROUP BY values.
So the first operator to execute is the Stream Aggregate (step 1), but the first operator to actually touch any real data is the Table Scan (step 4).
From your explanation it's obvious that the Stream Aggregate is actually executed for the first time only at step 8, AFTER the table scan, index seek and nested loops are completed.
Which makes it the last in the queue, not the first.
_____________
Code for TallyGenerator
January 12, 2012 at 3:48 pm
Hugo Kornelis (1/12/2012)
So the first operator to execute is the Stream Aggregate (step 1), but the first operator to actually touch any real data is the Table Scan (step 4).
As you said - at step 1 the Stream Aggregate does not have any data to work with (see item 2 of your explanation).
Therefore it physically cannot be executed. It just does not have anything to work with.
Most people miss a very important thing about running a SQL statement.
There are several stages of such run:
- parsing;
- compiling;
- execution.
What you described in item 1 actually happens at the compilation stage.
The Stream Aggregate is compiled first, placed into memory and prepared to accept data from the other modules of the program (a query is a small program, right?).
It's the optimiser and compiler which are executed at this stage, not the Stream Aggregate.
But if we are talking about execution, then data collection comes first.
Data must be collected and filtered by the FROM, INNER JOIN and WHERE operators before any other operator can work with the collection.
Isn't it obvious?
_____________
Code for TallyGenerator
January 12, 2012 at 4:09 pm
I think the problem is linguistic.
"Is executed" can have two meanings.
1. Present tense, passive form - the code is invoked. Can be compared to running "EXEC MyStoredProc" in SQL Server. As soon as you run that command, execution passes to the first statement of your stored procedure - your stored procedure "is executed".
2. Present tense with perfect aspect - execution has finished. When the execution of your stored procedure has finished and control is returned to the caller, the stored procedure "is executed" (though I think most English speakers would use "has executed" here).
Paul's intention, and my interpretation, is the first meaning: execution starts. And with that meaning, Stream Aggregate is the correct answer. This is the first operator that gets control, initializes itself, and then passes control to (by calling) the next operator.
Another way to explain this is to remember that operators do not "push" rows up to their parent operator; operators "pull" rows from their descendants. (Or, as Rob Farley worded it, "execution plans suck").
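If it helps, here is a tiny Python sketch of that difference (illustrative only, not SQL Server code): in a push model the scan would get control first and push rows into the aggregate, while in the pull model described above the aggregate gets control first and asks the scan for rows.
rows = ['a', 'b', 'c']

# Push style: the scan gets control first and pushes each row into the aggregate.
def push_count():
    count = 0
    for row in rows:          # the scan drives execution...
        count += 1            # ...and pushes each row into the aggregate logic
    return count

# Pull style: the aggregate gets control first and pulls rows from the scan.
def scan():
    for row in rows:          # runs only when the aggregate asks for a row
        yield row

def pull_count():
    return sum(1 for _ in scan())   # the aggregate drives execution

print(push_count(), pull_count())   # 3 3 - same result, different flow of control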
January 12, 2012 at 4:52 pm
Hugo Kornelis (1/12/2012)
I think the problem is linguistic. "Is executed" can have two meanings.
1. Present tense, passive form - the code is invoked. Can be compared to running "EXEC MyStoredProc" in SQL Server. As soon as you run that command, execution passes to the first statement of your stored procedure - your stored procedure "is executed".
2. Present tense with perfect aspect - execution has finished. When the execution of your stored procedure has finished and control is returned to the caller, the stored procedure "is executed" (though I think most English speakers would use "has executed" here).
Paul's intention, and my interpretation, is the first meaning: execution starts. And with that meaning, Stream Aggregate is the correct answer. This is the first operator that gets control, initializes itself, and then passes control to (by calling) the next operator.
Another way to explain this is to remember that operators do not "push" rows up to their parent operator; operators "pull" rows from their descendants. (Or, as Rob Farley worded it, "execution plans suck").
I don't think it's linguistic.
The problem is in misunderstanding how SQL works.
As soon as you run that command, execution passes to the first statement of your stored procedure - your stored procedure "is executed".
Wrong!
As soon as you run this command, the procedure (query) is parsed.
After parsing is completed, the optimiser looks for an execution plan (compiled code) for the SQL batch.
If one cannot be found, then compilation takes place.
Compilation figures out the future execution plan, and it builds it from the end to the beginning.
And only when the compiler can find all the objects, allocate memory for all the variables, etc., and actually compile the code does execution begin.
At the compilation stage, that's right: the Stream Aggregate has to be compiled first to make an "output pipe" for the other operations which are going to supply the data for the aggregation.
But at run time (which is when the statement is executed) the Stream Aggregate comes last, as all the others must be executed first to supply data for the aggregation.
_____________
Code for TallyGenerator
January 12, 2012 at 5:14 pm
Sergiy (1/12/2012)
Wrong! As soon as you run this command, the procedure (query) is parsed.
After parsing is completed, the optimiser looks for an execution plan (compiled code) for the SQL batch.
If one cannot be found, then compilation takes place.
Compilation figures out the future execution plan, and it builds it from the end to the beginning.
And only when the compiler can find all the objects, allocate memory for all the variables, etc., and actually compile the code does execution begin.
At the compilation stage, that's right: the Stream Aggregate has to be compiled first to make an "output pipe" for the other operations which are going to supply the data for the aggregation.
But at run time (which is when the statement is executed) the Stream Aggregate comes last, as all the others must be executed first to supply data for the aggregation.
You are right about all the preparation steps. You are wrong about how the compiled plan is executed.
Every operator is a routine (or object) that can be invoked to execute one of the following methods: Init() (to initialize), GetNext() (to get the next row from the result of the operator), and Close() (to clean up and shut down).
Once all the preparation is done and SQL Server is ready to actually execute the query, it will simply call the first (top-left) operator of the plan. The first call will be to the Init() method, then the GetNext() method is called until no more rows are returned, and then Close() is called and execution of the query is complete.
All other operators are invoked directly or indirectly from the top-left operator.
See http://msdn.microsoft.com/en-us/library/ms191158.aspx for a description of how operators work.
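As a rough sketch of that interface (Python used purely for illustration; the method names follow the description above, everything else here is invented):
class TableScan:
    # Leaf operator: reads rows from a (here, in-memory) table.
    def __init__(self, rows):
        self.rows = rows
    def Init(self):
        self.pos = 0                      # get ready to read from the start
    def GetNext(self):
        if self.pos < len(self.rows):
            row = self.rows[self.pos]
            self.pos += 1
            return row
        return None                       # "no more rows"
    def Close(self):
        pass

class StreamAggregate:
    # Root operator: counts the rows returned by its child.
    def __init__(self, child):
        self.child = child
    def Init(self):
        self.count = 0
        self.done = False
        self.child.Init()                 # initialization cascades down the tree
    def GetNext(self):
        if self.done:
            return None                   # the single aggregate row was already returned
        while self.child.GetNext() is not None:
            self.count += 1               # pull rows from the child until it runs out
        self.done = True
        return self.count
    def Close(self):
        self.child.Close()

# The engine only ever talks to the root (top-left) operator of the plan:
root = StreamAggregate(TableScan(['a', 'b', 'c']))
root.Init()
while (result := root.GetNext()) is not None:
    print(result)                         # prints 3
root.Close()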
January 12, 2012 at 6:08 pm
And here we come to the order of execution of a SELECT statement.
1. FROM
2. INNER JOIN
3. WHERE
...
SELECT comes somewhere 7th or 8th in line.
After that comes GROUP BY.
And that's where the aggregation is sitting.
You cannot dispute that the outcome of aggregation depends on the definition of GROUP BY, can you?
Table scan happens in FROM (because there is no WHERE).
Which is the first one in line.
_____________
Code for TallyGenerator
January 12, 2012 at 7:46 pm
Let's modify the query from the QoD a little bit:
DECLARE @a TABLE (a varchar(20) NOT NULL)
DECLARE @b TABLE (b varchar(20) NOT NULL UNIQUE)
INSERT @a VALUES (1)
INSERT @a VALUES ('a')
INSERT @b VALUES (1)
SELECT AVG(CONVERT(int, a))
FROM @a A
--JOIN @b B ON B.b = A.a
If you run it like this you get an error:
Syntax error converting the varchar value 'a' to a column of data type int.
But if you uncomment the JOIN line, it completes successfully.
Why?
The aggregate operation works on a different data set, which is provided by the previously executed FROM and JOIN.
_____________
Code for TallyGenerator
January 12, 2012 at 9:35 pm
Sergiy,
I can assure you it works very much like Hugo has described; I'm afraid most of what you have asserted is incorrect. The explanation I gave for the answer included this stack trace:
sqlservr!CQScanTableScanNew::GetRow
sqlservr!CQScanNLJoinTrivialNew::GetRow
sqlservr!CQScanCountStarNew::GetRowHelper
sqlservr!CQScanStreamAggregateNew::GetCalculatedRow
sqlservr!CQScanStreamAggregateNew::Open <-- ****
sqlservr!CXStmtQuery::SetupQueryScanAndExpression
sqlservr!CXStmtQuery::InitForExecute
Notice the Stream Aggregate is the deepest operator call on the stack, so it had control before the join or table scan. I have stepped through the modules many, many times using the debugger, so I really can assure you the aggregate code starts to run first. Other references that you may have missed, although they were provided in the explanation:
http://blogs.msdn.com/b/craigfr/archive/2006/06/08/622315.aspx
http://bradsruminations.blogspot.com/2010/11/second-in-life-of-query-operator.html
http://sqlblog.com/blogs/paul_white/archive/2010/08/05/iterators-query-plans-and-why-they-run-backwards.aspx
You can find all sorts of SQL Server internals information on my blog (as in the last link there).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 12, 2012 at 9:47 pm
Sergiy (1/12/2012)
Let's modify the query from the QoD a little bit... If you run it like this you get an error:
Syntax error converting the varchar value 'a' to a column of data type int.
But if you uncomment the JOIN line it successfully completes the operation.
Why? The aggregate operation works on a different data set, which is provided by the previously executed FROM and JOIN.
The key to understanding this is that Compute Scalar operators define expressions (in this case, the interesting one defines [Expr1004] = Scalar Operator(CONVERT(int,@A.[a] as [A].[a],0))). As a performance optimization, the execution engine (more specifically the Expression Services component) generally defers evaluation of the expression until a higher operator in the tree (one that is lower on the call stack) actually needs the value to do something with it.
With the JOIN commented out, [Expr1004] is defined and then immediately required by the Stream Aggregate to compute these expressions: [Expr1005] = Scalar Operator(COUNT_BIG([Expr1004])), [Expr1006] = Scalar Operator(SUM([Expr1004])). Notice that both [Expr1005] and [Expr1006] reference [Expr1004], causing [Expr1004] to be evaluated and generating the error message.
With the JOIN in place, the row is eliminated by the join (no matching row in table B) before the expression is evaluated by the aggregate, so no error occurs. To get the error with the JOIN in place, simply add a row to table B:
DECLARE @a TABLE (a varchar(20) NOT NULL)
DECLARE @b TABLE (b varchar(20) NOT NULL UNIQUE)
INSERT @a VALUES (1)
INSERT @a VALUES ('a')
INSERT @b VALUES (1)
INSERT @b VALUES ('a') -- *** added this row so the join passes the value up the tree
SELECT AVG(CONVERT(int, a))
FROM @a A
JOIN @b B ON B.b = A.a
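To illustrate the general idea of deferred expression evaluation (a Python sketch only, not how Expression Services actually works; all names here are made up): the conversion is wrapped up as an unevaluated expression, and rows that the join throws away never cause it to be evaluated.
table_a = ['1', 'a']          # 'a' cannot be converted to an int
table_b = {'1'}               # with only '1' present, the join removes the 'a' row

def scan_with_deferred_convert(rows):
    for a in rows:
        # Define the expression but do not evaluate it yet,
        # roughly like [Expr1004] = CONVERT(int, a).
        yield (a, lambda a=a: int(a))

def join(rows, keys):
    for a, expr in rows:
        if a in keys:         # non-matching rows are discarded here,
            yield expr        # before the expression is ever evaluated

def avg_aggregate(exprs):
    values = [expr() for expr in exprs]   # evaluation finally happens here
    return sum(values) / len(values) if values else None

print(avg_aggregate(join(scan_with_deferred_convert(table_a), table_b)))  # 1.0

# Adding 'a' to table_b (like the extra INSERT above) lets that row reach the
# aggregate, the deferred int('a') is finally evaluated, and an error is raised.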
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 12, 2012 at 10:00 pm
SQL Kiwi (1/12/2012)
With the JOIN in place, the row is eliminated by the join (no matching row in table B) before the expression is evaluated by the aggregate, so no error occurs.
Thank you for confirming my point!
That's exactly what I'm trying to prove:
JOIN is executed (eliminating the extra row) BEFORE aggregation even starts to evaluate any upcoming values.
Are you ready to correct the answer on your QoD?
😉
_____________
Code for TallyGenerator