January 12, 2012 at 10:07 pm
Sergiy (1/12/2012)
That's exactly what I'm trying to prove:JOIN is executed (eliminating the extra row) BEFORE aggregation even starts to evaluate any upcoming values.
You are confusing the point in time at which an iterator first starts executing, with later times when rows flow, generally one at a time, up the plan. The aggregate starts executing first, as I have already proved. You really need to read those articles I linked to, to clarify your thinking here.
January 12, 2012 at 10:21 pm
Here is the full execution tree for my batch:
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
-------- ------ ------ ------ ---------- --------- -------- ------------- ------------ ---------- ----------- ---------- ---------------- ---------- -------- ---- -------- ------------------
Compute Scalar(DEFINE: ([Expr1002]=If ([Expr1009]=0) then NULL else ([Expr1010]/Convert([Expr1009])))) 0 0 Compute Scalar Compute Scalar DEFINE: ([Expr1002]=If ([Expr1009]=0) then NULL else ([Expr1010]/Convert([Expr1009]))) [Expr1002]=If ([Expr1009]=0) then NULL else ([Expr1010]/Convert([Expr1009])) 1 0 2.5E-007 11 0.0411916 [Expr1002] PLAN_ROW 0 1
|--Stream Aggregate(DEFINE: ([Expr1009]=Count(*), [Expr1010]=SUM(Convert([A].[a])))) 0 1 0 Stream Aggregate Aggregate [Expr1009]=Count(*), [Expr1010]=SUM(Convert([A].[a])) 1 0 2.5E-007 11 0.0411916 [Expr1009], [Expr1010] PLAN_ROW 0 1
|--Nested Loops(Inner Join, OUTER REFERENCES: ([A].[a])) 0 2 1 Nested Loops Inner Join OUTER REFERENCES: ([A].[a]) 1 0 1.254E-005 45 0.0411914 [A].[a] PLAN_ROW 0 1
|--Table Scan(OBJECT: (@A AS [A])) 0 3 2 Table Scan Table Scan OBJECT: (@A AS [A]) [A].[a] 1 0.0375785 7.96E-005 27 0.0376581 [A].[a] PLAN_ROW 0 1
|--Index Seek(OBJECT: (@B AS ), SEEK: (.=[A].[a]) ORDERED FORWARD) 0 4 2 Index Seek Index Seek OBJECT: (@B AS ), SEEK: (.=[A].[a]) ORDERED FORWARD 1 0.0032034 7.9603E-005 27 0.00352071 PLAN_ROW 0 3
It's obvious that the 1st statement (Node 0) in the execution plan calculates the division of 2 expressions calculated by Stream Aggregate in Node 1.
All those talks about getting rows, streaming etc. cannot be used here as there are just 2 values in memory which must be divided one by another.
Node 0 cannot be eecuted before Node 1 is completed.
Simple as that.
So, the operation mentioned 1st in the plan is actually executed last.
Higher node number indicates higher priority of execution.
According to the plan, the 1st operation would be index seek on table @B.
Then comes table scan on table @a.
THis assumption is perfectly confirmed by "Scan" events in Profiler:
Show Plan All13V1\SQL10X1200000005000000E7000000CA00000010000000530074006D007400540065007800740038000000040000000C000000530074006D00740049006400380000902012-01-13 16:12:23.367
Scan:Started2643352458902012-01-13 16:12:23.367
Scan:Started2659352515902012-01-13 16:12:23.367
Scan:Started2659352515902012-01-13 16:12:23.367
Single scan goes against the index having a single value - table @b-2, and double scan goes against the table having 2 values - Table @a.
Just like the execution plan has indicated.
I don't see any reason why different logic should be applied to the rest of the execution plan.
ready to fix the answer?
_____________
Code for TallyGenerator
January 12, 2012 at 10:39 pm
Sergiy (1/12/2012)
It's obvious that the 1st statement (Node 0) in the execution plan calculates the division of 2 expressions calculated by Stream Aggregate in Node 1. All those talks about getting rows, streaming etc. cannot be used here as there are just 2 values in memory which must be divided one by another. Node 0 cannot be eecuted before Node 1 is completed.Simple as that.
Not at all. Code execution starts at the root of the plan, which you have labelled Node 0. The code that executes in sqlserver.exe at this point is sqlservr!CXStmtQuery::InitForExecute which then calls sqlservr!CXStmtQuery::SetupQueryScanAndExpression. Control then passes to sqlservr!CQScanStreamAggregateNew::Open.
There really is no clearer demonstration than watching SQL Server execute code in a debugger - this is a much, much deeper view than the execution plan provides. Tens of thousands of lines of code are executed before sqlservr!CQScanTableScanNew eventually calls into the Storage Engine to set up the scan you see reported by Profiler. By the way, with a debugger attached, one can see the code that fires Trace Events like Scan:Started executing. Again, this is a long, long time (in processor terms!) after the Stream Aggregate initialized and then started executing, before asking the Nested Loops join for the first row.
ready to fix the answer?
Certainly not. If you are not prepared to accept the hard evidence presented here, the arguments presented by three current MVPs,
articles from people like Craig Freedman (who wrote parts of the SQL Server code over many years at Microsoft), Books Online, Microsoft Press books like the Inside SQL Server series...and so on...I can only suggest you pop in next time your are in New Zealand: I will attach a debugger to SQL Server and show you the order in which things happen.
January 12, 2012 at 10:43 pm
SQL Kiwi (1/12/2012)
You really need to read those articles I linked to, to clarify your thinking here.
There are so many holes and false assumptions in those articles...
Some of them are indicated by the test plan I posted:
why index seek on @b-2 comes before table scan on @a?
It's a fact, confirmed by the trace. No place for speculations.
Now try to find an explanation.
When you find it (I'm sure you will) you'll see that the explanation for "TOP 2" puzzle in one of the articles is totally wrong.
I'm sorry, gotta go now, probably next time, when I have more time, will continue.
But I believe you are good enough to get to the point without my further help.
_____________
Code for TallyGenerator
January 12, 2012 at 10:53 pm
SQL Kiwi (1/12/2012)
If you are not prepared to accept the hard evidence presented here, the arguments presented by three current MVPs,articles from people like Craig Freedman (who wrote parts of the SQL Server code over many years at Microsoft), Books Online, Microsoft Press books like the Inside SQL Server series...and so on...
I was just talking to another guy who was developing SQL Server 2008.
Had to provide him with hard evidences that his believes about table variables are wrong.
After reading articles from MVP and in BoL about "in-memory tables variables" I din't trust anyone's statements without verification.
I can only suggest you pop in next time your are in New Zealand: I will attach a debugger to SQL Server and show you the order in which things happen.
You mean "when I'm in Wellington"?
I'm not so far away from you, just in that big blob up there on the map of NZ.
🙂
_____________
Code for TallyGenerator
January 12, 2012 at 10:58 pm
Sergiy (1/12/2012)
There are so many holes and false assumptions in those articles...
No, they're just telling you things you don't want to listen to. If you don't believe the people that wrote the software you are running, what hope is there?
Some of them are indicated by the test plan I posted:
why index seek on @b-2 comes before table scan on @a?
It's a fact, confirmed by the trace. No place for speculations.
Now try to find an explanation.
I have no idea what you are referring to here. The scan certainly emits the first row, which the join asked for. The join then asks the seek for a row so it can check for a match. It's really not that challenging a concept. You seem to be assuming (among many things!) that the scan completes before the join starts. Where do the rows go? Think about this and you will realize the error in your thinking. You also need to realize that FROM clauses and WHERE clauses don't execute. They are logical concepts, ones you are confusing with the physical sequence of events that occurs as a result of executing a query plan.
I'm sorry, gotta go now, probably next time, when I have more time, will continue.
No hurry. The offer of a personal demonstration stands. 😉
January 12, 2012 at 11:00 pm
Sergiy (1/12/2012)
After reading articles from MVP and in BoL about "in-memory tables variables" I din't trust anyone's statements without verification.
So attach a debugger and verify it, like I did.
January 12, 2012 at 11:03 pm
Sergeiy,
As Paul said, there's no better way to see what's actually going on than to watch it in a debugger. I gather you haven't done that (because you don't know what actualy happens). I can assure you that observation of a real life experiment is far more reliable than believing something you have been (or perhaps just think you have been) told, so I suggest you use a debugger to observe what actually happens, rather than what some book may have told you.
Tom
January 13, 2012 at 12:37 am
Sergiy, I decided not to repeat comments already made by Paul. But here is a point he did not yet address, so allow me to do so:
Sergiy (1/12/2012)
And here we come to the order of SELECT statements execution.1. FROM
2. INNER JOIN
3. WHERE
...
SELECT comes somewhere 7th or 8th in line.
After that comes GROUP BY.
And that's where the aggregation is sitting.
You cannot argue, that outcome of aggregation depends on the definition of GROUP BY, can you?
That's the logical order of processing. It is documented in Books Online (I assume that's wherre you took this list from?) here: http://msdn.microsoft.com/en-us/library/ms189499.aspx. The text above this list even explicitly states that the actual physical execution may vary from this list.
January 13, 2012 at 12:54 am
Sergiy (1/12/2012)
JOIN is executed (eliminating the extra row) BEFORE aggregation even starts to evaluate any upcoming values.
Yes - before aggregation starts to evaluate any upcoming values. But after it starts executing.
In extremely simplified pseudo-code, here is what the Stream Aggregate operator does:
// Initialize work variables
int RowCount := 0;
bool EndOfData := false;
// Lots more code
// Return next row to caller
object DataReturned;
while (not (EndOfData))
{
// More code
Execute NestedLoops(DataReturned OUTPUT, EndOfData OUTPUT);
if not(EndOfData)
{
RowCount +:= 1;
// Lots more code
}
// Yet more code
}
// Return data to caller
Execute CreateRow (RowCount);
// Clean up
Destroy DataReturned;
// Lots more code
(These are actually three different methods, invoked in the above order)
As you see, all the initialization code and some of the code in GetNext() is executed before the NestedLoops operator is first called (bolded in the pseudo-code above). The rest of the code (including the processing of the actual data) is then executed after execution returns from NestedLoops. For this particular execution plan, the first statements executed are part of StreamAggregate, and the last statements executed are part of StreamAggregate as well. Execution of the other operators is somewhere in the middle.
January 13, 2012 at 1:59 am
SQL Kiwi (1/12/2012)
You seem to be assuming (among many things!) that the scan completes before the join starts.
I'm not assuming anything.
I state what I see from the Profiler trace: index seek on @b-2 comes first, right after after reading the execution plan. Then come 2 scans of table scan on @a. And then come the rest.
Where do the rows go? Think about this and you will realize the error in your thinking.
Where do the rows come from? Think about this and you will realize the error in your thinking.
🙂
You claim that opening handlers IS execution.
OK, let's pretend for a minute we've agreed on that.
Despite the fact that your question was about stream aggregation, and all explanations in all articles indicate that no actual stream aggregation is happening until the first row is fetched and processed by scans and join.
But anyway - we called the method and started execution of the stream aggregation.
Then what?
We need data.
So we start JOIN.
Open handlers, initiate and call scans.
Scans have no one to call, so they start to fetch rows.
Where are we with aggregate at this time?
Is it executed?
Hm, not quite. It still is waiting for some data from JOIN.
It's still being executed.
OK, we're done with table and index scans.
Rows are fetched and filtered.
Scans are now executed. And closed.
Inner loop has got its data and is now doing its thing.
What about the stream aggregate?
It's still waiting for the input from JOIN, still (according to your definitions) being executed.
When JOIN is done we again pass the data one level up and close the method.
All done, JOIN is executed and finished.
Where are we with the aggregate?
It's got its record(s) and now has something to do.
But it still has to figure out the answer, execute some code, while all other parts of the query are already executed and discarded.
So, here we go.
If to read "is executed" as "being processed" then table scans are executed at the same time as the stream aggregate. And the stream aggregate is executed before, in parallel and after the scans.
If to read "is executed" as "completed" then the scans are executed before the stream aggregate.
I leave to you to choose which option you prefer.
Either one does not match your answer on the QoD.
_____________
Code for TallyGenerator
January 13, 2012 at 2:25 am
Sergiy (1/13/2012)
So, here we go.If to read "is executed" as "being processed" then table scans are executed at the same time as the stream aggregate. And the stream aggregate is executed before, in parallel and after the scans.
If to read "is executed" as "completed" then the scans are executed before the stream aggregate.
And if you read "is executed" as "is invoked" / "has its execution started" (which, interestingly, is triggered by the EXECUTE keyword in T-SQL (and many other languages)), then the stream aggregate is executed before all the other operators.
On a side note, I would use the term "is being executed" for "being processed", "has been executed" for "completed", and "is executed" for "is invoked" - But I'm not a native English speaker, so maybe this is just a shortcoming in my mastery of the English language.
January 13, 2012 at 2:59 am
Sergiy (1/12/2012)
Here is the full execution tree for my batch: [...snip...]According to the plan, the 1st operation would be index seek on table @B.
Then comes table scan on table @a.
Are you seriously saying that in the following plan, you think the seek is executed first?
The seek predicate is @B.b = @a.[a]. It cannot possibly perform that index seek before a value for 'a' has been read from table a.
January 13, 2012 at 3:41 am
You claim that opening handlers IS execution. OK, let's pretend for a minute we've agreed on that.
Yes, code execution (as shown in the stack trace from the debugger) is indeed execution. The code that first executes is the code that executes first. Consider that an earlier stack trace shows this code is executing:
sqlservr!CQScanStreamAggregateNew::Open
sqlservr!CXStmtQuery::SetupQueryScanAndExpression
sqlservr!CXStmtQuery::InitForExecute
There's really nothing clearer than that to show the stream aggregate code executes first.
Despite the fact that your question was about stream aggregation, and all explanations in all articles indicate that no actual stream aggregation is happening until the first row is fetched and processed by scans and join.
The question was not about Stream Aggregation. That the Stream Aggregate code is executed first is the correct answer to the question (which is titled Query Plan Execution Order, nothing about stream aggregates). It is a follow-on from my previous question Table Access Order[/url]. Together, the two questions explore the difference between which plan operator executes first, and which one accesses data first.
But anyway - we called the method and started execution of the stream aggregation. Then what?
Doesn't matter 'then what'. The stream aggregate code is executed first, as you say.
January 13, 2012 at 6:48 am
SQL Kiwi (1/13/2012)
You claim that opening handlers IS execution. OK, let's pretend for a minute we've agreed on that.
Yes, code execution (as shown in the stack trace from the debugger) is indeed execution. The code that first executes is the code that executes first. Consider that an earlier stack trace shows this code is executing:
sqlservr!CQScanStreamAggregateNew::Open
sqlservr!CXStmtQuery::SetupQueryScanAndExpression
sqlservr!CXStmtQuery::InitForExecute
There's really nothing clearer than that to show the stream aggregate code executes first.
It says "InitForExecute", I cannot see any execution happening here.
Let's translate the question into common task.
You're at the station fueling your car.
The action takes 3 operations:
1. Pump pushing the fuel;
2. Counter counting the volume;
3. Nozzle pouring the fuel into the car.
Now - which one is executed first?
You say - pouring is coming 1st, because before counter can start counting you must open the cap and put the nozzle into the hole.
All right, but it's just a preparation for pouring, no pouring is happening before the pump starts pumping and counter actually starts counting.
I know what you're gonna say. My brain is damaged by science as well. 🙂
But it's weekend.
Go out there, talk to some real people and try to explain them that pouring happens before pumping.
See what they reply.
_____________
Code for TallyGenerator
Viewing 15 posts - 61 through 75 (of 103 total)
You must be logged in to reply to this topic. Login to reply