November 29, 2012 at 1:36 am
Hello
I have a CTE with several select and a final select that runs in less than a second giving 2 fields and 5K rows. This is great. But...
When I add group by Field1, Field2 then running time becomes 75 seconds and it is quite strange that grouping just 5K rows requires 74 seconds.
If I don't group and just choose to insert the rows into a #temptable the time is again 75 seconds and I am pretty sure the time is due to executing the select, not filling and displaying the #temptable.
Examining the query plans, in the latter 2 cases the query plan is completely changed (evidently for the worse).
How can I tell the SQL Engine to use the faster plan?
I have been trying on both 2005 and 2008R2 with similar results.
Thankx
W.
P.S.
Here's my query.
At the moment it is unnecessary complicated because I tried to split the work into middle queries to "force" the Engine to behave better. At the end you see where I can add the "group by" or the "INTO #temptable" (removing the last select of course) to get the strange behaviour.
;with distribuzioni as(
SELECT v.Versione_Id as Versione_Id,
v.Risorsa_Id as Risorsa_Id,
dp.Dispositivo_Id as Dispositivo_Id,
d.Distinta_Id as Distinta_Id,
p.statoPacchetto as StatoPacchetto,
ROW_NUMBER() OVER(PARTITION BY v.Versione_Id, dp.Dispositivo_Id, d.Distinta_Id ORDER BY p.sottomissione) AS N
FROM
Versioni AS v
INNER JOIN TagsApplicati AS ta ON v.Risorsa_Id = ta.Taggato_Id
INNER JOIN Dispositivi AS dp ON ta.NomeTag = dp.MACAddress
INNER JOIN Colli AS cl ON v.Versione_Id = cl.Versione_Id
INNER JOIN Distinte d ON d.ColloSpedizione_Id = cl.Collo_Id
INNER JOIN SottoDistinte sd ON d.Distinta_Id = sd.Distinta_Id
INNER JOIN Pacchetti p ON sd.Pacchetto_Id = p.Pacchetto_Id
INNER JOIN ComandiTrasporto ct ON ct.Pacchetto_Id = p.Pacchetto_Id
AND ct.Dispositivo_Id = dp.Dispositivo_Id
WHERE p.Sottomissione > '20120601'
GROUP BY v.Risorsa_Id, v.Versione_Id,dp.Dispositivo_Id, d.Distinta_Id, p.statoPacchetto,p.sottomissione
)
,indiciUltimiRetry as(
SELECT Risorsa_Id,
Versione_Id,
Dispositivo_Id,
Distinta_Id,
MAX(N) ultimoRetry
FROM distribuzioni
GROUP BY Risorsa_Id,
Versione_Id,
Dispositivo_Id,
Distinta_Id
)
,StatiUltimiRetry as(
SELECT dis.Risorsa_Id,
dis.Versione_Id,
dis.Dispositivo_Id,
dis.Distinta_Id,
dis.StatoPacchetto,
iur.ultimoRetry
FROM distribuzioni dis
INNER JOIN indiciUltimiRetry iur ON dis.Risorsa_Id = iur.Risorsa_Id
AND dis.Versione_Id = iur.Versione_Id
AND dis.Dispositivo_Id = iur.Dispositivo_Id
AND dis.Distinta_Id = iur.Distinta_Id
AND dis.N = iur.ultimoRetry
)
,NumDistribuzioniXperVersione as( --questo incredibile passo in piu' è fatto solo per *velocizzare* la query
SELECT
sur.Risorsa_Id as Risorsa_Id
,sur.versione_Id
,SUM(CASE WHEN(sur.StatoPacchetto = 10) THEN(1) ELSE(0) END) as NumDistribuzioniC
,SUM(CASE WHEN(sur.StatoPacchetto = 9) THEN(1) ELSE(0) END) as NumDistribuzioniE
,SUM(CASE WHEN(sur.StatoPacchetto = 8) THEN(1) ELSE(0) END) as NumDistribuzioniA
,SUM(CASE WHEN(sur.StatoPacchetto < 8 OR sur.StatoPacchetto > 10) THEN(1) ELSE(0) END) as NumAltriEsiti
FROM StatiUltimiRetry sur
GROUP BY sur.Risorsa_Id
,sur.Versione_Id
)
,NumDistribuzioniXperRisorsa as(
SELECT Risorsa_Id,
SUM(NumDistribuzioniC) as NumDistribuzioniC,
SUM(NumDistribuzioniE) as NumDistribuzioniE,
SUM(NumDistribuzioniA) as NumDistribuzioniA,
SUM(NumAltriEsiti) as NumAltriEsiti
FROM NumDistribuzioniXperVersione as ndxpv
GROUP BY Risorsa_Id
)
,tempx as(
SELECT sur.Risorsa_Id,sur.Versione_Id,sur.Dispositivo_Id, sur.Distinta_Id, sur.StatoPacchetto, sur.ultimoRetry, ndxr.NumDistribuzioniC, ndxr.NumDistribuzioniE, ndxr.NumDistribuzioniA, ndxr.NumAltriEsiti
--into #tempTable
FROM StatiUltimiRetry sur INNER JOIN NumDistribuzioniXperRisorsa ndxr
ON sur.Risorsa_Id = ndxr.Risorsa_id
)
SELECT Risorsa_Id,Versione_Id--, AVG(ultimoretry)
FROM tempx
--group by risorsa_id,Versione_Id
November 29, 2012 at 1:40 am
Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 5:19 am
When you add something like GROUP BY, the plan is not the same because the processing is not the same. You can't apply a simple SELECT (or even a very complex SELECT like you have here) to a GROUP BY because the SELECT doesn't include the aggregation functions required by the GROUP BY operation. You're seeing different behavior because you're changing what is done.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 29, 2012 at 6:25 am
1) I am sorry I didnt follow the directives to post a request about optimization. I didnt do that mostly because I (think I) already have a good plan that is not used all the times. I didnt want to bother someone to increase the quality of a complicated query till I am not sure i really have a problem.
2) I can accept that adding a "group by" changes the boundary conditions of this problem... but what about adding "INTO #temptable" ? why should the SELECT part change if I want to use the result to fill a temp table?
As soon as I can I'll post the rest of the data. Is there a way to add an attachment so that i can add the *.sqlplan files?
Thankx a lot
W.
November 29, 2012 at 6:33 am
Check this, look at the properties of the SELECT operator for all the plans you have. Considering the complexity of what you're doing, I suspect you'll see that the reason for early termination on the plans is Timeout. This means the optimizer is running out of time to find a good plan. This results in plan instability so even something small (ish) like adding the INSERT into TEMPDB affects the plan. But, I would completely expect to see a performance hit for loading this into tempdb. The changes in the plan are a little surprising, but not that much.
You can post links. Scroll down while posting and you'll see a number of other options below.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 29, 2012 at 7:01 am
The optimizer struggles with chained CTE's where elements in the chain are at different levels of aggregation, which is exactly what you have here. Try running the results into a #temp table at various points. Personally, I'd do something like this (untested):
;with distribuzioni as(
SELECT v.Versione_Id as Versione_Id,
v.Risorsa_Id as Risorsa_Id,
dp.Dispositivo_Id as Dispositivo_Id,
d.Distinta_Id as Distinta_Id,
p.statoPacchetto as StatoPacchetto,
ROW_NUMBER() OVER(PARTITION BY v.Versione_Id, dp.Dispositivo_Id, d.Distinta_Id ORDER BY p.sottomissione) AS N
FROM
Versioni AS v
INNER JOIN TagsApplicati AS ta ON v.Risorsa_Id = ta.Taggato_Id
INNER JOIN Dispositivi AS dp ON ta.NomeTag = dp.MACAddress
INNER JOIN Colli AS cl ON v.Versione_Id = cl.Versione_Id
INNER JOIN Distinte d ON d.ColloSpedizione_Id = cl.Collo_Id
INNER JOIN SottoDistinte sd ON d.Distinta_Id = sd.Distinta_Id
INNER JOIN Pacchetti p ON sd.Pacchetto_Id = p.Pacchetto_Id
INNER JOIN ComandiTrasporto ct ON ct.Pacchetto_Id = p.Pacchetto_Id
AND ct.Dispositivo_Id = dp.Dispositivo_Id
WHERE p.Sottomissione > '20120601'
GROUP BY v.Risorsa_Id, v.Versione_Id,dp.Dispositivo_Id, d.Distinta_Id, p.statoPacchetto,p.sottomissione
)
--,indiciUltimiRetry as(
--SELECT Risorsa_Id,
--Versione_Id,
--Dispositivo_Id,
--Distinta_Id,
--MAX(N) ultimoRetry
--FROM distribuzioni
--GROUP BY Risorsa_Id,
--Versione_Id,
--Dispositivo_Id,
--Distinta_Id
--)
--,StatiUltimiRetry as(
SELECT
d.Risorsa_Id,
d.Versione_Id,
d.Dispositivo_Id,
d.Distinta_Id,
d.StatoPacchetto
INTO #StatiUltimiRetry
FROM (
SELECT dis.Risorsa_Id,
dis.Versione_Id,
dis.Dispositivo_Id,
dis.Distinta_Id,
dis.StatoPacchetto,
--iur.ultimoRetry
dis.N,
iur_ultimoRetry = MAX(N) OVER(PARTITION BY Risorsa_Id, Versione_Id, Dispositivo_Id, Distinta_Id)
FROM distribuzioni dis
) d WHERE dis.N = iur_ultimoRetry
--INNER JOIN indiciUltimiRetry iur ON dis.Risorsa_Id = iur.Risorsa_Id
--AND dis.Versione_Id = iur.Versione_Id
--AND dis.Dispositivo_Id = iur.Dispositivo_Id
--AND dis.Distinta_Id = iur.Distinta_Id
--AND dis.N = iur.ultimoRetry
--)
,NumDistribuzioniXperVersione as( --questo incredibile passo in piu' è fatto solo per *velocizzare* la query
SELECT
sur.Risorsa_Id as Risorsa_Id
,sur.versione_Id
,SUM(CASE WHEN(sur.StatoPacchetto = 10) THEN(1) ELSE(0) END) as NumDistribuzioniC
,SUM(CASE WHEN(sur.StatoPacchetto = 9) THEN(1) ELSE(0) END) as NumDistribuzioniE
,SUM(CASE WHEN(sur.StatoPacchetto = 8) THEN(1) ELSE(0) END) as NumDistribuzioniA
,SUM(CASE WHEN(sur.StatoPacchetto < 8 OR sur.StatoPacchetto > 10) THEN(1) ELSE(0) END) as NumAltriEsiti
FROM #StatiUltimiRetry sur
GROUP BY sur.Risorsa_Id
,sur.Versione_Id
)
,NumDistribuzioniXperRisorsa as(
SELECT Risorsa_Id,
SUM(NumDistribuzioniC) as NumDistribuzioniC,
SUM(NumDistribuzioniE) as NumDistribuzioniE,
SUM(NumDistribuzioniA) as NumDistribuzioniA,
SUM(NumAltriEsiti) as NumAltriEsiti
FROM NumDistribuzioniXperVersione as ndxpv
GROUP BY Risorsa_Id
)
,tempx as(
SELECT sur.Risorsa_Id,sur.Versione_Id,sur.Dispositivo_Id, sur.Distinta_Id, sur.StatoPacchetto, sur.ultimoRetry, ndxr.NumDistribuzioniC, ndxr.NumDistribuzioniE, ndxr.NumDistribuzioniA, ndxr.NumAltriEsiti
--into #tempTable
FROM #StatiUltimiRetry sur INNER JOIN NumDistribuzioniXperRisorsa ndxr
ON sur.Risorsa_Id = ndxr.Risorsa_id
)
SELECT Risorsa_Id,Versione_Id--, AVG(ultimoretry)
FROM tempx
--group by risorsa_id,Versione_Id
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 29, 2012 at 7:05 am
Thankx for your reply.
Here you (should) find 4 attachments:
- the 2 plans in the different conditions (with and without INTO in the final select)
- the query
- the scripts to generate tables, their indexes, keys and statistics
I hope I did everything correctly this time.
About your comment, i don't see any timeout.
The plans are estimated or evaluated (i am attaching the used ones) very fast, no problems there.
The difference is just approximately 70 seconds more in the execution.
Besides, i don't see any early termination. The results of the query is the same, 5K rows.
Thank you again
W.
November 29, 2012 at 7:16 am
Yep. Looked at the first plan NO_INTO, and it's a timeout from the optimizer. That means, even the "good" plan is not necessarily an adequate plan. But that does explain the volatility of the plans you're seeing.
To check this yourself, go to the first operator in the plan, the SELECT operator. Right click on it and select Properties from the menu choice. Near the bottom look for "Reason for early termination of statement optimization". First one is a timeout. I didn't even check the other because if it adds additional functionality, it's timing out too.
I agree with the post above, breaking this apart might be a better approach.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 29, 2012 at 8:18 am
Hi again
your suggestions were indeed correct.
I split the CTE in 2 smaller CTEs and everything is working perfectly now with great times. :w00t:
So what's the teaching here (for the community)? Better not to chain too many select in a CTE when there are groupings at "different level"?
Well, thank you very much to you two, at least i didnt waste these two last days of work!
Wentu
November 29, 2012 at 8:26 am
j_wentu (11/29/2012)
So what's the teaching here (for the community)? Better not to chain too many select in a CTE when there are groupings at "different level"?
This gets my vote, by observation and experience.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 29, 2012 at 8:29 am
ChrisM@Work (11/29/2012)
j_wentu (11/29/2012)
So what's the teaching here (for the community)? Better not to chain too many select in a CTE when there are groupings at "different level"?This gets my vote, by observation and experience.
Pretty much.
If you're seeing timeouts in the optimizer, then the plans are inherently unstable. The only way to fix them is to reduce complexity.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply