Adding "group by" or even "INTO #tmpTable" completely change Execution Plan (for the worse)

  • Hello

    I have a CTE with several select and a final select that runs in less than a second giving 2 fields and 5K rows. This is great. But...

    When I add group by Field1, Field2 then running time becomes 75 seconds and it is quite strange that grouping just 5K rows requires 74 seconds.

    If I don't group and just choose to insert the rows into a #temptable the time is again 75 seconds and I am pretty sure the time is due to executing the select, not filling and displaying the #temptable.

    Examining the query plans, in the latter 2 cases the query plan is completely changed (evidently for the worse).

    How can I tell the SQL Engine to use the faster plan?

    I have been trying on both 2005 and 2008R2 with similar results.

    Thankx

    W.

    P.S.

    Here's my query.

    At the moment it is unnecessary complicated because I tried to split the work into middle queries to "force" the Engine to behave better. At the end you see where I can add the "group by" or the "INTO #temptable" (removing the last select of course) to get the strange behaviour.

    ;with distribuzioni as(

    SELECT v.Versione_Id as Versione_Id,

    v.Risorsa_Id as Risorsa_Id,

    dp.Dispositivo_Id as Dispositivo_Id,

    d.Distinta_Id as Distinta_Id,

    p.statoPacchetto as StatoPacchetto,

    ROW_NUMBER() OVER(PARTITION BY v.Versione_Id, dp.Dispositivo_Id, d.Distinta_Id ORDER BY p.sottomissione) AS N

    FROM

    Versioni AS v

    INNER JOIN TagsApplicati AS ta ON v.Risorsa_Id = ta.Taggato_Id

    INNER JOIN Dispositivi AS dp ON ta.NomeTag = dp.MACAddress

    INNER JOIN Colli AS cl ON v.Versione_Id = cl.Versione_Id

    INNER JOIN Distinte d ON d.ColloSpedizione_Id = cl.Collo_Id

    INNER JOIN SottoDistinte sd ON d.Distinta_Id = sd.Distinta_Id

    INNER JOIN Pacchetti p ON sd.Pacchetto_Id = p.Pacchetto_Id

    INNER JOIN ComandiTrasporto ct ON ct.Pacchetto_Id = p.Pacchetto_Id

    AND ct.Dispositivo_Id = dp.Dispositivo_Id

    WHERE p.Sottomissione > '20120601'

    GROUP BY v.Risorsa_Id, v.Versione_Id,dp.Dispositivo_Id, d.Distinta_Id, p.statoPacchetto,p.sottomissione

    )

    ,indiciUltimiRetry as(

    SELECT Risorsa_Id,

    Versione_Id,

    Dispositivo_Id,

    Distinta_Id,

    MAX(N) ultimoRetry

    FROM distribuzioni

    GROUP BY Risorsa_Id,

    Versione_Id,

    Dispositivo_Id,

    Distinta_Id

    )

    ,StatiUltimiRetry as(

    SELECT dis.Risorsa_Id,

    dis.Versione_Id,

    dis.Dispositivo_Id,

    dis.Distinta_Id,

    dis.StatoPacchetto,

    iur.ultimoRetry

    FROM distribuzioni dis

    INNER JOIN indiciUltimiRetry iur ON dis.Risorsa_Id = iur.Risorsa_Id

    AND dis.Versione_Id = iur.Versione_Id

    AND dis.Dispositivo_Id = iur.Dispositivo_Id

    AND dis.Distinta_Id = iur.Distinta_Id

    AND dis.N = iur.ultimoRetry

    )

    ,NumDistribuzioniXperVersione as( --questo incredibile passo in piu' è fatto solo per *velocizzare* la query

    SELECT

    sur.Risorsa_Id as Risorsa_Id

    ,sur.versione_Id

    ,SUM(CASE WHEN(sur.StatoPacchetto = 10) THEN(1) ELSE(0) END) as NumDistribuzioniC

    ,SUM(CASE WHEN(sur.StatoPacchetto = 9) THEN(1) ELSE(0) END) as NumDistribuzioniE

    ,SUM(CASE WHEN(sur.StatoPacchetto = 8) THEN(1) ELSE(0) END) as NumDistribuzioniA

    ,SUM(CASE WHEN(sur.StatoPacchetto < 8 OR sur.StatoPacchetto > 10) THEN(1) ELSE(0) END) as NumAltriEsiti

    FROM StatiUltimiRetry sur

    GROUP BY sur.Risorsa_Id

    ,sur.Versione_Id

    )

    ,NumDistribuzioniXperRisorsa as(

    SELECT Risorsa_Id,

    SUM(NumDistribuzioniC) as NumDistribuzioniC,

    SUM(NumDistribuzioniE) as NumDistribuzioniE,

    SUM(NumDistribuzioniA) as NumDistribuzioniA,

    SUM(NumAltriEsiti) as NumAltriEsiti

    FROM NumDistribuzioniXperVersione as ndxpv

    GROUP BY Risorsa_Id

    )

    ,tempx as(

    SELECT sur.Risorsa_Id,sur.Versione_Id,sur.Dispositivo_Id, sur.Distinta_Id, sur.StatoPacchetto, sur.ultimoRetry, ndxr.NumDistribuzioniC, ndxr.NumDistribuzioniE, ndxr.NumDistribuzioniA, ndxr.NumAltriEsiti

    --into #tempTable

    FROM StatiUltimiRetry sur INNER JOIN NumDistribuzioniXperRisorsa ndxr

    ON sur.Risorsa_Id = ndxr.Risorsa_id

    )

    SELECT Risorsa_Id,Versione_Id--, AVG(ultimoretry)

    FROM tempx

    --group by risorsa_id,Versione_Id

  • Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • When you add something like GROUP BY, the plan is not the same because the processing is not the same. You can't apply a simple SELECT (or even a very complex SELECT like you have here) to a GROUP BY because the SELECT doesn't include the aggregation functions required by the GROUP BY operation. You're seeing different behavior because you're changing what is done.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • 1) I am sorry I didnt follow the directives to post a request about optimization. I didnt do that mostly because I (think I) already have a good plan that is not used all the times. I didnt want to bother someone to increase the quality of a complicated query till I am not sure i really have a problem.

    2) I can accept that adding a "group by" changes the boundary conditions of this problem... but what about adding "INTO #temptable" ? why should the SELECT part change if I want to use the result to fill a temp table?

    As soon as I can I'll post the rest of the data. Is there a way to add an attachment so that i can add the *.sqlplan files?

    Thankx a lot

    W.

  • Check this, look at the properties of the SELECT operator for all the plans you have. Considering the complexity of what you're doing, I suspect you'll see that the reason for early termination on the plans is Timeout. This means the optimizer is running out of time to find a good plan. This results in plan instability so even something small (ish) like adding the INSERT into TEMPDB affects the plan. But, I would completely expect to see a performance hit for loading this into tempdb. The changes in the plan are a little surprising, but not that much.

    You can post links. Scroll down while posting and you'll see a number of other options below.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The optimizer struggles with chained CTE's where elements in the chain are at different levels of aggregation, which is exactly what you have here. Try running the results into a #temp table at various points. Personally, I'd do something like this (untested):

    ;with distribuzioni as(

    SELECT v.Versione_Id as Versione_Id,

    v.Risorsa_Id as Risorsa_Id,

    dp.Dispositivo_Id as Dispositivo_Id,

    d.Distinta_Id as Distinta_Id,

    p.statoPacchetto as StatoPacchetto,

    ROW_NUMBER() OVER(PARTITION BY v.Versione_Id, dp.Dispositivo_Id, d.Distinta_Id ORDER BY p.sottomissione) AS N

    FROM

    Versioni AS v

    INNER JOIN TagsApplicati AS ta ON v.Risorsa_Id = ta.Taggato_Id

    INNER JOIN Dispositivi AS dp ON ta.NomeTag = dp.MACAddress

    INNER JOIN Colli AS cl ON v.Versione_Id = cl.Versione_Id

    INNER JOIN Distinte d ON d.ColloSpedizione_Id = cl.Collo_Id

    INNER JOIN SottoDistinte sd ON d.Distinta_Id = sd.Distinta_Id

    INNER JOIN Pacchetti p ON sd.Pacchetto_Id = p.Pacchetto_Id

    INNER JOIN ComandiTrasporto ct ON ct.Pacchetto_Id = p.Pacchetto_Id

    AND ct.Dispositivo_Id = dp.Dispositivo_Id

    WHERE p.Sottomissione > '20120601'

    GROUP BY v.Risorsa_Id, v.Versione_Id,dp.Dispositivo_Id, d.Distinta_Id, p.statoPacchetto,p.sottomissione

    )

    --,indiciUltimiRetry as(

    --SELECT Risorsa_Id,

    --Versione_Id,

    --Dispositivo_Id,

    --Distinta_Id,

    --MAX(N) ultimoRetry

    --FROM distribuzioni

    --GROUP BY Risorsa_Id,

    --Versione_Id,

    --Dispositivo_Id,

    --Distinta_Id

    --)

    --,StatiUltimiRetry as(

    SELECT

    d.Risorsa_Id,

    d.Versione_Id,

    d.Dispositivo_Id,

    d.Distinta_Id,

    d.StatoPacchetto

    INTO #StatiUltimiRetry

    FROM (

    SELECT dis.Risorsa_Id,

    dis.Versione_Id,

    dis.Dispositivo_Id,

    dis.Distinta_Id,

    dis.StatoPacchetto,

    --iur.ultimoRetry

    dis.N,

    iur_ultimoRetry = MAX(N) OVER(PARTITION BY Risorsa_Id, Versione_Id, Dispositivo_Id, Distinta_Id)

    FROM distribuzioni dis

    ) d WHERE dis.N = iur_ultimoRetry

    --INNER JOIN indiciUltimiRetry iur ON dis.Risorsa_Id = iur.Risorsa_Id

    --AND dis.Versione_Id = iur.Versione_Id

    --AND dis.Dispositivo_Id = iur.Dispositivo_Id

    --AND dis.Distinta_Id = iur.Distinta_Id

    --AND dis.N = iur.ultimoRetry

    --)

    ,NumDistribuzioniXperVersione as( --questo incredibile passo in piu' è fatto solo per *velocizzare* la query

    SELECT

    sur.Risorsa_Id as Risorsa_Id

    ,sur.versione_Id

    ,SUM(CASE WHEN(sur.StatoPacchetto = 10) THEN(1) ELSE(0) END) as NumDistribuzioniC

    ,SUM(CASE WHEN(sur.StatoPacchetto = 9) THEN(1) ELSE(0) END) as NumDistribuzioniE

    ,SUM(CASE WHEN(sur.StatoPacchetto = 8) THEN(1) ELSE(0) END) as NumDistribuzioniA

    ,SUM(CASE WHEN(sur.StatoPacchetto < 8 OR sur.StatoPacchetto > 10) THEN(1) ELSE(0) END) as NumAltriEsiti

    FROM #StatiUltimiRetry sur

    GROUP BY sur.Risorsa_Id

    ,sur.Versione_Id

    )

    ,NumDistribuzioniXperRisorsa as(

    SELECT Risorsa_Id,

    SUM(NumDistribuzioniC) as NumDistribuzioniC,

    SUM(NumDistribuzioniE) as NumDistribuzioniE,

    SUM(NumDistribuzioniA) as NumDistribuzioniA,

    SUM(NumAltriEsiti) as NumAltriEsiti

    FROM NumDistribuzioniXperVersione as ndxpv

    GROUP BY Risorsa_Id

    )

    ,tempx as(

    SELECT sur.Risorsa_Id,sur.Versione_Id,sur.Dispositivo_Id, sur.Distinta_Id, sur.StatoPacchetto, sur.ultimoRetry, ndxr.NumDistribuzioniC, ndxr.NumDistribuzioniE, ndxr.NumDistribuzioniA, ndxr.NumAltriEsiti

    --into #tempTable

    FROM #StatiUltimiRetry sur INNER JOIN NumDistribuzioniXperRisorsa ndxr

    ON sur.Risorsa_Id = ndxr.Risorsa_id

    )

    SELECT Risorsa_Id,Versione_Id--, AVG(ultimoretry)

    FROM tempx

    --group by risorsa_id,Versione_Id

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thankx for your reply.

    Here you (should) find 4 attachments:

    - the 2 plans in the different conditions (with and without INTO in the final select)

    - the query

    - the scripts to generate tables, their indexes, keys and statistics

    I hope I did everything correctly this time.

    About your comment, i don't see any timeout.

    The plans are estimated or evaluated (i am attaching the used ones) very fast, no problems there.

    The difference is just approximately 70 seconds more in the execution.

    Besides, i don't see any early termination. The results of the query is the same, 5K rows.

    Thank you again

    W.

  • Yep. Looked at the first plan NO_INTO, and it's a timeout from the optimizer. That means, even the "good" plan is not necessarily an adequate plan. But that does explain the volatility of the plans you're seeing.

    To check this yourself, go to the first operator in the plan, the SELECT operator. Right click on it and select Properties from the menu choice. Near the bottom look for "Reason for early termination of statement optimization". First one is a timeout. I didn't even check the other because if it adds additional functionality, it's timing out too.

    I agree with the post above, breaking this apart might be a better approach.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi again

    your suggestions were indeed correct.

    I split the CTE in 2 smaller CTEs and everything is working perfectly now with great times. :w00t:

    So what's the teaching here (for the community)? Better not to chain too many select in a CTE when there are groupings at "different level"?

    Well, thank you very much to you two, at least i didnt waste these two last days of work!

    Wentu

  • j_wentu (11/29/2012)


    So what's the teaching here (for the community)? Better not to chain too many select in a CTE when there are groupings at "different level"?

    This gets my vote, by observation and experience.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (11/29/2012)


    j_wentu (11/29/2012)


    So what's the teaching here (for the community)? Better not to chain too many select in a CTE when there are groupings at "different level"?

    This gets my vote, by observation and experience.

    Pretty much.

    If you're seeing timeouts in the optimizer, then the plans are inherently unstable. The only way to fix them is to reduce complexity.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply