Strange UNION ALL behaviour

  • One of my procedures started to perform badly and I found out that it was due to a concatenation step with UNION ALL.

    If I executed the two queries separately, I got times around 0.5 seconds each; if I executed them combined with UNION ALL, I got times around 35 seconds.

    How is it possible??? I know that UNION does a sort/distinct step, but I thought that UNION ALL was just a plain concatenation of the two queries. How can I get such different execution times?
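
    The comparison was essentially of this shape - the real queries are much more complex, so catalog views stand in for them here:

    SET STATISTICS TIME ON;

    -- The two halves run as separate statements
    SELECT name, object_id FROM sys.tables;
    SELECT name, object_id FROM sys.views;

    -- The same halves concatenated into a single statement
    SELECT name, object_id FROM sys.tables
    UNION ALL
    SELECT name, object_id FROM sys.views;

    SET STATISTICS TIME OFF;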

    -- Gianluca Sartori

  • Are the actual query plans identical for the queries with and without UNION ALL?

  • No, the two queries get absolutely different plans when unioned together.

    That's what I don't understand. How can UNION ALL change the plan so dramatically?

    As a side note, yesterday (when I first noticed this behaviour) the server was very heavily loaded. Today it's not, and if I run the unioned query, I get very similar times to the non unioned version. Another thing I don't understand.

    I attached the query plans.

    -- Gianluca Sartori

  • I'll start with a health warning - please test anything I say as I'm still a novice at this (and dear Experts, please be gentle if you need to shoot me down!)

    Health warning aside, the Unioned query isn't quite the same as the two separate queries. In the separate queries you insert the results of each into a table variable then select the combined results. In the Unioned query you just do the select. I've had times where

    INSERT @results

    SELECT ...

    and

    --INSERT @results

    SELECT ...

    performed very differently and generated quite different plans.
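
    To make that concrete, the two shapes being compared look roughly like this (catalog views as stand-ins for the real queries):

    DECLARE @results TABLE (name sysname NOT NULL, object_id int NOT NULL);

    -- Shape 1: two INSERT ... SELECT statements into a table variable, then one SELECT
    INSERT @results (name, object_id) SELECT name, object_id FROM sys.tables;
    INSERT @results (name, object_id) SELECT name, object_id FROM sys.views;
    SELECT name, object_id FROM @results;

    -- Shape 2: a single SELECT with UNION ALL, optimised as one statement
    SELECT name, object_id FROM sys.tables
    UNION ALL
    SELECT name, object_id FROM sys.views;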

    Also, you said it performs better today. Does it still use the same plan for the union version? The union version added in a couple of lazy spools - could writing to tempdb on a heavily loaded server impact the performance?

  • Gianluca Sartori (7/1/2010)


    No, the two queries get absolutely different plans when unioned together. That's what I don't understand. How can UNION ALL change the plan so dramatically?

    The simple answer is that the whole batch is optimised all at once. The optimiser does not split the query into two (at the UNION ALL) and optimise each part separately.

    There is an iterative process of applying rules, transformations, and heuristics. After each step it takes a breath to see if a 'good enough' plan has been found. As you know, the goal of the optimiser is not to find the absolute best plan out of the millions of alternatives - you would not like the compilation times!

    An important aspect of optimisation is knowing when to stop. If the optimiser considers that it will take more time to find an incrementally better plan than finding such a plan would save (over the current best plan) it stops looking.

    Even if optimisation proceeds through all possible phases, the introduction of a UNION ALL does change the options available. A UNION ALL can be implemented by the Concatenation iterator or by a Merge iterator. The optimiser may evaluate both alternatives, and subtle changes in the space of possible plans explored can result in different plans being produced.
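
    As a rough sketch (catalog views as stand-in queries), comparing the actual plans for these two statements shows the effect of pinning the physical shape with the documented CONCAT UNION hint:

    -- Left alone, the optimiser picks the physical shape of the UNION ALL
    SELECT name, object_id FROM sys.tables
    UNION ALL
    SELECT name, object_id FROM sys.views;

    -- OPTION (CONCAT UNION) pins it to the Concatenation iterator
    SELECT name, object_id FROM sys.tables
    UNION ALL
    SELECT name, object_id FROM sys.views
    OPTION (CONCAT UNION);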

    As a side note, yesterday (when I first noticed this behaviour) the server was very heavily loaded. Today it's not, and if I run the unioned query, I get very similar times to the non unioned version. Another thing I don't understand.

    This implies that the optimiser did find a plan that would normally execute quickly.

    As Paul Jones mentioned, the extra spools might have been the cause of the extra execution time - if tempdb was under extreme pressure. Normally, worktables do not get written out to disk.

    Another explanation is that your query was simply waiting on a memory grant. There are a number of hash and sort operations in the plan, each of which requires a memory grant (though some may be re-used by later iterators). If you only ran the queries once, or were just plain unlucky, it might be that the UNION ALL version just happened to get stuck waiting for a memory grant.

    You would need to examine sys.dm_exec_query_memory_grants and sys.dm_os_waiting_tasks while the query was executing to determine the cause of the wait.
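
    As a rough sketch, something along these lines - run from a second connection while the slow query executes - shows whether a session is still waiting for its grant (grant_time stays NULL until the grant is given):

    SELECT mg.session_id,
           mg.requested_memory_kb,
           mg.granted_memory_kb,
           mg.grant_time,
           mg.wait_time_ms,
           wt.wait_type,          -- typically RESOURCE_SEMAPHORE for a grant wait
           wt.wait_duration_ms
    FROM sys.dm_exec_query_memory_grants AS mg
    LEFT JOIN sys.dm_os_waiting_tasks AS wt
           ON wt.session_id = mg.session_id;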

    Paul

  • Paul White NZ (7/2/2010)


    The simple answer is that the whole batch is optimised all at once. The optimiser does not split the query into two (at the UNION ALL) and optimise each part separately.

    Thank you Paul! I was really shocked by the difference in execution times. I'm still talking about execution times and not execution plans, because I couldn't capture the plans at the time the issue arose.

    I simply tried the two unioned queries separately to see which part needed to be optimised, only to discover that they both performed well.

    I decided to fix the thing using a table variable, but I didn't save the plans. Do you think it's still possible to get the plan from some DMV?

    Paul White NZ (7/2/2010)


    Another explanation is that your query was simply waiting on a memory grant.

    I'm under the impression that it's not so. It was late at night, so I couldn't test things completely, but this behaviour was repeatable and I tried executing the unioned version many times. Anyway, I could be wrong.

    -- Gianluca Sartori

  • Gianluca Sartori (7/2/2010)


    I decided to fix the thing using a table variable, but I didn't save the plans. Do you think it's still possible to get the plan from some DMV?

    I doubt the plan is still cached - but it is possible. Check sys.dm_exec_cached_plans.
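
    If it is still there, a query along these lines will pull it out - swap the LIKE filter for a distinctive fragment of your query text:

    SELECT cp.usecounts,
           cp.objtype,
           st.text,
           qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE st.text LIKE N'%UNION ALL%';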

    Gianluca Sartori (7/2/2010)


    I'm under the impression that it's not so. It was late at night, so I couldn't test things completely, but this behaviour was repeatable and I tried executing the unioned version many times. Anyway, I could be wrong.

    Fair enough - I was assuming the plans you posted were from the time the problem occurred.

    Your query (at least the part that was captured in the XML) is complex enough that you might not be able to rely on the optimiser always producing a reasonable plan. If consistent performance is critical to you, I would be tempted to use hints to restrict the number of possible plans open to the optimiser.
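
    For example (stand-in queries only - which hints make sense depends on the plan you want to hold on to): FORCE ORDER fixes the join order as written, MAXDOP 1 removes the parallel alternatives, and the {CONCAT | HASH | MERGE} UNION hints can pin the physical shape of the UNION ALL itself.

    SELECT t.name, c.name AS column_name
    FROM sys.tables AS t
    JOIN sys.columns AS c ON c.object_id = t.object_id
    UNION ALL
    SELECT v.name, c.name
    FROM sys.views AS v
    JOIN sys.columns AS c ON c.object_id = v.object_id
    OPTION (FORCE ORDER, MAXDOP 1);   -- restricts the plan space for the whole statement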

  • Gianluca Sartori (7/2/2010)


    I decided to fix the thing using a table variable, but I didn't save the plans.

    Just a thought... My recommendation would be to exercise the query that returns the largest number of rows first and use that to spontaneously create a Temp Table using SELECT/INTO. Then do an insert to simulate the rest of the union all.

    As you know, Temp Tables and Table Variables both usually start out in memory and will make the leap to TempDB disk space only if the size warrants. The SELECT/INTO followed by an INSERT/SELECT will be a fair bit faster than creating a table (variable or otherwise) and doing two INSERT/SELECTs.
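
    Here's a sketch of the pattern, with catalog views standing in for the two real queries:

    -- SELECT/INTO creates the temp table from the (expected) larger result set...
    SELECT name, object_id
    INTO #Results
    FROM sys.tables;

    -- ...then the second result set is appended
    INSERT INTO #Results (name, object_id)
    SELECT name, object_id
    FROM sys.views;

    SELECT name, object_id FROM #Results;

    DROP TABLE #Results;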

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White NZ (7/4/2010)


    I doubt the plan is still cached - but it is possible. Check sys.dm_exec_cached_plans.

    That's where I looked for it. Never mind.

    Paul White NZ (7/4/2010)


    I would be tempted to use hints to restrict the number of possible plans open to the optimiser.

    I was wondering if this could help. I'll give it a go and let you know.

    Thank you.

    -- Gianluca Sartori

  • Jeff Moden (7/4/2010)


    My recommendation would be to exercise the query that returns the largest number of rows first and use that to spontaneously create a Temp Table using SELECT/INTO. Then do an insert to simulate the rest of the union all.

    Both queries return 30 rows at most, so it should be no problem.

    Thank you, I'll try and see how it performs.

    -- Gianluca Sartori

  • Are you using table variables? Or temp tables?

    There are no statistics for table variables. The number of rows is always estimated as 1 by the query optimizer for table variables.
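
    A quick illustration - check the estimated row count on the table variable scan in the actual plan:

    DECLARE @t TABLE (object_id int PRIMARY KEY);

    INSERT INTO @t (object_id)
    SELECT object_id FROM sys.all_objects;

    -- The scan of @t is estimated at 1 row, however many rows it actually holds
    SELECT o.name
    FROM @t AS t
    JOIN sys.all_objects AS o ON o.object_id = t.object_id;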


    N 56°04'39.16"
    E 12°55'05.25"

  • At the moment I'm using table variables just to simulate the UNION ALL: in other words, I'm using a table variable to store the output of the first query and then appending the output of the second query to it.

    In this case table variables are the destination, not the source.

    Anyway, thanks for your suggestion.

    -- Gianluca Sartori

  • Gianluca Sartori (7/5/2010)


    Both queries return 30 rows at most, so it should be no problem.

    BWAA-HAAA!!! I never thought I'd see you make such a justification. 😀

    --Jeff Moden

  • OMG! I see it can be read that way... 😀

    I simply meant that it would be quite problematic to find the query that returns the larger number of rows, because both queries return around 30 rows.

    What would really make me happy is a working solution with UNION ALL, so I guess I'll go for adding some hints: that's why I don't think I'll keep the table variable or add a SELECT/INTO.

    -- Gianluca Sartori

  • Ah... I get it. It did seem the other way, though. Too funny... 😛

    --Jeff Moden
