Huge speed difference with clause "If exists"

  • Because the filter has to be applied to the XML itself. The way the query is written, you're filtering on the aggregation and aggregation occurs within the XML. So you have to get that done, and then it filters. The EXISTS is going to still work, but only after the filter is applied. It couldn't do otherwise as it's configured.

    This part is purely speculation, because of the EXISTS, the row estimates are unnaturally low, forcing the behavior down different paths than the query run all by itself. That's the cause of the slow performance, well, one. The other is the need to filter after the XML is put together. Between the two, the performance is blown out of the water.

    This is actually kind of fun. I may need to see if I can schedule some time for an experiment or three.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • well there is a difference if you expect one record since you're forced to believe all you're interested in is one row (IF EXISTS is a boolean condition) or if you actually go and check statistics first if you're going for a seek, scan or something else for all X rows of Y which get filtered, Joined and whatever else. Where your EXISTS ends is not where your Statement (execution plan) ends. So while the SQL Server might estimate for a direct call to the view that it should expect based on statistics for each ClovekAutoID to find 40.000 rows but if you tell me you are only interested if rows for a certain ClovekAutoID exist, then I would expect something different from you, too.

    It's like telling "hey, I got a job for you" "Ok, great!" "Have a look if you have any book from XY" "OK, sure! - Yes, I do!" "Great, as you've done that so fast, go ahead and check from all possible books if you have the latest book from that Author" "OK, in this case as I was fast and I'll grab each book to check if this one is the latest one instead of checking through the whole shelf, eventually grouping all books from XY together and THEN decide which one is the newest."

    I wouldn't even try to put this on either Cardinality Engine old or new, if you pass one parameter to check if anything exists, you should be looking for either literally anything or the last row inserted.

    IF EXISTS (SELECT User_ID from SSCDB where FirstName = 'Jeff') is fine if you would like to pay any Jeff a beer, but knowing you might have to serve up to 40.000 beer because we're hoarding Jeff's might end up giving substantial, additional cost.

  • Hmmm... I wonder if the new Cardinality Estimator would actually do a better job at this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Hmmm... I wonder if the new Cardinality Estimator would actually do a better job at this?

    Not sure, but if I get the time to try this out, it will be on the newer optimizer (I don't keep old servers around, cause I don't have to, HA!).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'm not entirely sure if the new CE would do better in such case because as I see it you're in this case with EXISTS expecting 2 different things: a bool check if we're "talking NULL" or not is a nice thing to speed up processing if you might check for values which might even not be there (and then skip further, time consuming processing ) or if you actually want the last row inserted returned ONLY the CE does a perfectly well job, it assumes you're interested in 0 or 1 rows.

    Now if you would change the CE you wouldn't be allowed to loose the bool part for single record checks but you would need to do further processing with any EXISTS statement, what would you do in such case? Do a bool check and still do a density check on the statistics for the table at least? I'm pretty sure many would go for LEFT JOIN IS NULL constructs at that point latest because there you would be only doing the density / selectivity check for given predicates.

    I might be totally wrong on the last part as I've never worked for MS directly but I think if the CE would assume both bool and selectivity should be handled by EXISTS they would really lean very far from the ANSI SQL Standard and I think this one might hurt more than switching to a different CE. If you think of the OPs Question wether it should strictly be bool or not I'd agree it should be "inbetween" but think about

     

    WHERE EXISTS(SELECT Col1 INTERSECT SELECT Col2)

    that one could go horribly wrong, my first bet would be the NULL and NULL part here.

     

    • This reply was modified 5 years, 3 months ago by  DinoRS. Reason: typo
  • Grant: "The other is the need to filter after the XML is put together."

    Do you mean that it materializes the ENTIRE resultset into one giant XML construct, and then checks to see whether, in that resultset, there exists a record? Or maybe, if that XML construct itself, that it just got done building, if THAT exists?

    • This reply was modified 5 years, 3 months ago by  pdanes.
  • Out of curiosity, what if you tried a version that doesn't use "if exists".  Maybe something like this...

    declare @exists bit=0
    select top 1 @exists=1 From BotCBO.vwClovekStrings_PrZk Where ClovekString = '[gmelin]'
    select @exists Existuje

     

  • Good call - that works well. With the index on ID disabled, it is roughly a minute, the same as the original 'bad' version. But with the SSMS-suggested index in effect, it is around one second, same as the 'good' version - without the enclosing If Exists.

    Attached are the execution plans.

    Attachments:
    You must be logged in to view attached files.
  • pdanes wrote:

    Grant: "The other is the need to filter after the XML is put together."

    Do you mean that it materializes the ENTIRE resultset into one giant XML construct, and then checks to see whether, in that resultset, there exists a record? Or maybe, if that XML construct itself, that it just got done building, if THAT exists?

    Yep. Look at the execution plan. You can see it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply