A Hazard of Using the SQL Merge Statement

  • P Jones (4/3/2013)


    As another MERGE fan I'd use

    WHEN NOT MATCHED BY SOURCE AND (t.ID = s.ID) THEN DELETE;

    in the original merge statement.

    But I've no complaints about the article - one size doesn't fit all so some will like and some won't.

    Holy moldy cannoli! Why didn't I think of that?

    Actually I think I did try something like it (can't remember for sure) but I do recall seeing a message something like Al Kessler reports a little ways after your post.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • shalinder.verma (4/3/2013)


    Phil Parkin (4/3/2013)


    shalinder.verma (4/3/2013)


    I agree with David. The title was misleading and I got upset before reading the article and got angry after reading the article, save 'holy moly guacamole'. πŸ˜›

    That said, it does containt some useful information for people who are about to start using MERGE statement.

    Back to the topic - This could be a very specific case for MERGE statement where one is not exactly merging the whole recordset but the subset of a recordset.

    You got upset and angry with an article about MERGE? Take a few deep breaths and relax dude, it's not worth the angst.

    No, please don't get me wrong. I said all with humor!

    chill

    Well that's a relief! I thought for a minute there I was going to need to don my teflon cape and start raging about the (things that really are) bugs that are in MERGE!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Mike McIver (4/3/2013)


    David is absolutely 10% correct.

    The "hazard" is given here:

    "Hastily, we consult MSDN"

    and

    "A quick scan of the article"

    and

    "Obviously this must be the ticket, so let’s give that a try"

    Phil is 100% correct about David's suggested titles. Here are my suggestions:

    "We all know what assume spells"

    "How's this egg look on my face"

    "A solution in search of a problem"

    David looses 90% for the snipe at Microsoft ... long live Bill Gates! David needs to attend the Redmond Reeducation Camp.

    Glad you caught that Mike. That was my subtle way of saying what a bozo I was for not reading BOL more carefully!

    And thanks for the support.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • al_kessler (4/3/2013)


    P Jones (4/3/2013)


    As another MERGE fan I'd use

    WHEN NOT MATCHED BY SOURCE AND (t.ID = s.ID) THEN DELETE;

    in the original merge statement.

    But I've no complaints about the article - one size doesn't fit all so some will like and some won't.

    I had the same sort of thought! However, when I tried it I learned:

    "Msg 5334, Level 16, State 2, Line 68

    The identifier 's.ID' cannot be bound. Only target columns are allowed in the 'WHEN NOT MATCHED BY SOURCE' clause of a MERGE statement."

    So, back to the original posted solution.

    This was a fine timely article as I have used MERGE for inserts and updates but was just beginning as trigger that needs to use the delete aspect.

    Thanks for the heads up and saving me a fair bit of grief and frustration. Assuming egg on one's face would negatively affect cholesterol levels, thanks for helping keep mine low?

    If the bound errors happens than you will have to replace it with an EXISTS condition:

    ... and exists (select * from #test2 where id =t.id)

    The author was kind of superficial about this new statement.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • dwain.c (4/3/2013)


    P Jones (4/3/2013)


    As another MERGE fan I'd use

    WHEN NOT MATCHED BY SOURCE AND (t.ID = s.ID) THEN DELETE;

    in the original merge statement.

    But I've no complaints about the article - one size doesn't fit all so some will like and some won't.

    Holy moldy cannoli! Why didn't I think of that?

    Actually I think I did try something like it (can't remember for sure) but I do recall seeing a message something like Shalinder reports a little ways after this post.

    As al_kessler already pointed out, that suggestion doesn't work. Also, this approach often leads to a less optimal query plan, compared to the cte approach presented by dwain: merge always scanns the tables (as explained in BOL), any additional conditions on either of the merge actions will have to be processed conditionally per row. For the cte approach, first the number of rows is limited (by the cte) to only those that actually need to be processed and only then the conditions are evaluated. i.e. the query plan is usually leaner for the cte approach, and the number of times the conditions need to be determined is usually less and thus the performance of the cte approach is often better than the suggested alternative, even if the conditions are suitable to the needs of merge.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • the sqlist (4/3/2013)


    al_kessler (4/3/2013)


    P Jones (4/3/2013)


    As another MERGE fan I'd use

    WHEN NOT MATCHED BY SOURCE AND (t.ID = s.ID) THEN DELETE;

    in the original merge statement.

    But I've no complaints about the article - one size doesn't fit all so some will like and some won't.

    I had the same sort of thought! However, when I tried it I learned:

    "Msg 5334, Level 16, State 2, Line 68

    The identifier 's.ID' cannot be bound. Only target columns are allowed in the 'WHEN NOT MATCHED BY SOURCE' clause of a MERGE statement."

    So, back to the original posted solution.

    This was a fine timely article as I have used MERGE for inserts and updates but was just beginning as trigger that needs to use the delete aspect.

    Thanks for the heads up and saving me a fair bit of grief and frustration. Assuming egg on one's face would negatively affect cholesterol levels, thanks for helping keep mine low?

    If the bound errors happens than you will have to replace it with an EXISTS condition:

    ... and exists (select * from #test2 where id =t.id)

    The author was kind of superficial about this new statement.

    Are you kidding me? Superficial is my middle name! πŸ˜€

    Actually I don't have a middle name and I'm not always superficial but this time I'll agree with you.

    On the one hand I wanted to offer some useful information to those unfamiliar with this feature but on the other hand I didn't want to attempt to rewrite BOL. When you think about it, the ability to replace the entire contents of the target table with the source is pretty cool. Better than TRUNCATE/INSERT probably (and no I haven't tried to see if it would work when foreign keys are present but unlinked to the parent tables).

    Limiting the target rows clearly has its uses.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • R.P.Rozema (4/3/2013)


    dwain.c (4/3/2013)


    P Jones (4/3/2013)


    As another MERGE fan I'd use

    WHEN NOT MATCHED BY SOURCE AND (t.ID = s.ID) THEN DELETE;

    in the original merge statement.

    But I've no complaints about the article - one size doesn't fit all so some will like and some won't.

    Holy moldy cannoli! Why didn't I think of that?

    Actually I think I did try something like it (can't remember for sure) but I do recall seeing a message something like Shalinder reports a little ways after this post.

    As al_kessler already pointed out, that suggestion doesn't work. Also, this approach often leads to a less optimal query plan, compared to the cte approach presented by dwain: merge always scanns the tables (as explained in BOL), any additional conditions on either of the merge actions will have to be processed conditionally per row. For the cte approach, first the number of rows is limited (by the cte) to only those that actually need to be processed and only then the conditions are evaluated. i.e. the query plan is usually leaner for the cte approach, and the number of times the conditions need to be determined is usually less and thus the performance of the cte approach is often better than the suggested alternative, even if the conditions are suitable to the needs of merge.

    One of the best parts about authoring is not only teaching a new trick but also learning from those that chime in with more experience in the area like you RP. I did not know that about the query plans and I will certainly keep it in mind when using MERGE in the future. Presumably the CTE also inherits the indexes from the target table to speed things up also.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi Dwain -

    I have a technical lack of understanding as to how the cte and Merge work together. Hopefully you can explain it to me. While I see that it does work when I run the code, I don't get how it works.

    How does SQL know to change #Test1 when the TargetRows is the Target of the Merge statement?

    This is what I see when I look at your Merge statement ... the TargetRows cte has 3 rows that of ID = 2 from #Test1. The TargetRows cte is then the Target for the Merge with #Test2 as the Source. #Test2 also has 3 rows all of ID = 2 as well. So when we do the Merge I'd EXPECT to see three records, all from #Test2, BUT "magically" the ID=1 and 3 from #Test1 show up. How does that happen?

    Thank you,

    Mark

  • ... here's a stab at it:

    SELECTDISTINCT

    COALESCE(b.ID, a.ID), COALESCE(b.RowNo, a.RowNo), COALESCE(b.Value, a.Value)

    FROM#Test1 a FULL OUTER JOIN

    #Test2 b

    ONa.ID = b.ID

    ANDb.RowNo IS NOT NULL

    ORDER BY

    1,2

    It may be all wrong ...

  • If the bound errors happens than you will have to replace it with an EXISTS condition:

    ... and exists (select * from #test2 where id =t.id)

    The author was kind of superficial about this new statement.

    Are you kidding me? Superficial is my middle name! πŸ˜€

    Actually I don't have a middle name and I'm not always superficial but this time I'll agree with you.

    On the one hand I wanted to offer some useful information to those unfamiliar with this feature but on the other hand I didn't want to attempt to rewrite BOL. When you think about it, the ability to replace the entire contents of the target table with the source is pretty cool. Better than TRUNCATE/INSERT probably (and no I haven't tried to see if it would work when foreign keys are present but unlinked to the parent tables).

    Limiting the target rows clearly has its uses.

    What I meant was that you totally omitted to mention the AND condition that comes with the MATCH, which actually is meant exactly for that kind of situations. An important detail. Other than that the info was good as an example but that was not the article's intention.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • Phil Parkin (4/3/2013)


    I agree that MERGE is working just as it was designed and that people should understand that from reading the docs. But the article includes a technique - using a limited-row CTE as the target table to avoid unwanted deletions - which was new to me and which falls outside of your suggested alternative titles.

    +1 I've been using MERGE pretty much since it came out, but I've avoided deleting as much as possible. Usually business needs haven't called for a delete, just an update or an insert, but I've never thought about using a CTE with it. Thanks for the great article, gives me a different perspective on MERGE.



    Everything is awesome!

  • We tried both the CTE approach and the AND clause on the WHEN NOT MATCHED BY SOURCE part to merge rows spanning only a certain period. The performance of the CTE approach was much better, especially when the number of rows within that period was relatively small compared to the total number of rows in the target table. You will not notice the difference in these tiny example row sets but for a data warehouse the target table may easily contain several million rows and performance does matter.

    And I must admit we did forget that CTE the first time, deleting all rows in the target table outside the selected period. Luckily we had a separate development server so we could restore these rows from our production environment. Don't think you will never make this mistake, of coarse this behavior is by design but the article is not about bad behavior of the MERGE statement but about a caveat you should notice before using this statement. Thanks!

  • If you're excited about the MERGE statement (aren't we all?), you'll probabaly also be excited by row constructor, which will change your INSERT from this:

    INSERT INTO #Test1 -- Target

    SELECT 1, 1, 25 UNION ALL SELECT 1, 2, 32 UNION ALL SELECT 2, 1, 38 UNION ALL SELECT 2, 2, 61 UNION ALL SELECT 2, 4, 43 UNION ALL SELECT 3, 1, 15 UNION ALL SELECT 3, 2, 99 UNION ALL SELECT 3, 3, 54;

    INSERT INTO #Test2 -- Source

    SELECT 2, 1, 45 UNION ALL SELECT 2, 2, 88 UNION ALL SELECT 2, 3, 28;

    to this:

    -- target

    INSERT INTO #Test1 (ID, RowNo, Value)

    VALUES (1,1,25),(1,2,32),(2,1,38),(2,2,61),(2,4,43),(3,1,15),(3,2,99),(3,3,54);

    -- source

    INSERT INTO #Test2 (ID, RowNo, Value)

    VALUES (2,1,45),(2,2,88),(2,3,28);

  • vliet (4/3/2013)


    We tried both the CTE approach and the AND clause on the WHEN NOT MATCHED BY SOURCE part to merge rows spanning only a certain period. The performance of the CTE approach was much better, especially when the number of rows within that period was relatively small compared to the total number of rows in the target table. You will not notice the difference in these tiny example row sets but for a data warehouse the target table may easily contain several million rows and performance does matter.

    And I must admit we did forget that CTE the first time, deleting all rows in the target table outside the selected period. Luckily we had a separate development server so we could restore these rows from our production environment. Don't think you will never make this mistake, of coarse this behavior is by design but the article is not about bad behavior of the MERGE statement but about a caveat you should notice before using this statement. Thanks!

    CTE is not the wonder solution here, old good derived tables(subquerises) work just fine and the examples on the net are plenty and the MERGE syntax is very clear about this. It is good to use subset of the source table, considering that you don't need the whole table content. This works of course on source only.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • It is good to use subset of the source table, considering that you don't need the whole table content.

    I don't understand what you mean - please elucidate.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 15 posts - 16 through 30 (of 58 total)

You must be logged in to reply to this topic. Login to reply