A Hazard of Using the SQL Merge Statement

  • Jim Sebastiano (4/3/2013)


    If you're excited about the MERGE statement (aren't we all?), you'll probabaly also be excited by row constructor

    I think Dwain knows a thing or two about table constructors [/url]already πŸ˜›

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (4/3/2013)


    I think Dwain knows a thing or two about table constructors [/url]already πŸ˜›

    I guess old habits die hard!

  • the sqlist (4/3/2013)


    vliet (4/3/2013)


    We tried both the CTE approach and the AND clause on the WHEN NOT MATCHED BY SOURCE part to merge rows spanning only a certain period. The performance of the CTE approach was much better, especially when the number of rows within that period was relatively small compared to the total number of rows in the target table. You will not notice the difference in these tiny example row sets but for a data warehouse the target table may easily contain several million rows and performance does matter.

    And I must admit we did forget that CTE the first time, deleting all rows in the target table outside the selected period. Luckily we had a separate development server so we could restore these rows from our production environment. Don't think you will never make this mistake, of coarse this behavior is by design but the article is not about bad behavior of the MERGE statement but about a caveat you should notice before using this statement. Thanks!

    CTE is not the wonder solution here, old good derived tables(subquerises) work just fine and the examples on the net are plenty and the MERGE syntax is very clear about this. It is good to use subset of the source table, considering that you don't need the whole table content. This works of course on source only.

    Trimming the source down to the exact required size is usually part and parcel of preparing the source for use. For the target table, BOL states (under Optimizing MERGE Statement Performance):

    To filter out rows from the source or target tables, use one of the following methods.

    Specify the search condition for row filtering in the appropriate WHEN clause. For example, WHEN NOT MATCHED AND S.EmployeeName LIKE 'S%' THEN INSERT....

    Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is defined on the target table, any actions against it must satisfy the conditions for updating views. For more information about updating data by using a view, see Modifying Data Through a View.

    Use the WITH <common table expression> clause to filter out rows from the source or target tables. This method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. We recommend that you avoid using this method or test thoroughly before implementing it.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Interesting, and thanks for the piece. I definitely learned something here.

  • This is good article for "MERGE inro" topic - helps to understand the basics. Thanks!

    Obviously , "WHEN NOT MATCHED BY SOURCE THEN DELETE" behaves as designed clearly and NOT a "ticket" case - it was not matched by the given condition and is the right predictable behavior, the source did not have ID-s 1 or 3 then they deleted from the target. If you care about ID-s=2 only, yes, merge via a view pulling this ID only πŸ™‚ .

    I would add any condition like:

    WHEN NOT MATCHED BY SOURCE AND s.ID = t.ID...<something else>

    THEN DELETE πŸ™‚

    Thanks any way!

    EDITING!! S.ID cannot be bound - forgot about it, sorry

    WHEN NOT MATCHED BY SOURCE AND t.ID in (select...) ... <something else>

    THEN DELETE πŸ™‚

  • I would suggest these changes to the article.

    1. At the end of the "Set Up the Test Conditions" section, change the statement:

    For ID=2, we now have four rows where the Value column has been updated in RowNo 1 and 2, RowNo 3 was unchanged (because it didn’t exist in the source) and RowNo 4 was inserted.

    NOTE: It really was ID=2 RowNo=4 that was there in the initial test data and was unchanged in the test, so it should really be:

    For ID=2, we now have four rows where the Value column has been updated in RowNo 1 and 2, RowNo 4 was unchanged (because it didn’t exist in the source) and RowNo 3 was inserted.

    2. In the "Demonstrate the Hazard" section, it states:

    It’s a good thing we wrapped our test MERGE in a transaction before we ran that nasty bit of work in Production!

    The test code in this section is not wrapped in a transaction as it is in the other section, so wrap it. πŸ™‚

  • natalirozin (4/3/2013)


    This is good article for "MERGE inro" topic - helps to understand the basics. Thanks!

    Obviously , "WHEN NOT MATCHED BY SOURCE THEN DELETE" behaves as designed clearly and NOT a "ticket" case - it was not matched by the given condition and is the right predictable behavior, the source did not have ID-s 1 or 3 then they deleted from the target. If you care about ID-s=2 only, yes, merge via a view pulling this ID only πŸ™‚ .

    I would add any condition like:

    WHEN NOT MATCHED BY SOURCE AND s.ID = t.ID...<something else>

    THEN DELETE πŸ™‚

    Thanks any way!

    As has already been pointed out earlier in the thread, your suggested syntax will produce an error:

    "Msg 5334, Level 16, State 2, Line 68

    The identifier 's.ID' cannot be bound. Only target columns are allowed in the 'WHEN NOT MATCHED BY SOURCE' clause of a MERGE statement."

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Good article Dwain. Until you've seen rows unexpectedly deleted by an incorrectly designed MERGE statement I guess you just hum along unaware of the "hazard." I'd say it's like writing about the hazards of using DELETES or UPDATES without a WHERE clause. Anyone but a pure newb will likely have experienced disaster from that mistake already, but MERGE is relatively new and I don't think it's condescending or alarmist to point out a not-so-obvious pitfall.

    Β 

  • the example from a production code i use:

    when not matched by source and target.rootbatch in (select rootb from @rootbatches)

    then delete;

  • yes, i just added edit to my post, forgot about this sorry..

  • Very nice article. Yes the title was misleading but definitely eye-catching. The use of a CTE was a great technique - genius! I shamefully was working with a third table as my target to hold the subset of rows and then deleting and inserting those records into the main table! Now I can't wait to get back in there and try the CTE. I am a huge MERGE fan - I seriously think it the reason I got a raise the year I implemented it - my boss was that impressed!

  • Phil Parkin (4/3/2013)


    It is good to use subset of the source table, considering that you don't need the whole table content.

    I don't understand what you mean - please elucidate.

    What I mean is that if you need to keep the target table in sync with the source, like identical in terms of data or at least the rows, then you will have to use the whole source table to join. If you only need the sync to happen in certain conditions then it is preferable to filter the source out of the rows that are not needed using a view, CTE or a derived table, instead of building complicated MATCH + AND filters. This way the MERGE will have less rows to deal with so it will perform better.

    Hope I was clear enough.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • mmahon (4/3/2013)


    Hi Dwain -

    I have a technical lack of understanding as to how the cte and Merge work together. Hopefully you can explain it to me. While I see that it does work when I run the code, I don't get how it works.

    How does SQL know to change #Test1 when the TargetRows is the Target of the Merge statement?

    This is what I see when I look at your Merge statement ... the TargetRows cte has 3 rows that of ID = 2 from #Test1. The TargetRows cte is then the Target for the Merge with #Test2 as the Source. #Test2 also has 3 rows all of ID = 2 as well. So when we do the Merge I'd EXPECT to see three records, all from #Test2, BUT "magically" the ID=1 and 3 from #Test1 show up. How does that happen?

    Thank you,

    Mark

    MM - I'm going to try to explain with a code example.

    CREATE TABLE #Test1 (ID INT, RowNo INT, Value MONEY);

    CREATE TABLE #Test2(ID INT, RowNo INT, Value MONEY);

    INSERT INTO #Test1 -- Target

    SELECT 1, 1, 25 UNION ALL SELECT 1, 2, 32 UNION ALL SELECT 2, 1, 38 UNION ALL SELECT 2, 2, 61

    UNION ALL SELECT 2, 4, 43 UNION ALL SELECT 3, 1, 15 UNION ALL SELECT 3, 2, 99

    UNION ALL SELECT 3, 3, 54;

    INSERT INTO #Test2 -- Source

    SELECT 2, 1, 45 UNION ALL SELECT 2, 2, 88 UNION ALL SELECT 2, 3, 28;

    BEGIN TRANSACTION T1;

    WITH TargetRows AS

    (

    SELECT a.ID, RowNo, Value

    FROM #Test1 a

    INNER JOIN (

    SELECT ID

    FROM #Test2

    GROUP BY ID) b

    ON a.ID = b.ID

    )

    SELECT * FROM TargetRows;

    WITH TargetRows AS

    (

    SELECT a.ID, RowNo, Value

    FROM #Test1 a

    INNER JOIN (

    SELECT ID

    FROM #Test2

    GROUP BY ID) b

    ON a.ID = b.ID

    )

    MERGE TargetRows t -- Target

    USING #Test2 s

    ON t.ID = s.ID AND t.RowNo = s.RowNo

    WHEN MATCHED

    THEN

    UPDATE SET Value = s.Value

    WHEN NOT MATCHED -- Target

    THEN

    INSERT (ID, RowNo, Value)

    VALUES (s.ID, s.RowNo, s.Value)

    WHEN NOT MATCHED BY SOURCE

    THEN DELETE

    OUTPUT $Action, DELETED.*, INSERTED.*;

    SELECT * FROM #Test1 ORDER BY ID, RowNo;

    ROLLBACK TRANSACTION T1;

    SELECT * FROM #Test1 ORDER BY ID, RowNo;

    DROP TABLE #Test1;

    DROP TABLE #Test2;

    First, I have used a SELECT * from the CTE (without a MERGE) to demonstrate the target row set. I think you said you already understand how the CTE isolates only the rows where ID=2 but this makes it clear.

    There are various sources out there explaining how CTEs can be used as the target for UPDATEs, DELETEs and MERGEs (here's a quick read on a case of UPDATE: http://pratchev.blogspot.com/2008/03/updates-with-cte.html). The SQL compiler magically (to me anyway) analyzes the query within the CTE to determine whether it can identify the true target table. Sometimes it cannot, which I believe would be the case in this instance if the JOINs were reversed.

    The MERGE now uses the OUTPUT $action keyword to show exactly what it is doing. In the results you can see 4 rows (all with ID=2), confirming it did not operate on IDs 1 and 3. Two of the rows (RowNo 1 and 2) are updated to the new value as those row numbers exist in both source and target. RowNo 3 does not exist in the target, so it is INSERTed (note how the DELETED.* columns are all NULL). Finally, RowNo 4 exists in the target but not in the source (WHEN NOT MATCHED BY SOURCE THEN DELETE) so it gets eliminated.

    To put it another way, if you were to run the MERGE without the CTE row set limiter (what I called "the hazard"), the result is that the rows of the source table completely replace the contents of the target. When using the CTE as the target, along with WHEN NOT MATCHED BY SOURCE THEN DELETE, all of those (filtered) rows are also completely replaced by the source.

    Hope this explanation is helpful to you.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Mike McIver (4/3/2013)


    ... here's a stab at it:

    SELECTDISTINCT

    COALESCE(b.ID, a.ID), COALESCE(b.RowNo, a.RowNo), COALESCE(b.Value, a.Value)

    FROM#Test1 a FULL OUTER JOIN

    #Test2 b

    ONa.ID = b.ID

    ANDb.RowNo IS NOT NULL

    ORDER BY

    1,2

    It may be all wrong ...

    Mike - This does produce the correct set of rows and is a very interesting approach. Now how would you suggested to DELETE/INSERT them into the target?

    One of the reasons I love MERGE is that it condenses what would normally be a multiple step approach into a single one.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • the sqlist (4/3/2013)


    If the bound errors happens than you will have to replace it with an EXISTS condition:

    ... and exists (select * from #test2 where id =t.id)

    The author was kind of superficial about this new statement.

    Are you kidding me? Superficial is my middle name! πŸ˜€

    Actually I don't have a middle name and I'm not always superficial but this time I'll agree with you.

    On the one hand I wanted to offer some useful information to those unfamiliar with this feature but on the other hand I didn't want to attempt to rewrite BOL. When you think about it, the ability to replace the entire contents of the target table with the source is pretty cool. Better than TRUNCATE/INSERT probably (and no I haven't tried to see if it would work when foreign keys are present but unlinked to the parent tables).

    Limiting the target rows clearly has its uses.

    What I meant was that you totally omitted to mention the AND condition that comes with the MATCH, which actually is meant exactly for that kind of situations. An important detail. Other than that the info was good as an example but that was not the article's intention.

    I'm not sure I omitted it so much as that I was offering another way to do it. In the end, it usually boils down to what is faster or what is easiest to understand. One of the wonderful things about SQL, and one of the things that keeps me coming back to this site, is to see the many different ways that different people approach the same problems. There is always food for thought in those differences.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 31 through 45 (of 58 total)

You must be logged in to reply to this topic. Login to reply