Performance of the SQL MERGE vs. INSERT/UPDATE

Question

Performance of the SQL MERGE vs. INSERT/UPDATE

Viewing 9 posts - 46 through 53 (of 53 total)

You must be logged in to reply to this topic. Login to reply

venoym SSCarpal Tunnel Points: 4161 More actions · Answer 1

Andy DBA (2/6/2015)
Thanks Kevin and Sean for the informative links. After reading them I'm actually less concerned about using MERGE for copy forward operations.
I apologize for not posting sample code for the copy forward technique I'm referring to. That might have made a difference in your replies. I saw it in a forum and assumed it was common knowledge. After looking the code over, I think you might agree that it's not vulnerable to the pitfalls described in those articles. The code is posted below (without the Try/Catch wrapper), but here are the main points:
The procedure is not merging. It will never update. It is using MERGE to INSERT only. (There is no "WHEN MATCHED" and it is "matching" on 1=0)
The MERGE is not inserting any primary keys. The PK on the target is an identity column. SQL Server is coughing up the value, not MERGE. This should have no more concurrency problems than an INSERT would.
Because there are no DELETES or UPDATES, insert triggers will process inserted rows accurately. (The trigger problem I saw in the article seemed to be caused by mixed operations.)
The ONLY reason MERGE is being used instead of INSERT is because it's OUTPUT clause can be used to expose the old PK at the same time as the new PK. (If somebody knows how to do that with a simple INSERT, please, share!)
The following code is not my original work. I wish I could credit the original author but I can't remember where I found this as a posted solution to what seems to be a fairly common business need. As I stated earlier, IMHO the best solution is to add a column to the parent table, but when that's not an option I feel that this is the best solution. I will be very impressed if someone posts a better one:
--Begin TRY block here
DECLARE @IDXLATE TABLE (Old_ID INT, New_ID INT);
MERGE Parent AS Dest
USING (
SELECT *
FROM Parent
WHERE filtercolumn = filtervalue --Condition to select records to copy forward
) AS Srce
ON 0 = 1 --Never update, always insert
WHEN NOT MATCHED BYTARGET THEN
INSERT (datacolumn1 --data column to copy forward
,datacolumn2
,datacolumn3
,updatedcolumn) --column that will change in new record
VALUES (datacolumn1,
,datacolumn2
,datacolumn3
,@newvalue) --data for column changed in new record (not shown above could be a new calendar year, for example)
OUTPUT INSERTED.ID, Srce.ID INTO @IDXLATE(New_ID,Old_ID); --ID is IDENTITY int PK column, @IDXLATE will get Old to New ID mapping
--Note that in one T-SQL statement we've performed our new insert and also captured map of Old ID to New ID!
--Detail Tables
INSERT INTO dbo.Child1
(ParentFK
,datacolumns)
SELECT
New_ID
,datacolumns
FROM dbo.Child1 C1
INNER JOIN @IDXLATE s ON C1.ParentFK = s.Old_ID
INSERT INTO dbo.Child2
(ParentFK
,datacolumns)
SELECT
New_ID
,datacolumns
FROM dbo.Child1 C2
INNER JOIN @IDXLATE s ON C2.ParentFK = s.Old_ID
--etc
--End TRY block

INSERT, UPDATE, and DELETE also have the OUTPUT clause. The OUTPUT Clause goes back to (I think) SQL 2000 and maybe older.

https://msdn.microsoft.com/en-us/library/ms177564.aspx

Sean Lange SSC Guru Points: 286590 More actions · Answer 2

venoym (2/9/2015)
Andy DBA (2/6/2015)
Thanks Kevin and Sean for the informative links. After reading them I'm actually less concerned about using MERGE for copy forward operations.
I apologize for not posting sample code for the copy forward technique I'm referring to. That might have made a difference in your replies. I saw it in a forum and assumed it was common knowledge. After looking the code over, I think you might agree that it's not vulnerable to the pitfalls described in those articles. The code is posted below (without the Try/Catch wrapper), but here are the main points:
The procedure is not merging. It will never update. It is using MERGE to INSERT only. (There is no "WHEN MATCHED" and it is "matching" on 1=0)
The MERGE is not inserting any primary keys. The PK on the target is an identity column. SQL Server is coughing up the value, not MERGE. This should have no more concurrency problems than an INSERT would.
Because there are no DELETES or UPDATES, insert triggers will process inserted rows accurately. (The trigger problem I saw in the article seemed to be caused by mixed operations.)
The ONLY reason MERGE is being used instead of INSERT is because it's OUTPUT clause can be used to expose the old PK at the same time as the new PK. (If somebody knows how to do that with a simple INSERT, please, share!)
The following code is not my original work. I wish I could credit the original author but I can't remember where I found this as a posted solution to what seems to be a fairly common business need. As I stated earlier, IMHO the best solution is to add a column to the parent table, but when that's not an option I feel that this is the best solution. I will be very impressed if someone posts a better one:
--Begin TRY block here
DECLARE @IDXLATE TABLE (Old_ID INT, New_ID INT);
MERGE Parent AS Dest
USING (
SELECT *
FROM Parent
WHERE filtercolumn = filtervalue --Condition to select records to copy forward
) AS Srce
ON 0 = 1 --Never update, always insert
WHEN NOT MATCHED BYTARGET THEN
INSERT (datacolumn1 --data column to copy forward
,datacolumn2
,datacolumn3
,updatedcolumn) --column that will change in new record
VALUES (datacolumn1,
,datacolumn2
,datacolumn3
,@newvalue) --data for column changed in new record (not shown above could be a new calendar year, for example)
OUTPUT INSERTED.ID, Srce.ID INTO @IDXLATE(New_ID,Old_ID); --ID is IDENTITY int PK column, @IDXLATE will get Old to New ID mapping
--Note that in one T-SQL statement we've performed our new insert and also captured map of Old ID to New ID!
--Detail Tables
INSERT INTO dbo.Child1
(ParentFK
,datacolumns)
SELECT
New_ID
,datacolumns
FROM dbo.Child1 C1
INNER JOIN @IDXLATE s ON C1.ParentFK = s.Old_ID
INSERT INTO dbo.Child2
(ParentFK
,datacolumns)
SELECT
New_ID
,datacolumns
FROM dbo.Child1 C2
INNER JOIN @IDXLATE s ON C2.ParentFK = s.Old_ID
--etc
--End TRY block
INSERT, UPDATE, and DELETE also have the OUTPUT clause. The OUTPUT Clause goes back to (I think) SQL 2000 and maybe older.
https://msdn.microsoft.com/en-us/library/ms177564.aspx

Why are you bothering with a merge here at all? Why not just remove the merge part of this entirely as it doesn't do anything for you.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Andy DBA SSCommitted Points: 1776 More actions · Answer 3

venoym - The OUTPUT clause was first introduced in MS SQL Server 2005.

venoym and Sean - Please re-read the only bold font sentence in my last post and the sentence following it. I think you're missing the point. MERGE is needed, but not for merging. It's OUTPUT clause lets you retreive data from the INSERT's source along with data from the INSERT's destination. That's the ONLY thing it's being used for. The point of my last post is that this set-based solution is not vulnerable to MERGE's pitfalls when it is used for a true merge because it is only doing an insert and it is letting the db engine generate the primary keys. To use a term coined on this site, this usage of MERGE might be considered "SQL Spackle" because one would not normally think of using it this way.

I think I went over the business requirement that this technique is addresssing in previous posts, but please let me know if it's not clear. I admit this whole exercise may seem a bit contrived because IMHO the best solution is to simply add a column to the parent table (and do a simple INSERT instead of MERGE), but there are situations where that may not be an option. In my particular case, I was replacing a RBAR "solution" that did not need an extra column and I had to provide a "like for like" replacement (in addition to my recommendation to add the column 😉 ).

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

Andy DBA (2/9/2015)
venoym - The OUTPUT clause was first introduced in MS SQL Server 2005.
venoym and Sean - Please re-read the only bold font sentence in my last post and the sentence following it. I think you're missing the point. MERGE is needed, but not for merging. It's OUTPUT clause lets you retreive data from the INSERT's source along with data from the INSERT's destination. That's the ONLY thing it's being used for. The point of my last post is that this set-based solution is not vulnerable to MERGE's pitfalls when it is used for a true merge because it is only doing an insert and it is letting the db engine generate the primary keys. To use a term coined on this site, this usage of MERGE might be considered "SQL Spackle" because one would not normally think of using it this way.
I think I went over the business requirement that this technique is addresssing in previous posts, but please let me know if it's not clear. I admit this whole exercise may seem a bit contrived because IMHO the best solution is to simply add a column to the parent table (and do a simple INSERT instead of MERGE), but there are situations where that may not be an option. In my particular case, I was replacing a RBAR "solution" that did not need an extra column and I had to provide a "like for like" replacement (in addition to my recommendation to add the column 😉 ).

But OUTPUT is available for an INSERT also. That is why I don't understand the point of using a MERGE when you are explicitly not matching. Just change it to an INSERT. https://msdn.microsoft.com/en-us/library/ms177564.aspx

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Andy DBA SSCommitted Points: 1776 More actions · Answer 5

Perhaps a more detailed explanation will make things more clear.

I'm selecting records from a table (PARENT) and inserting them into the same table but with a new PK (and one or more other changed columns eg. calendar year). I also want to select records from a related child table (CHILD) and insert them into the same child table, but with a new FK that points to the new PK inserted above. When I'm done I'll have two sets of data. For argument's sake let's say old 2014 parent records and their children, and new 2015 parent records and their children.

To do so, I need a mapping table so I can SELECT old child records by matching the old PK to their FK and then INSERT them with a new FK set to the new PK value. But PARENT has an IDENTITY PK and the DB engine creates new PKs during the INSERT, I don't know what the new PK will be beforehand and don't know what record the newly inserted PARENT record was copied from.

Make sense so far?

The RBAR approach is to insert to PARENT one agonizing row at a time and use SCOPE_IDENTITY to retrieve the new PK. You have the old PK from the one row you just copied, so you can use it to SELECT CHILD records and INSERT them using the SCOPE_IDENTITY value you just pulled for their new FK. This is "wonderful" because you don't have to use Identity Insert or Tablock holdlock, etc. but it's RBAR!

If I do a set based approach and I INSERT all the PARENT records at once, OUTPUT from that INSERT only gives me the columns that were inserted, including new system generated PK values. That is of no use in this case, because I don't know what original PKs the ones came from so I can't select the CHILD records tied to the original PARENT records.

But if I use MERGE to do the INSERT, its OUTPUT will give me BOTH the old and new PKs together and I can put them into a temp table. I can then join to that temp table to select original CHILD records and INSERT new CHILD records with new FKs pointing to the new parent PKs. That one MERGE T-SQL statement not only inserts all of the PARENT records, it also gives me a mapping table I can use to load the CHILD records.

Finally, because the DB engine is creating new PKs during the MERGE's insert, the insert is not vulnerable to PK violation problems that can be caused by race conditions in MERGE operations where PKs are explicitly being set.

Make sense?

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

marko.celarc (2/5/2015)
insert into dbo.history_table
(ItemId, Oid, old_value, new_value, TransactionCode)
select
ItemId, 18, old_value, new_Value, '1234567'
from
(
merge into
dbo.mother_table AS target_table
using
(select
itemNo,
1057
from
dbo.some_table
where
something = 3) as source_table (ItemNo, new_stat)
on
(target_table.some_field = source_table.ItemNo)
when matched and target_table.stat <> new_stat then update set stat = new_stat
output inserted.some_field, deleted.stat, inserted.stat)
as spr(ItemId, old_value, new_Value);

I don't see why you need a MERGE within your composable DMS. The UPDATE seems to work exactly the same, assuming I understand what it is you're trying to do.

CREATE TABLE #history_table

(

ItemID INT

,Oid INT

,old_value INT

,new_value INT

,TransactionCode VARCHAR(10)

);

CREATE TABLE #mother_table

(

some_field INT IDENTITY

,stat INT

);

CREATE TABLE #some_table

(

ItemNo INT IDENTITY

,something INT

);

INSERT INTO #some_table (something)

SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

INSERT INTO #mother_table (stat)

SELECT 1111 UNION ALL SELECT 2222 UNION ALL SELECT 3333;

SELECT *

FROM #some_table;

SELECT *

FROM #mother_table;

insert into #history_table

(ItemId, Oid, old_value, new_value, TransactionCode)

select

ItemId, 18, old_value, new_Value, '1234567'

from

(

--merge into

--#mother_table AS target_table

--using

--(select

--itemNo,

--1057

--from

--#some_table

--where

--something = 3) as source_table (ItemNo, new_stat)

--on

--(target_table.some_field = source_table.ItemNo)

--when matched and target_table.stat <> new_stat then update set stat = new_stat

UPDATE target_table

SET stat = new_stat

output inserted.some_field, deleted.stat, inserted.stat

FROM #mother_table target_table

JOIN

(select

itemNo,

1057

from

#some_table

where

something = 3) as source_table (ItemNo, new_stat)

ON target_table.some_field = source_table.ItemNo

) as spr(ItemId, old_value, new_Value);

SELECT *

FROM #history_table;

SELECT *

FROM #some_table;

SELECT *

FROM #mother_table;

GO

DROP TABLE #mother_table;

DROP TABLE #some_table;

DROP TABLE #history_table;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

SqlNightOwl SSCrazy Points: 2333 More actions · Answer 7

We use merge extensively as part of our delivery process. We need to have re-runnable scripts that affect "process" related data (reference type data). If the value isn't there then insert, if the data is there AND it's changed then update it. This keeps the deployment script clean and able to execute multiple times while still affecting the data once.

merge dbo.some_table with (holdlock) tgt using aQueryOrCte src on src.KeyValue = tgt.KeyValue when matched and tgt.Column1 != src.Column1 or tgt.Column2 != src.Column2 then update set Column1 = src.Column1 , Column2 = src.Column2 when not matched then insert ( KeyValue , Column1 , Column2 ) values ( src.KeyValue , src.Column1 , src.Column2 );

It works like a champ! I execute the scripts from a PoSh script running in Octopus, pipe the output to a file and report the results back to the caller.

--Paul Hunter

christopher.gray Old Hand Points: 372 More actions · Answer 8

I thought it was interesting that for the update test, the MERGE seemed to perform somewhat fewer reads and dramatically fewer writes. Even though the difference in elapsed time wasn't large in your test, it's possible the savings would increase for larger workloads. I would be surprised if MERGE was better at updates than UPDATE, but now you have me wondering.

N_Muller SSCarpal Tunnel Points: 4037 More actions · Answer 9

SQL Server, in particular SQL Server 2014, has performance issues with left join, especially when compared to inner joins. I do not know how merge was implemented by Microsoft, but it can't be that much different than update/insert-left join within a transaction. if your target table has hundreds of millions of rows and the source table is only a very small fraction of the size of the target table, it pays to use CTE for the target table with an inner-join to limit the scope of the target table. When I did that, the performance of merge was significantly better than the performance of update/insert-left join. Example:

;with TGTROWS as ( select TGT.* from SRC inner join TGT on ...join clause... ) merge TGTROWS using SRC on ...join clause... when matched update xxx when not matched insert xxx;