February 20, 2023 at 10:02 am
Sorry, but this script won't work well for large tables, especially with heavy use of the IN() and NOT IN() operators. MERGE is usually faster than a series of consecutive INSERT/UPDATE/DELETE statements.
I've found the opposite: particularly for MERGE statements that only do INSERT/UPDATE, putting the SQL into separate UPDATE and INSERT statements is much faster than a MERGE.
February 20, 2023 at 11:07 am
I agree - there is no real reason to avoid MERGE. And if it seems too difficult to understand, you may just need some practice 🙂
I use it very often on very large tables in our DWH (~20 TB), and if I replaced it with multiple INSERT / UPDATE / DELETE statements, everything would run slower.
I know there are several articles, many years old, that say MERGE is buggy, but if you check the bug lists, they cover only very special edge cases (e.g. using MERGE to delete rows from tbl1, using the OUTPUT option *without* INTO, and then wrapping the whole MERGE in a subselect that does SELECT ... INTO tbl1 itself, which is something nobody would really consider doing):
SELECT sub.*
INTO tbl1
FROM (MERGE INTO tbl1 AS t
USING ... AS s
ON s.key1 = t.key1
WHEN ...
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT $action, Deleted.*
) as sub
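By contrast, the ordinary way to capture MERGE output, OUTPUT ... INTO a separate table, does not hit that bug. A minimal sketch of the safe pattern; the @audit variable, tbl2, and key1 are illustrative placeholders, not from any real schema:

```sql
-- Capture what MERGE did in a table variable instead of a subselect.
-- tbl1 / tbl2 / key1 are illustrative placeholders.
DECLARE @audit TABLE (mergeAction nvarchar(10), key1 int);

MERGE INTO tbl1 AS t
USING tbl2 AS s
    ON s.key1 = t.key1
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, Deleted.key1 INTO @audit (mergeAction, key1);
```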
God is real, unless declared integer.
February 20, 2023 at 12:31 pm
Here is a test that shows using a separate UPDATE/INSERT is faster than a MERGE.
SET STATISTICS IO, TIME OFF
if object_id('dbo.t1','U') IS NOT NULL drop table dbo.t1;
if object_id('dbo.t2','U') IS NOT NULL drop table dbo.t2;
go
create table dbo.t1(id int primary key clustered not null , value varchar(20)) ;
create table dbo.t2(id int primary key clustered not null , value varchar(20)) ;
insert into dbo.t1
select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 10)
from dbo.fnTally(1,10000000) t;
insert into dbo.t2
select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 20)
from dbo.fnTally(1,11000000) t;
SET STATISTICS IO, TIME ON
GO
BEGIN TRANSACTION
GO
PRINT '************************************* Start Merge'
MERGE
INTO dbo.t1 AS tgt
USING dbo.t2 as src
ON src.id = tgt.id
WHEN MATCHED THEN
UPDATE SET tgt.value = src.value
WHEN NOT MATCHED THEN
INSERT (id, value) VALUES (src.id, src.value)
;
PRINT '************************************* End Merge'
GO
ROLLBACK
GO
BEGIN TRANSACTION
GO
PRINT '************************************* Start Upsert'
UPDATE x
SET x.value = y.value
FROM dbo.t1 x
INNER JOIN dbo.t2 y
ON y.id = x.id
;
INSERT INTO dbo.t1 (id, value)
SELECT x.id, x.value
FROM dbo.t2 x
WHERE NOT EXISTS(SELECT *
FROM dbo.t1 y
WHERE y.id = x.id)
;
PRINT '************************************* End Upsert'
GO
ROLLBACK;
************************************* Start Merge
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 't1'. Scan count 1, logical reads 30080903, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 30004, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 24266 ms, elapsed time = 25164 ms.
(11000000 rows affected)
************************************* End Merge
************************************* Start Upsert
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 't1'. Scan count 1, logical reads 36299, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 27286, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 11015 ms, elapsed time = 11272 ms.
(10000000 rows affected)
Table 't1'. Scan count 1, logical reads 3223826, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 30004, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 2883241, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5469 ms, elapsed time = 5500 ms.
(1000000 rows affected)
************************************* End Upsert
The MERGE took 25 seconds compared to about 17 seconds (11.3 s + 5.5 s) for the UPSERT.
February 20, 2023 at 3:19 pm
Thank you everybody for your helpful comments.
A little context here.
As a database consultant I was asked by my client to perform a merge on 27 tables, one of which had 700 columns. Having never used MERGE before, I assumed (correctly or not) that snippets of code would be required for all these columns. To avoid this, I wrote a simple merge which knew nothing about the columns, and it worked fine. Later, when it became apparent that MERGE in an SSIS package was required (corporate standard), I wrote a proc to auto-generate all the merges, and semi-automated the package as well. I have no issue with MERGE even though I've never written one, so the article's title was a bit inflammatory. But it was a lot of fun writing my own merge.
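For what it's worth, that kind of metadata-driven MERGE generator can be sketched roughly as below. This assumes a single id key column, a staging schema holding the source tables, and STRING_AGG (SQL Server 2017+); none of these names come from the poster's actual proc:

```sql
-- Sketch: generate a MERGE for one table from catalog metadata.
-- Assumes a staging.<table> source and a single key column named id.
DECLARE @table sysname = N'dbo.t1',
        @cols  nvarchar(max),
        @sets  nvarchar(max),
        @sql   nvarchar(max);

-- Build the column list and the UPDATE SET list (skip the key column).
SELECT @cols = STRING_AGG(QUOTENAME(c.name), N', ')
                   WITHIN GROUP (ORDER BY c.column_id),
       @sets = STRING_AGG(CASE WHEN c.name <> N'id'
                               THEN N'tgt.' + QUOTENAME(c.name)
                                    + N' = src.' + QUOTENAME(c.name)
                          END, N', ')
                   WITHIN GROUP (ORDER BY c.column_id)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@table);

SET @sql = N'MERGE INTO ' + @table + N' AS tgt
USING staging.' + PARSENAME(@table, 1) + N' AS src
    ON src.id = tgt.id
WHEN MATCHED THEN UPDATE SET ' + @sets + N'
WHEN NOT MATCHED THEN INSERT (' + @cols + N')
    VALUES (src.' + REPLACE(@cols, N', ', N', src.') + N');';

PRINT @sql;  -- review the generated statement before EXEC (@sql)
```

Generating and PRINTing the statement first, rather than executing it blind, makes it easy to eyeball the 700-column case before committing.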
February 21, 2023 at 8:55 pm
Here is a test that shows a MERGE is faster than a separate UPDATE/INSERT.
If you change your data to a more realistic scenario, you will come to the opposite conclusion. It is not - for most people - realistic to update all existing 10,000,000 rows. With 1,000,000 updates and 1,000,000 inserts, MERGE 'wins'. Conclusions can only be made if the number of rows is known and is realistic.
************************************* Start Merge
SQL Server Execution Times:
CPU time = 18250 ms, elapsed time = 20696 ms.
(2000001 rows affected)
************************************* End Merge
************************************* Start Upsert
SQL Server Execution Times:
CPU time = 14579 ms, elapsed time = 15593 ms.
(1000001 rows affected)
SQL Server Execution Times:
CPU time = 12112 ms, elapsed time = 11685 ms.
(1000000 rows affected)
************************************* End Upsert
February 22, 2023 at 12:32 am
Here is a test that shows a MERGE is faster than a separate UPDATE/INSERT.
If you change your data to a more realistic scenario, you will come to the opposite conclusion. It is not - for most people - realistic to update all existing 10,000,000 rows. With 1,000,000 updates and 1,000,000 inserts, MERGE 'wins'. Conclusions can only be made if the number of rows is known and is realistic.
************************************* Start Merge
SQL Server Execution Times:
CPU time = 18250 ms, elapsed time = 20696 ms.
(2000001 rows affected)
************************************* End Merge
************************************* Start Upsert
SQL Server Execution Times:
CPU time = 14579 ms, elapsed time = 15593 ms.
(1000001 rows affected)
SQL Server Execution Times:
CPU time = 12112 ms, elapsed time = 11685 ms.
(1000000 rows affected)
************************************* End Upsert
Where is the test??
February 22, 2023 at 12:58 am
I used your script and just showed the results.
February 22, 2023 at 1:57 am
I used your script and just showed the results.
OK, data set up with the following rows:
insert into dbo.t1
select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 10)
from dbo.fnTally(1,10000000) t;
insert into dbo.t2
select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 20)
from dbo.fnTally(1,1000000) t;
insert into dbo.t2
select t.N+10000000, SUBSTRING(CONVERT(varchar(40), newid()), 1, 20)
from dbo.fnTally(1,1000000) t;
Results MERGE:
************************************* Start Merge
Table 't1'. Scan count 1, logical reads 3090684, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 9210, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4750 ms, elapsed time = 4769 ms.
(2000000 rows affected)
************************************* End Merge
Results UPSERT:
************************************* Start Upsert
Table 't1'. Scan count 1, logical reads 41520, physical reads 0, page server reads 0, read-ahead reads 7, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 4623, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2047 ms, elapsed time = 2069 ms.
(1000000 rows affected)
Table 't1'. Scan count 5, logical reads 3235233, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 't2'. Scan count 5, logical reads 9340, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3095 ms, elapsed time = 2033 ms.
(1000000 rows affected)
************************************* End Upsert
So MERGE: 4769 ms.
UPDATE + INSERT: 2069 ms + 2033 ms = 4102 ms.
So the separate statements are still faster.
February 22, 2023 at 2:08 am
Try looking at the CPU time.
February 22, 2023 at 2:12 am
YES! And the elapsed time will change more than CPU time on a production server with many users.
February 22, 2023 at 2:19 am
YES! And the elapsed time will change more than CPU time on a production server with many users.
I hope I don't sound too critical, but that's not really true. Production database servers are generally I/O bound, because their performance is often limited by the speed at which data can be read from or written to the storage devices. This is especially true for database servers that handle large amounts of data or run complex queries, which can require accessing many disk blocks and performing many disk seeks. CPU utilization is typically not the bottleneck: modern CPUs are generally fast enough to handle the processing required for most database operations. Instead, the server's performance is limited by how quickly it can access data from disk, which is typically slower than the CPU.
The Update/Insert is still faster on my machine than the Merge, so I was wondering: what data did you use to set up your test?