As I wrote while exploring A Hazard of Using the SQL Merge Statement, I love MERGE because I think it is the greatest thing to happen to SQL querying since sliced bread. Clearly people have been doing the same thing that MERGE does using INSERT and/or UPDATE statements for many years before MERGE became available. But naturally, curiosity got the best of me regarding the performance of this wonderful query tool. Today we’ll try to explore some of the ways MERGE can be used and compare performance against equivalent INSERT and/or UPDATE statements.
MERGE is designed to apply both UPDATE and INSERTs into a target table from a source table. The statement can do both at once, or simply do INSERTs or only UPDATEs. One might even get the impression that INSERT and UPDATE are no longer needed. Why not always use MERGE?
MERGE can also do DELETEs from the target table, but for today we won’t explore that case.
Sample Data and Some Basic Examples
To illustrate our case, let’s set up some very simplistic source and target tables, and populate them with some data that we can demonstrate with.
CREATE TABLE #Target ( ID BIGINT PRIMARY KEY , Value INT ); CREATE TABLE #Source ( ID BIGINT PRIMARY KEY , Value INT ); INSERT INTO #Target VALUES ( 1, 2342 ), ( 2, 345 ); INSERT INTO #Source VALUES ( 1, 975 ), ( 3, 683 );
When we MERGE into #Target, our matching criteria will be the ID field, so the normal case is to UPDATE like IDs and INSERT any new ones like this:
-- Standard MERGE of all #Source rows into #Target MERGE #Target t USING #Source s ON s.ID = t.ID WHEN MATCHED THEN UPDATE SET Value = s.Value WHEN NOT MATCHED THEN INSERT ( ID, Value ) VALUES ( s.ID , s.Value ); SELECT * FROM #Target;
This produces quite predictable results that look like this:
ID Value 1 975 2 345 3 683
Let’s change the values in our #Source table, and then use MERGE to only do an UPDATE.
-- Change the values of our source rows UPDATE #Source SET Value = CASE ID WHEN 1 THEN 555 WHEN 3 THEN 999 END -- MERGE that only updates MERGE #Target t USING #Source s ON s.ID = t.ID WHEN MATCHED THEN UPDATE SET Value = s.Value; SELECT * FROM #Target;
The results now in #Target are:
ID Value 1 555 2 345 3 999
Finally, we know we can also use MERGE to replace INSERT by omitting the MATCHED clause. Let’s INSERT a new row into #Source and do this.
-- Insert a new row into our source INSERT #Source VALUES ( 4, 242 ); -- MERGE that only inserts MERGE #Target t USING #Source s ON s.ID = t.ID WHEN NOT MATCHED THEN INSERT ( ID, Value ) VALUES ( s.ID , s.Value ); SELECT * FROM #Target;
Unsurprisingly, the results now in #Target are these:
ID Value 1 555 2 345 3 999 4 242
Sorry if you know all this stuff and I’ve bored you, but we needed to get these basics out of the way.
Exploring the Performance of MERGE
To effectively test the performance of our alternatives, we’ll need to set up a test harness with a non-trivial number of rows in our #Source and #Target tables. You can open the Test Harness 1.sql file in the resources section of this article and follow along. The basic set-up data is as follows.
TRUNCATE TABLE #Source; TRUNCATE TABLE #Target; WITH Tally ( n ) AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) FROM sys.all_columns a CROSS JOIN sys.all_columns b ) INSERT INTO #Target SELECT 2 * n , 1 + ABS(CHECKSUM(NEWID())) % 1000 FROM Tally; WITH Tally ( n ) AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) FROM sys.all_columns a CROSS JOIN sys.all_columns b ) INSERT INTO #Source SELECT CASE WHEN n <= 500000 THEN 2 * n - 1 ELSE 2 * n END , 1 + ABS(CHECKSUM(NEWID())) % 1000 FROM Tall
We’ve purposely set up our source table so that the INSERTs it will do when merged with the target are interleaved with existing records for the first 500,000 rows.
Using SQL Profiler, we’ll compare two identical query scripts:
-- MERGE MERGE #Target t USING #Source s ON s.ID = t.ID WHEN MATCHED THEN UPDATE SET Value = s.Value WHEN NOT MATCHED THEN INSERT ( ID, Value ) VALUES ( s.ID , s.Value ); -- TRUNCATE and re-populate Source and Target tables -- UPDATE/INSERT BEGIN TRANSACTION T1; UPDATE t SET Value = s.Value FROM #Target t JOIN #Source s ON s.ID = t.ID; INSERT INTO #Target SELECT s.ID , s.Value FROM #Source s LEFT JOIN #Target t ON s.ID = t.ID WHERE t.ID IS NULL; COMMIT TRANSACTION T1;
Both of these INSERT 500,000 rows and UPDATE 500,000 rows, the latter enclosed in a TRANSACTION. We omit error handling and possible ROLLBACK of the transaction for simplicity. The results that we obtained running the test harness in SQL Profiler (5 runs) are as follows:
Query CPU Reads Writes Duration MERGE 4492 4513786 2578 4864 INSERT/UPDATE 3588 3072489 5496 3847 MERGE 4820 4514303 2899 5253 INSERT/UPDATE 3572 3072979 4007 4035 MERGE 4462 4513793 2571 4770 INSERT/UPDATE 3635 3072463 5497 3829 MERGE 4524 4513795 2931 4800 INSERT/UPDATE 3588 3072474 5505 3665 MERGE 4648 4513814 2939 4955 INSERT/UPDATE 3479 3072491 5522 3716
These indicate that MERGE took about 28% more CPU and 29% more elapsed time than the equivalent INSERT/UPDATE. Not surprising considering all the complexity that MERGE must handle, but possibly forgivable for the additional safety, convenience and simplicity it represents (no need for a TRANSACTION with error handling).
Using MERGE as a Substitute for INSERT
To test the performance of MERGE as a substitute for INSERT, the same test harness that set up the #Target table can be used, but we’ll change the set up for the #Source table as follows.
WITH Tally (n) AS ( SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns a CROSS JOIN sys.all_columns b ) INSERT INTO #Source SELECT CASE WHEN n <= 500000 THEN 2*n-1 ELSE 2000000+n END ,1+ABS(CHECKSUM(NEWID()))%1000 FROM Tally;
A quick check of the row counts generated from Test Harness #2.sql confirms that both MERGE and INSERT insert exactly 1,000,000 rows. The results from a Profiler trace using these queries give us this:
Query CPU Reads Writes Duration MERGE 5054 6024150 2397 5576 INSERT 4992 6248001 7563 5507 MERGE 5226 6024165 3868 5529 INSERT 5383 6248005 7571 6298 MERGE 5257 6023557 3689 5473 INSERT 4851 6247403 7431 5546 MERGE 5273 6023589 2167 5662 INSERT 4914 6247440 7427 5281 MERGE 5179 6024619 1426 5476 INSERT 5039 6248483 6211 5954
In this case, CPU and elapsed time are probably too close to call. CPU usage was about 3% more for the MERGE and elapsed time was about 3% less for the MERGE.
Using MERGE as a Substitute for UPDATE
Once again, to test this we’ll change the test harness only for the #Source table so that it creates the rows with exactly the same IDs as the #Target table.
WITH Tally (n) AS ( SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns a CROSS JOIN sys.all_columns b ) INSERT INTO #Source SELECT 2*n ,1+ABS(CHECKSUM(NEWID()))%1000 FROM Tally;
After confirming that both MERGE and UPDATE update 1,000,000 rows, our Profile trace from running Test Harness #3.sql gave us these results.
Query CPU Reads Writes Duration MERGE 1903 7982 2568 2010 UPDATE 1763 7954 2568 1840 MERGE 1904 7986 2576 2303 UPDATE 1809 7955 2560 1974 MERGE 1903 7968 2576 1951 UPDATE 1763 7940 2568 2005 MERGE 1918 7957 2568 2009 UPDATE 1731 7464 2584 1809 MERGE 1903 8005 2560 2023 UPDATE 1732 7977 2584 2063
Our MERGE was just about 8% more costly in CPU and 6% more in elapsed time. The advantage probably goes to the UPDATE for this case.
Conclusions
While the MERGE statement is a wonderfully useful query syntax that does offer some safety features over the traditional approach of UPDATE then INSERT, it does come with a slight performance penalty. Certainly the performance characteristics may vary depending on your special circumstances, so this article should only be treated as a guideline. While I will probably continue to use it simply for the convenience it offers in the INSERT/UPDATE case, it is unlikely I’d consider using it as a replacement either for INSERT or for UPDATE.
All performance tests in this article were run on a Lenovo Laptop with an Intel Core i-5 CPU @2.60 GHz and 4GB of RAM, running Windows 7 (64 bit) OS, SQL Server 2012 (64 bit).
We hope you found this article informative.
Dwain Camps
SQL Enthusiast
Skype: dwaincamps