June 20, 2013 at 2:12 pm
Problem Summary:
A MERGE statement takes several times longer to execute than the equivalent UPDATE, INSERT, and DELETE run as separate statements. Why?
Details:
I have a relatively large table (about 35,000,000 records, approximately 13 GB uncompressed and 4 GB with page compression - including indexes). A MERGE statement pretty consistently takes two or three minutes to perform an update, insert and delete. At one extreme, updating 82 (yes 82) records took 1 minute, 45 seconds. At the other extreme, updating 100,000 records took about five minutes.
When I changed the MERGE to the equivalent separate UPDATE, INSERT & DELETE statements (embedded in an explicit transaction), the entire update took only 17 seconds. The query plans for the separate UPDATE, INSERT & DELETE statements look very similar to the query plan for the combined MERGE. However, all of the row count estimates for the MERGE statement are way off.
Obviously, I am going to use the separate UPDATE, INSERT & DELETE statements. 🙂 However, I would like to understand what is happening here. I've read some forum posts that talk about various bugs in the MERGE operation. Has anyone run into this particular problem before? Does anyone know why this might happen?
The actual query plans for the four statements ( combined MERGE and the separate UPDATE, INSERT & DELETE ) are attached. SQL Code to create the source and target tables and the actual queries themselves are below. I've also included the statistics created by my test run. Nothing else was running on the server when I ran the test.
Please let me know if you need any further information, and thank you very much for your help! 🙂
Server Configuration:
SQL Server 2008 R2 SP1, Enterprise Edition
3 x quad-core Xeon processors
Max Degree of Parallelism = 8
148 GB RAM
SQL Code:
Target Table:
USE TPS;
IF OBJECT_ID('dbo.ParticipantResponse') IS NOT NULL
DROP TABLE dbo.ParticipantResponse;
CREATE TABLE dbo.ParticipantResponse
( RegistrationID INT NOT NULL
,ParticipantID INT NULL
,ResponseDate SMALLDATETIME NULL
,CallNumber INT NULL
,ElementCategory NVARCHAR(80) NULL
,ElementNameID INT NULL
,ElementName NVARCHAR(80) NULL
,Response VARCHAR(3000) NULL
,SourceTable VARCHAR(30) NOT NULL
,SourceTableID INT NOT NULL
,SpecialistName VARCHAR(80) NULL
,ID BIGINT NOT NULL
,TransferKey INT NOT NULL
,CONSTRAINT [PK_ParticipantResponse]
PRIMARY KEY ( SourceTableID
,ID
)
)
WITH ( DATA_COMPRESSION = PAGE );
/* Index IX_PersonElement*/
CREATE NONCLUSTERED INDEX IX_PersonElement
ON dbo.ParticipantResponse (RegistrationID ASC,
ParticipantID ASC,
ElementName ASC
)
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON, ONLINE = OFF, DATA_COMPRESSION = PAGE );
/*Index IX_ParticipantResponse*/
CREATE NONCLUSTERED INDEX IX_ParticipantResponse
ON dbo.ParticipantResponse (RegistrationID ASC,
ParticipantID ASC,
ResponseDate ASC,
ElementName ASC
)
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON, ONLINE = OFF, DATA_COMPRESSION = PAGE );
/*Index IX_ParticipantID*/
CREATE NONCLUSTERED INDEX IX_ParticipantID
ON dbo.ParticipantResponse (ParticipantID ASC )
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON, ONLINE = OFF, DATA_COMPRESSION = PAGE );
/*Index IX_ParticipantCall*/
CREATE NONCLUSTERED INDEX IX_ParticipantCall
ON dbo.ParticipantResponse (ParticipantID ASC,
CallNumber ASC
)
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON, ONLINE = OFF, DATA_COMPRESSION = PAGE );
/*Index IX_ElementParticipant*/
CREATE NONCLUSTERED INDEX IX_ElementParticipant
ON dbo.ParticipantResponse (ElementName ASC,
ParticipantID ASC
)
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON, ONLINE = OFF, DATA_COMPRESSION = PAGE );
Source Table:
IF OBJECT_ID('ETL.ParticipantResponseBuild') IS NOT NULL
DROP TABLE ETL.ParticipantResponseBuild;
CREATE TABLE ETL.ParticipantResponseBuild
( RegistrationID INT NOT NULL
,ParticipantID INT NULL
,ResponseDate SMALLDATETIME NULL
,CallNumber INT NULL
,ElementCategory NVARCHAR(80) NULL
,ElementNameID INT NULL
,ElementName NVARCHAR(80) NULL
,Response VARCHAR(3000) NULL
,SourceTable VARCHAR(30) NOT NULL
,SourceTableID INT NOT NULL
,SpecialistName VARCHAR(80) NULL
,ID BIGINT NOT NULL
,TransferKey INT NOT NULL
);
ALTER TABLE ETL.ParticipantResponseBuild
ADD PRIMARY KEY ( SourceTableID, ID );
CREATE INDEX [IX_TransferKey] ON ETL.ParticipantResponseBuild ( TransferKey )
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON );
CREATE INDEX [IX_ParticipantID] ON ETL.ParticipantResponseBuild ( ParticipantID )
WITH ( MAXDOP = 1, SORT_IN_TEMPDB = ON );
Combined MERGE Statement:
USE TPS;
DECLARE @LastKeyCompleted INT = 476161;
SET STATISTICS IO,TIME ON;
MERGE INTO TPS.dbo.ParticipantResponse WITH (HOLDLOCK) AS Production
USING TPS.ETL.ParticipantResponseBuild AS Build
ON Production.SourceTableID = Build.SourceTableID
AND Production.ID = Build.ID
WHEN MATCHED AND Build.TransferKey > @LastKeyCompleted
THEN UPDATE SET
RegistrationID = Build.RegistrationID
,ParticipantID = Build.ParticipantID
,ResponseDate = Build.ResponseDate
,CallNumber = Build.CallNumber
,ElementCategory = Build.ElementCategory
,ElementNameID = Build.ElementNameID
,ElementName = Build.ElementName
,Response = Build.Response
,SourceTable = Build.SourceTable
,SpecialistName = Build.SpecialistName
,TransferKey = Build.TransferKey
WHEN NOT MATCHED BY TARGET
THEN INSERT
( RegistrationID ,ParticipantID ,ResponseDate ,CallNumber ,ElementCategory ,ElementNameID ,ElementName
,Response ,SourceTable ,SourceTableID ,SpecialistName ,ID ,TransferKey )
VALUES
( Build.RegistrationID ,Build.ParticipantID ,Build.ResponseDate ,Build.CallNumber ,Build.ElementCategory
,Build.ElementNameID ,Build.ElementName ,Build.Response ,Build.SourceTable ,Build.SourceTableID
,Build.SpecialistName ,Build.ID ,Build.TransferKey )
WHEN NOT MATCHED BY SOURCE AND Production.ParticipantID IN ( SELECT ParticipantID FROM ETL.ParticipantResponseBuild )
THEN DELETE;
Statistics for MERGE Statement:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 83 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'ParticipantResponse'. Scan count 1, logical reads 162202, physical reads 0, read-ahead reads 137853, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 5, logical reads 41074, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParticipantResponseBuild'. Scan count 35964835, logical reads 72281824, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(9731 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 138981 ms, elapsed time = 143911 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Separate UPDATE, INSERT & DELETE Statements:
USE TPS;
DECLARE @LastKeyCompleted INT = 476161;
SET STATISTICS IO,TIME ON;
BEGIN TRANSACTION
UPDATE dbo.ParticipantResponse
SET RegistrationID = Build.RegistrationID
,ParticipantID = Build.ParticipantID
,ResponseDate = Build.ResponseDate
,CallNumber = Build.CallNumber
,ElementCategory = Build.ElementCategory
,ElementNameID = Build.ElementNameID
,ElementName = Build.ElementName
,Response = Build.Response
,SourceTable = Build.SourceTable
,SpecialistName = Build.SpecialistName
,TransferKey = Build.TransferKey
FROM dbo.ParticipantResponse AS Production
JOIN ETL.ParticipantResponseBuild AS Build
ON Production.SourceTableID = Build.SourceTableID
AND Production.ID = Build.ID
WHERE Build.TransferKey > @LastKeyCompleted;
INSERT dbo.ParticipantResponse
( RegistrationID ,ParticipantID ,ResponseDate ,CallNumber ,ElementCategory ,ElementNameID ,ElementName
,Response ,SourceTable ,SourceTableID ,SpecialistName ,ID ,TransferKey )
SELECT Build.RegistrationID
,Build.ParticipantID
,Build.ResponseDate
,Build.CallNumber
,Build.ElementCategory
,Build.ElementNameID
,Build.ElementName
,Build.Response
,Build.SourceTable
,Build.SourceTableID
,Build.SpecialistName
,Build.ID
,Build.TransferKey
FROM dbo.ParticipantResponse AS Production
RIGHT JOIN ETL.ParticipantResponseBuild AS Build
ON Production.SourceTableID = Build.SourceTableID
AND Production.ID = Build.ID
WHERE Production.SourceTableID IS NULL;
DELETE dbo.ParticipantResponse
FROM dbo.ParticipantResponse AS Production
LEFT JOIN ETL.ParticipantResponseBuild AS Build
ON Production.SourceTableID = Build.SourceTableID
AND Production.ID = Build.ID
WHERE Build.SourceTableID IS NULL
AND Production.ParticipantID IN ( SELECT ParticipantID
FROM ETL.ParticipantResponseBuild
);
COMMIT TRANSACTION
Statistics for Separate UPDATE, INSERT & DELETE Statements:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 77 ms, elapsed time = 77 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'ParticipantResponse'. Scan count 0, logical reads 39541, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 5, logical reads 41074, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParticipantResponseBuild'. Scan count 1, logical reads 794, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(9731 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 483 ms.
Table 'ParticipantResponseBuild'. Scan count 9, logical reads 873, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParticipantResponse'. Scan count 9, logical reads 53986, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 13, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(0 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 15290 ms, elapsed time = 1955 ms.
Table 'ParticipantResponse'. Scan count 2313, logical reads 7497, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParticipantResponseBuild'. Scan count 2, logical reads 332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(0 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 93 ms, elapsed time = 102 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 87 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
June 21, 2013 at 9:03 am
Nothing? :w00t: I was sure that one of those brilliant SQL veterans like Gail Shaw, Jeff Moden, Grant Fritchey, Steve Jones or Adam Machanic would have some insight into this problem. :hehe:
Help me, Obi-Wan, you're my only hope! 😀
June 21, 2013 at 9:30 am
David Moutray (6/21/2013)
Nothing? :w00t: I was sure that one of those brilliant SQL veterans like Gail Shaw, Jeff Moden, Grant Fritchey, Steve Jones or Adam Machanic would have some insight into this problem. :hehe:
Help me, Obi-Wan, you're my only hope! 😀
Gail, Grant and Steve are on stage at SQLInTheCity in London right now.
If you content yourself with my help, I can try to take a look. 🙂
-- Gianluca Sartori
June 21, 2013 at 9:38 am
Oh, I am perfectly content with your help. I was hoping one of these people might subscribe to Google alerts. 🙂 I am pretty sure Adam Machanic does, for one. 😀
June 21, 2013 at 9:40 am
OK, looking at your code, it looks like you're trying to do everything in a single pass (which is often a good idea).
In this case, you're using both WHEN NOT MATCHED BY TARGET and WHEN NOT MATCHED BY SOURCE, which requires a full outer join to surface matched rows and unmatched rows from both sides in a single pass.
The individual DELETE and UPDATE statements don't suffer from this issue, so they actually perform better.
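To see the shape of the problem, the engine logically has to evaluate something like this in one pass (just a sketch of the logical requirement, not the actual plan):
-- Sketch only: matched rows, target-only rows and source-only rows
-- must all come out of a single pass over both tables.
SELECT Production.SourceTableID AS TargetSourceTableID
      ,Production.ID            AS TargetID
      ,Build.SourceTableID      AS BuildSourceTableID
      ,Build.ID                 AS BuildID
FROM TPS.dbo.ParticipantResponse AS Production
FULL OUTER JOIN TPS.ETL.ParticipantResponseBuild AS Build
    ON Production.SourceTableID = Build.SourceTableID
    AND Production.ID = Build.ID;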
Hope this helps
Gianluca
-- Gianluca Sartori
June 21, 2013 at 9:44 am
Also, the plans don't look very similar.
The DELETE plan is way different. It's a parallel plan, while the MERGE plan is serial. There's a hash match and... it's a different plan!
I think most of the magic is in there.
-- Gianluca Sartori
June 21, 2013 at 9:59 am
Hmmm ... if I understand you correctly, this problem severely limits the usefulness of MERGE.
The WHEN NOT MATCHED clause is required for INSERTs and DELETEs, but getting all of that data in a single pass essentially requires a full table scan. That might save you time if the individual DELETE and INSERT queries require a table scan anyway. It might also be OK if the table is fairly small.
If you are trying to do a targeted UPDATE/INSERT/DELETE on a large table, though, you are better off with separate statements.
Perhaps Microsoft implemented the MERGE operator only because it is part of the ANSI standard, and they wanted to check that box and be able to say, "Yeah, our software does that, too." How well does MERGE perform in other implementations of SQL, such as Oracle? Does anyone know?
This seems like a pretty fundamental limitation on an operation that is trying to do everything at the same time. :ermm:
June 21, 2013 at 10:07 am
I don't think it's a limitation in the MERGE statement itself. It's just that the optimizer decided to implement yours in an inefficient way.
Actually, anti-joins are often a pain, not only with MERGE.
SQL Server's MERGE has some limitations and some bugs, but I can tell for sure that Oracle's MERGE is not any better.
-- Gianluca Sartori
June 21, 2013 at 10:09 am
spaghettidba (6/21/2013)
Also, the plans don't look very similar.
The DELETE plan is way different. It's a parallel plan, while the MERGE plan is serial. There's a hash match and... it's a different plan!
I think most of the magic is in there.
Interesting. That goes well with something I've found: if you make the query too complex, the optimizer has trouble creating an optimal execution plan. If that happens, it is better to break your query up into multiple steps.
Thank you for your insight. 😎
June 21, 2013 at 9:58 pm
For best performance with a MERGE, it's a good idea to have both the source and target tables indexed on the columns used to determine a MATCH. Other suggestions can be found at the link below.
http://msdn.microsoft.com/en-us/library/cc879317%28v=sql.105%29.aspx
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
June 22, 2013 at 1:02 am
David Moutray (6/21/2013)
Nothing? :w00t: I was sure that one of those brilliant SQL veterans like Gail Shaw, Jeff Moden, Grant Fritchey, Steve Jones or Adam Machanic would have some insight into this problem. :hehe:
Help me, Obi-Wan, you're my only hope! 😀
To be absolutely honest, I've heard/read of several problems with MERGE and so never made the "leap" to use it in SQL Server. The only time I've used MERGE was way-back-when I had to use Oracle. The only reason I even considered using it then was because UPDATE in Oracle doesn't have a FROM clause and it was a whole lot simpler to use MERGE as a surrogate for UPDATEs there.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 23, 2013 at 8:51 am
The main issue is the cardinality estimation for the Filter in the MERGE plan: estimated rows 32 million, actual 10 thousand.
The sorts in the plan require a memory grant, which is sized based on the estimated number of rows. Once the merge update actions have been split into separate deletes and inserts by the Split operator, the estimate at the sort is 57.5 million rows. Ironically, none of the nonclustered indexes are changed in the example plan you gave, so the sorts actually encounter zero rows!
As you may know, queries cannot start executing until their workspace memory has been granted. Your example MERGE plan ended up acquiring over 11 GB of memory (!), though it may have asked for even more to begin with. It is highly likely the query had to wait a considerable time before 11 GB could be granted. You can monitor memory grants using sys.dm_exec_query_memory_grants. If you had been using SQL Server 2012, the wait time would also have been reported in the execution plan.
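For example, you could run something like this from a second session while the MERGE executes (a sketch; add whatever other columns from the DMV you find useful):
-- Shows the memory each active query requested, what it was granted,
-- and how long it has waited for the grant so far.
SELECT session_id
      ,requested_memory_kb
      ,granted_memory_kb
      ,ideal_memory_kb
      ,wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;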
The wait for the memory grant is the reason the 82-row run took so long, and the wildly inaccurate cardinality estimate is the reason for the ridiculous memory grant.
The cardinality estimate is wrong because this Filter eliminates rows for which no merge action (insert, update, or delete) is necessary. The unusual structure of the MERGE statement makes estimating the number of rows that require no changes very difficult, and the cardinality estimator gets it very wrong. The offending part of the MERGE statement is:
WHEN NOT MATCHED BY SOURCE
AND Production.ParticipantID IN ( SELECT ParticipantID FROM ETL.ParticipantResponseBuild )
I understand what you are trying to do here, and why you wrote the clause this way, but that doesn't change the fact that it is unusual and makes an already-difficult cardinality estimation problem just about impossible. To the optimizer, it looks very much as if almost all rows will result in an insert, update, or delete.
If you cannot reformulate the MERGE to handle the DELETE using more transparent logic, use MERGE for the INSERT and UPDATE and perform the DELETE separately. Or just use the three separate statements, of course, if you find that performs better. I would not say "never use MERGE", but it can require skilled tuning in many cases, and it is rarely the best choice where parallel execution is needed. Cardinality estimation is much easier with separate statements.
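A sketch of that split, using your own tables and column lists: the same MERGE with the WHEN NOT MATCHED BY SOURCE branch removed, and the DELETE moved to its own statement (shown rewritten with NOT EXISTS further down):
DECLARE @LastKeyCompleted INT = 476161;
-- MERGE now only updates and inserts; the hard-to-estimate delete branch is gone.
MERGE INTO TPS.dbo.ParticipantResponse WITH (HOLDLOCK) AS Production
USING TPS.ETL.ParticipantResponseBuild AS Build
ON Production.SourceTableID = Build.SourceTableID
AND Production.ID = Build.ID
WHEN MATCHED AND Build.TransferKey > @LastKeyCompleted
THEN UPDATE SET
RegistrationID = Build.RegistrationID
,ParticipantID = Build.ParticipantID
,ResponseDate = Build.ResponseDate
,CallNumber = Build.CallNumber
,ElementCategory = Build.ElementCategory
,ElementNameID = Build.ElementNameID
,ElementName = Build.ElementName
,Response = Build.Response
,SourceTable = Build.SourceTable
,SpecialistName = Build.SpecialistName
,TransferKey = Build.TransferKey
WHEN NOT MATCHED BY TARGET
THEN INSERT
( RegistrationID ,ParticipantID ,ResponseDate ,CallNumber ,ElementCategory ,ElementNameID ,ElementName
,Response ,SourceTable ,SourceTableID ,SpecialistName ,ID ,TransferKey )
VALUES
( Build.RegistrationID ,Build.ParticipantID ,Build.ResponseDate ,Build.CallNumber ,Build.ElementCategory
,Build.ElementNameID ,Build.ElementName ,Build.Response ,Build.SourceTable ,Build.SourceTableID
,Build.SpecialistName ,Build.ID ,Build.TransferKey );
-- ...then run the DELETE as a separate statement (see the NOT EXISTS example below).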
Other things, if you are interested:
MERGE is not optimized for large set processing, despite the common intuition that one statement ought to be faster than three. On the other hand, MERGE does contain some very specific optimizations for OLTP (in particular, the elimination of Halloween Protection in certain cases). The considerations that make it unsuitable for large sets are partly implementation decisions, partly optimizer limitations, and partly just the immense complexity of the merging operation itself.
The Compute Scalar that determines the merge action is a Compute Sequence. This operator cannot tolerate parallel execution, so a parallel MERGE plan has to stop and restart parallel execution on either side of it. The cost of stopping and restarting often results in MERGE plans that do not use parallelism where it might otherwise be expected.
You can improve the cardinality estimates in some areas of the plans by adding OPTION (RECOMPILE) to the queries that reference the local variable @LastKeyCompleted. This hint allows the optimizer to see the current value of the variable and optimize accordingly. Otherwise, the plan is based on a guess. Recompiling may not take long here compared to the run time of the query, so it could be a price worth paying to get a plan tailored to the current value of @LastKeyCompleted.
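For example, applied to your UPDATE (abbreviated here to one column assignment; the hint is the point):
DECLARE @LastKeyCompleted INT = 476161;
UPDATE Production
SET TransferKey = Build.TransferKey -- ...plus the other assignments from your original UPDATE
FROM dbo.ParticipantResponse AS Production
JOIN ETL.ParticipantResponseBuild AS Build
ON Production.SourceTableID = Build.SourceTableID
AND Production.ID = Build.ID
WHERE Build.TransferKey > @LastKeyCompleted
OPTION (RECOMPILE); -- the optimizer now sees 476161 rather than guessing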
The construct OUTER JOIN ... WHERE key IS NULL is almost never preferable to writing NOT EXISTS. The logical requirement is an anti semi join; performing an outer join and then rejecting the NULLs is an odd way to express it. This topic has been written about many times, always with the same results. Use NOT EXISTS.
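Your DELETE, expressed as an anti semi join (same logic; the IN subquery is unchanged):
DELETE Production
FROM dbo.ParticipantResponse AS Production
WHERE NOT EXISTS ( SELECT 1
                   FROM ETL.ParticipantResponseBuild AS Build
                   WHERE Build.SourceTableID = Production.SourceTableID
                   AND Build.ID = Production.ID )
AND Production.ParticipantID IN ( SELECT ParticipantID
                                  FROM ETL.ParticipantResponseBuild );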
HOLDLOCK = SERIALIZABLE. There are good reasons to use this with MERGE for highly concurrent systems, but it would be quite unusual for ETL. Be sure you need this hint.
June 24, 2013 at 1:32 am
Paul, your answers could be collected in a book and it would be an absolute best-seller!
Awesome, as usual.
-- Gianluca Sartori
June 24, 2013 at 2:10 am
spaghettidba (6/24/2013)
Paul, your answers could be collected in a book and it would be an absolute best-seller!
Awesome, as usual.
+1
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537
June 24, 2013 at 10:30 am
spaghettidba (6/24/2013)
Paul, your answers could be collected in a book and it would be an absolute best-seller!
Awesome, as usual.
+1000 to that. What would be even better is if MS really started listening to him.
--Jeff Moden
Change is inevitable... Change for the better is not.