The T-SQL merge statement is very handy for inserting or updating lots of rows in a single statement. However, if there are errors with even just one row of the data you are trying to merge, then the merge statement fails in its entirety and does not offer any clue as to which row(s) caused the problem. This makes feedback for the user less helpful than it could be. A simplified merge might look like this:
CREATE TYPE CustomerUpdatesType AS TABLE (
customer_id int null,
title nvarchar(16) null,
first_name nvarchar(32) null,
last_name nvarchar(32) null,
phone_number nvarchar(10) null
);
DECLARE @CustomerUpdates CustomerUpdatesType;
INSERT INTO @CustomerUpdates(customer_id, title, first_name, last_name, phone_number)
VALUES
(1,'Mr','Bob','Jones','555-1234'), -- Update where customer_id = 1
(2,'Mrs','Jessica','Smith','555-2345'), --Update where customer_id = 2
(null,'Rev','Brian','Green','555-4567') -- Insert a new customer
;
MERGE INTO CUSTOMERS dest
USING @CustomerUpdates source on source.customer_id = dest.customer_id
WHEN MATCHED THEN
UPDATE
SET dest.title = source.title
,dest.first_name = source.first_name
,dest.last_name = source.last_name
,dest.phone_number = source.phone_number
WHEN NOT MATCHED THEN
INSERT(title,first_name,last_name,phone_number)
VALUES(source.title, source.first_name,source.last_name,source.phone_number)
;
Validation before the merge can help, but the merge statement itself can be more useful too. The technique outlined here to make merging more helpful involves wrapping the merge statement in a stored procedure and passing in the data as a table valued parameter. If the data is all valid then the stored procedure follows the fast path and quickly merges the data into the target table as per normal. However, if there are errors with the data that cause the merge statement to fail, then the stored procedure adopts the slow path, trying one data row at a time, noting any errors for user feedback. The trick here is using recursion so that the slow path is just the wrapper stored procedure calling itself, meaning that the exact same merge statement and all its business logic is being reused.
To illustrate the technique, start with the basic merge statement from above, wrap it in a stored procedure with a table valued parameter typed to accept the data to be merged. This is the fast path that will be how the stored procedure works when there are no data errors.
CREATE PROCEDURE BulkCustomerUpdate @CustomerUpdates CustomerUpdatesType READONLY
AS
BEGIN
MERGE INTO CUSTOMERS dest
USING @CustomerUpdates source on source.customer_id = dest.customer_id
WHEN MATCHED THEN
UPDATE
SET dest.title = source.title
,dest.first_name = source.first_name
,dest.last_name = source.last_name
,dest.phone_number = source.phone_number
WHEN NOT MATCHED THEN
INSERT(title,first_name,last_name,phone_number)
VALUES(source.title, source.first_name,source.last_name,source.phone_number)
;
END
;
If there are errors with the fast path merge, we want to be able to say which row(s) caused the problem. Adding an identity column called "update_id" to the user defined table type fulfills this requirement.
CREATE TYPE CustomerUpdatesType AS TABLE (
update_id int identity(1,1) not null,
customer_id int null,
title nvarchar(16) null,
first_name nvarchar(32) null,
last_name nvarchar(32) null,
phone_number nvarchar(10) null
);
To allow the stored procedure to recurse into itself and process rows one at a time, add an integer type parameter called @single_update_id to the stored procedure and default it to null. On the fast path @single_update_id will be null, implying that the merge should process the whole of table valued parameter @CustomerUpdates.
CREATE PROCEDURE BulkCustomerUpdate @CustomerUpdates CustomerUpdatesType READONLY,
@single_update_id BIGINT = NULL
AS
Wrap the merge statement in a Begin Try/End Try block. The USING <table_source> clause of the merge needs to respect the @single_update_id parameter if it is not null, so <table_source> becomes a sub-select with an ISNULL(). The same merge statement is used in both the fast ( error free ) path and in the slow ( row by row ) path. Depending on the business logic required, the merge statement can become quite complicated and only having a single statement used in all scenarios is helpful.
BEGIN TRY
-- Merge may implement complex business rules, delete clauses etc.
MERGE INTO Customers dest
USING (SELECT * FROM @CustomerUpdates WHERE update_id = ISNULL(@single_update_id,update_id)) source
ON source.customer_id = dest.customer_id
WHEN MATCHED THEN
UPDATE
SET dest.title = source.title
,dest.first_name = source.first_name
,dest.last_name = source.last_name
,dest.phone_number = source.phone_number
WHEN NOT MATCHED THEN
INSERT(title,first_name,last_name,phone_number)
VALUES(source.title, source.first_name,source.last_name,source.phone_number)
;
END TRY
After the Begin Try/End Try block is the Begin Catch/End Catch block, the slow path that finds the errors on the data by looping through the rows to be merged one at a time. When on the slow path, there are a few details that need to be weighed against the business requirements, for instance transactions and error reporting.For transactions, should a single error cause all of the data to be rejected or should as much data as possible be merged? For error reporting, this example only provides print statements and RAISERROR to the application layer but it might be useful if the stored procedure returned a result set to the application layer indicating errors.
Inside the Begin Catch/End Catch block, a while loop iterates over the merge data by update_id and calls EXEC to recurse into this same stored procedure with parameter @single_update_id set.
BEGIN CATCH
-- ...SNIP...
SELECT @next_update_id = MIN(update_id) FROM @CustomerUpdates;
WHILE (@next_update_id IS NOT NULL)
BEGIN
BEGIN TRY
-- Recurse into this stored procedure again, with just one row to be merged.
EXEC BulkCustomerUpdate @CustomerUpdates = @CustomerUpdates, @single_update_id = @next_update_id;
END TRY
-- ...SNIP... Some error handling snipped for clarity.
SELECT @next_update_id = MIN(update_id) FROM @CustomerUpdates WHERE update_id > @next_update_id;
END -- /WHILE
-- ...SNIP...
END CATCH
If a nested call to the stored procedure encounters an error while it is merging a single row, that should be reported to the user and the application layer.
BEGIN
-- @single_update_id IS NOT NULL means we've been here before. Raise the error to the application layer.
RAISERROR('Update_id %d, Error number %d, message %s', 1, 1, @single_update_id, @errCode, @errMessage) WITH NOWAIT;
SET @errMessage = 'Update_id ' + CONVERT(NVARCHAR(255),@single_update_id) + ' caused error ' + CONVERT(NVARCHAR(255),@errCode) + ', ' + @errMessage;
PRINT @errMessage;
END
And apart from some transaction and error handling, that is just about all there is to it. A full worked example with source code, sample data and a test case is attached.