Starting with SQL Server 2008, we can use the MERGE statement to solve some every day scenarios in more elegant ways. A short definition of this new feature would be: it allows us to write a single statement to synchronize data in two tables by executing any combination of INSERT, UPDATE and DELETE operations according to specified criteria. The result of this statement isn’t something that wasn’t possible before, but the difference is that we don’t have to write more complex stored procedures that include multiple JOINs and statements like IF EXISTS.
A BASIC SCENARIO
Let’s demo the MERGE statement on a typical example of an ETL process that updates a Data Warehouse. ETL stands for Extract, Transform and Load, i.e. a series of steps to import data from various sources, transform the data into a common schema and then load it into a Data Warehouse. The last step of such a process is where we can use the MERGE statement. Suppose that our Data Warehouse contains a table called Buyers that, for simplicity, contains only these columns:
CREATE TABLE [dbo].[Buyers]( [ID] [INT] PRIMARY KEY, [FirstName] [NVARCHAR](255) , [Companyname] [NVARCHAR](255) , [TotalPurchaseAmount] [MONEY] NULL)
We will create a similar table called BuyersLoad with identical structure:
CREATE TABLE [dbo].[BuyersLoad]( [ID] [INT] PRIMARY KEY, [FirstName] [NVARCHAR](255) , [Companyname] [NVARCHAR](255) , [TotalPurchaseAmount] [MONEY] NULL)
During the nightly ETL run, we connect to any available source and load data that was added or modified since the last run. That data is temporary stored in the BuyersLoad table that serves as a staging area. When the data is loaded into the BuyersLoad table, we will execute SQL code that will implement the following logic: for each record in BuyersLoad, check whether a record with the same PK exists in the Buyers table; if it exists, then update the record, if it doesn’t exist – insert a new record And here’s how we can do this with the MERGE statement:
MERGE [dbo].[Buyers] AS t USING [dbo].[BuyersLoad] AS s ON t.[ID]=s.[ID] WHEN MATCHED THEN UPDATE SET t.[FirstName] = s.[FirstName], t.[Companyname]=s.[Companyname], t.[TotalPurchaseAmount]=s.[TotalPurchaseAmount] WHEN NOT MATCHED THEN INSERT ([ID], [FirstName], [Companyname],[TotalPurchaseAmount] ) VALUES (s.[ID], s.[FirstName], s.[Companyname], s.[TotalPurchaseAmount] ) ;
Let’s explain the parts of that statement: The first part defines the target table:
MERGE [dbo].[Buyers] AS t
The second line defines the source table and the JOIN with the target:
USING [dbo].[BuyersLoad] AS s ON t.[ID]=s.[ID]
Then there’s the part that specifies what happens when a source record ID matches a record ID in the target table:
WHEN MATCHED THEN UPDATE SET t.[FirstName] = s.[FirstName], t.[Companyname]=s.[Companyname], t.[TotalPurchaseAmount]=s.[TotalPurchaseAmount]
Finally, this part specifies the code that executes when there’s no match:
WHEN NOT MATCHED THEN INSERT ([ID], [FirstName], [Companyname],[TotalPurchaseAmount] ) VALUES (s.[ID], s.[FirstName], s.[Companyname], s.[TotalPurchaseAmount] ) ;
Let’s add some data into our tables and test our MERGE statement:
INSERT INTO dbo.Buyers ( ID ,FirstName ,Companyname ,TotalPurchaseAmount) VALUES ( 1 , N'Homer Simpson', N'Homer Inc.' , 1000.) INSERT INTO dbo.Buyers ( ID ,FirstName ,Companyname ,TotalPurchaseAmount) VALUES ( 2 , N'Bart Simpson', N'Bart Inc.' , 1000.) INSERT INTO dbo.BuyersLoad ( ID ,FirstName ,Companyname ,TotalPurchaseAmount) VALUES ( 2 , N'Lisa Simpson', N'Bart&Lisa Inc.' , 2000.) INSERT INTO dbo.BuyersLoad ( ID ,FirstName ,Companyname ,TotalPurchaseAmount) VALUES ( 3 , N'Marge Simpson', N'Marge Co.' , 5000.) SELECT * FROM dbo.Buyers SELECT * FROM dbo.BuyersLoad
Now we have 2 records in each table:
After we execute our MERGE statement, the data looks like this:
This was neat and well structured, but it’s not much different from a traditional solution that could look like this:
UPDATE dbo.Buyers SET [FirstName] = s.[FirstName], [Companyname]=s.[Companyname], [TotalPurchaseAmount]=s.[TotalPurchaseAmount] FROM dbo.BuyersLoad s WHERE dbo.Buyers.ID=s.ID INSERT INTO buyers ([ID], [FirstName], [Companyname],[TotalPurchaseAmount] ) SELECT s.[ID], s.[FirstName], s.[Companyname], s.[TotalPurchaseAmount] FROM dbo.BuyersLoad s WHERE s.ID NOT IN (SELECT ID FROM dbo.Buyers) ;
A MORE COMPLEX SCENARIO
According to MSDN, the real advantage of the MERGE statement is shown in cases when the two tables have a complex mixture of matching characteristics and INSERT/UPDATE/DELETE operations. As the complexity increases, the difference in performance of the MERGE statement compared to traditional solutions increases. At the same time, a MERGE statement stays well structured and more readable than multiple consecutive INSERT/UPDATE/DELETE statements.
Let’s add one more table and use it for the next demo:
CREATE TABLE [dbo].[BuyersOrders]( [OrdID] [INT] NOT NULL, [BuyerID] [INT] NOT NULL, [ProductID] [INT] NOT NULL, [FirstName] [NVARCHAR](255) , [TotalPurchaseAmount] [MONEY] NULL)
As you can guess, this table contains orders placed by Buyers and we will use them to implement the following logic: for all incoming Order records, update the TotalPurchaseAmount column for existing Buyers, or add new Buyer records, or DELETE those buyers who caused us damage by making negative totals. Here’s some sample data for this:
INSERT INTO dbo.BuyersOrders( OrdID, BuyerID, ProductID, TotalPurchaseAmount ) VALUES ( 100, 2, 234 , 1500.) INSERT INTO dbo.BuyersOrders( OrdID, BuyerID, ProductID, TotalPurchaseAmount ) VALUES ( 101, 3, 267 , 500.) INSERT INTO dbo.BuyersOrders( OrdID, BuyerID, ProductID, TotalPurchaseAmount ) VALUES ( 102, 1, 267 , -30000.) INSERT INTO dbo.BuyersOrders( OrdID, BuyerID, ProductID, FirstName, TotalPurchaseAmount ) VALUES ( 103, 4, 789 , 'Maggie Simpson', 100.) SELECT * FROM dbo.Buyers SELECT * FROM dbo.BuyersOrders
Looking at the data in the BuyersOrders table, we can see that Marge and Lisa were good buyers, Homer was naughty and stole some money from us, and we have a new buyer called Maggie. Our MERGE into the Buyers table should therefore update 2 records, delete one and add one. let’s see how we can write that:
MERGE [dbo].[Buyers] AS t USING [dbo].[BuyersOrders] AS s ON t.[ID]=s.[BuyerID] WHEN MATCHED AND t.[TotalPurchaseAmount] + s.[TotalPurchaseAmount] < 0.0 THEN DELETE WHEN MATCHED THEN UPDATE SET t.[TotalPurchaseAmount] = t.[TotalPurchaseAmount] + s.[TotalPurchaseAmount] WHEN NOT MATCHED THEN INSERT ([ID], [FirstName], [Companyname],[TotalPurchaseAmount] ) VALUES (s.[BuyerID], s.[FirstName], '', s.[TotalPurchaseAmount] ) ; SELECT * FROM dbo.Buyers SELECT * FROM dbo.BuyersOrders
There we go. We have packed all INSERT, UPDATE and DELETE logic into one big but readable statement. Oh, and Homer Simpson has been kicked out of the shopping club, too.
CAUTION – READ THIS BEFORE USE
The most important point to remember is that the MERGE statement actually performs multiple different operations separately, and thus can suffer from issues due to concurrency. I recommend everyone who starts using this statement to first read Aaron Bertrand’s post about various bugs and issues that might or might not be fixed in near future.