UPSERT question for performance and efficiency

Question

UPSERT question for performance and efficiency

Lord Slaagh

SSCertifiable

Points: 5745
More actions
July 11, 2024 at 8:51 pm
Go to Answer
#4435359

Hello SSC!
First, I would like to thank everyone for responding to my posts. It is greatly appreciated!
I have a question about an UPSERT proc that I am in the process of writing.
I have a table with over 200 columns that I basically need to perform a MERGE on. The DELETEs are already done in a preliminary step. I have a source table with just ID's that I compared to the destination table. It's fast and easy. UPDATE not so much. So, using the same table and ID's, I can simply match the ID's and update all columns regardless of it needing an update. I would rather only UPDATE what needs to be updated, but how would you write that with 200+ columns?
Second: Related to performance, would a MERGE perform better than an UPDATE/INSERT? I suppose it depends on the amount of data, but isn't that why you would want to USE MERGE, to handle large rows efficiently?
Any assistance would be greatly appreciated!

The are no problems, only solutions. --John Lennon
ScottPletcher

SSC Guru

Points: 101123
More actions
July 11, 2024 at 10:09 pm
Answer
#4435383

I've generally found MERGE to be less efficient than UPSERT. Others maybe not.
You could generate the necessary WHERE clauses and CASE clauses to conditionally UPDATE 200 columns.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

frederico_fonseca SSCoach Points: 16167 More actions · Answer 1

for true prod code I tend to avoid merge - performance vs insert/update can vary - either faster or slower and no quick rule to decide on it - its a test test test case.

and merge has/had a few issues - see https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

in terms of deciding if a row needs to be updated, and with anything more than a few columns, using hashbytes (SHA-1 good enough for this if used in conjunction with concat (which deals with nulls) while ensuring all columns are concatenated with an extra delimiter), it is normally faster than comparing each individual column.

merge also has another problem - there is no warranty that updates will happen before inserts - so if you have a unique key on the table (lets use 'ABC' and '9999-12-31', and one row is getting updated (changing 9999-12-31 to 2024-05-01) and a new row is getting inserted (with 9999-12-31) it may fail with a unique key constraint if the insert happens before the update - so cases like these MUST be done so that updates happen before inserts - so merge not possible to use.

DesNorton SSC-Insane Points: 24401 More actions · Answer 2

Create some sample data

DECLARE @SAMPLE_SIZE BIGINT = 1000;

DECLARE
      @MinInt       int = 10000
    , @MaxInt       int = 50000
    , @StartDate    date = DATEADD(YY, DATEDIFF(YY, 0, GETDATE())-3, 0)
    , @EndDate      date = GETDATE()
    , @MinAmount    decimal(18,2) =  100.00
    , @MaxAmount    decimal(18,2) = 1000.00;

IF OBJECT_ID(N'tempdb..#Table1', N'U') IS NOT NULL  DROP TABLE #Table1;
CREATE TABLE #Table1 (
    RowID          int           NOT NULL
  , RandomInt      int               NULL
  , RandomDatetime datetime          NULL
  , RandomDecimal  decimal(18,2)     NULL
  , RandomHex12    char(12)          NULL
);

IF OBJECT_ID(N'tempdb..#Table2', N'U') IS NOT NULL  DROP TABLE #Table2;
CREATE TABLE #Table2 (
    RowID          int           NOT NULL
  , RandomInt      int               NULL
  , RandomDatetime datetime          NULL
  , RandomDecimal  decimal(18,2)     NULL
  , RandomHex12    char(12)          NULL
);

WITH H2  (N) AS ( SELECT 1 FROM (VALUES
                     (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
                    ) AS V(N))                              --   100 rows
    , H4  (N) AS (SELECT 1 FROM H2 AS a CROSS JOIN H2 AS b) --10,000 rows
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM H4)
INSERT INTO #Table1 ( RowID, RandomInt, RandomDatetime, RandomDecimal, RandomHex12 )
SELECT
    NM.N
  , [RandomInt]      = ABS(CHECKSUM(NEWID())%(@MaxInt-@MinInt))+@MinInt
  , [RandomDatetime] = RAND(CHECKSUM(NEWID()))*DATEDIFF(DD,@StartDate,@EndDate)+CONVERT(DATETIME,@StartDate)
  , [RandomDecimal]  = CONVERT(DECIMAL(18,2),RAND(CHECKSUM(NEWID()))*(@MaxAmount-@MinAmount)+@MinAmount)
  , [RandomHex12]    = RIGHT(NEWID(), 12)
FROM NUMS AS NM
OPTION (RECOMPILE);

INSERT INTO #Table2 ( RowID, RandomInt, RandomDatetime, RandomDecimal, RandomHex12 )
SELECT t1.RowID
     , RandomInt      = CASE WHEN t1.RowID %25 = 7 THEN NULL WHEN t1.RowID %19 = 4 THEN t1.RandomInt      +1 ELSE t1.RandomInt      END
     , RandomDatetime = CASE WHEN t1.RowID %23 = 7 THEN NULL WHEN t1.RowID %31 = 4 THEN t1.RandomDatetime +1 ELSE t1.RandomDatetime END
     , RandomDecimal  = CASE WHEN t1.RowID %21 = 7 THEN NULL WHEN t1.RowID %53 = 4 THEN t1.RandomDecimal  +1 ELSE t1.RandomDecimal  END
     , RandomHex12    = CASE WHEN t1.RowID %19 = 7 THEN NULL                                                 ELSE t1.RandomHex12    END
FROM #Table1 AS t1
WHERE t1.RowID %17 < 14;

ALTER TABLE #Table1 ADD PRIMARY KEY (RowID);
ALTER TABLE #Table2 ADD PRIMARY KEY (RowID);

UPDATE the changed data

UPDATE t2
   SET t2.RandomInt      = t1.RandomInt
     , t2.RandomDatetime = t1.RandomDatetime
     , t2.RandomDecimal  = t1.RandomDecimal
     , t2.RandomHex12    = t1.RandomHex12
OUTPUT 'Old Value >>>' AS [Old Value >>>], DELETED.*, 'New Value >>>' AS [New Value >>>], INSERTED.*
FROM #Table1 AS t1
INNER JOIN #Table2 AS t2 ON t1.RowID = t2.RowID
INNER JOIN (
            SELECT t1.RowID, chk = CHECKSUM(*) FROM #Table1 AS t1
            EXCEPT
            SELECT t2.RowID, chk = CHECKSUM(*) FROM #Table2 AS t2
           ) AS t3 ON t1.RowID = t3.RowID;

INSERT the new data

INSERT INTO #Table2 ( RowID, RandomInt, RandomDatetime, RandomDecimal, RandomHex12 )
OUTPUT 'Inserted Value >>>' AS [Inserted Value >>>], INSERTED.*
SELECT t1.RowID
     , t1.RandomInt
     , t1.RandomDatetime
     , t1.RandomDecimal
     , t1.RandomHex12
FROM #Table1 AS t1
WHERE NOT EXISTS (SELECT 1 FROM #Table2 AS dst
                  WHERE dst.RowID = t1.RowID
                  );

How to post data/code on a forum to get the best help.

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

If you really want performance, create a wrapper proc that determines if you're doing an INSERT/UPDATE/DELETE and then a proc for each. MERGE generally stinks on performance. Period.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ratbak Ten Centuries Points: 1382 More actions · Answer 4

would a MERGE perform better than an UPDATE/INSERT?

Probably not: https://www.sqlservercentral.com/articles/performance-of-the-sql-merge-vs-insertupdate

Lord Slaagh SSCertifiable Points: 5745 More actions · Answer 5

Hello SSC,

This is strange, I am trying to like these posts, but when I click "like" it sets to "unlike(1)". I do not want to unlike any posts, you guys are great.

Am I doing something wrong?

The are no problems, only solutions. --John Lennon

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 6

Think of it as a toggle. Instead of off to on, it's like to unlike. So you hit like, it marks it as a like, but changes it to unlike so you can undo it if you want.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Lord Slaagh SSCertifiable Points: 5745 More actions · Answer 7

Thank you, Grant! This was helpful. I did not want to unlike everyone, you guys are great and want to express that!

The are no problems, only solutions. --John Lennon