UPSERT question for performance and efficiency

  • Hello SSC!

    First, I would like to thank everyone for responding to my posts. It is greatly appreciated!

    I have a question about an UPSERT proc that I am in the process of writing.

    I have a table with over 200 columns that I basically need to perform a MERGE on. The DELETEs are already done in a preliminary step. I have a source table with just ID's that I compared to the destination table. It's fast and easy. UPDATE not so much. So, using the same table and ID's, I can simply match the ID's and update all columns regardless of it needing an update. I would rather only UPDATE what needs to be updated, but how would you write that with 200+ columns?

    Second: Related to performance, would a MERGE perform better than an UPDATE/INSERT? I suppose it depends on the amount of data, but isn't that why you would want to USE MERGE, to handle large rows efficiently?

    Any assistance would be greatly appreciated!

     

     

    The are no problems, only solutions. --John Lennon

  • I've generally found MERGE to be less efficient than UPSERT.  Others maybe not.

    You could generate the necessary WHERE clauses and CASE clauses to conditionally UPDATE 200 columns.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • for true prod code I tend to avoid merge - performance vs insert/update can vary - either faster or slower and no quick rule to decide on it - its a test test test case.

    and merge has/had a few issues - see https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

    in terms of deciding if a row needs to be updated, and with anything more than a few columns, using hashbytes (SHA-1 good enough for this if used in conjunction with concat (which deals with nulls) while ensuring all columns are concatenated with an extra delimiter), it is normally faster than comparing each individual column.

    merge also has another problem - there is no warranty that updates will happen before inserts - so if you have a unique key on the table (lets use 'ABC' and '9999-12-31', and one row is getting updated (changing 9999-12-31 to 2024-05-01) and a new row is getting inserted (with 9999-12-31) it may fail with a unique key constraint if the insert happens before the update - so cases like these MUST be done so that updates happen before inserts - so merge not possible to use.

     

  • Create some sample data

    DECLARE @SAMPLE_SIZE BIGINT = 1000;

    DECLARE
    @MinInt int = 10000
    , @MaxInt int = 50000
    , @StartDate date = DATEADD(YY, DATEDIFF(YY, 0, GETDATE())-3, 0)
    , @EndDate date = GETDATE()
    , @MinAmount decimal(18,2) = 100.00
    , @MaxAmount decimal(18,2) = 1000.00;

    IF OBJECT_ID(N'tempdb..#Table1', N'U') IS NOT NULL DROP TABLE #Table1;
    CREATE TABLE #Table1 (
    RowID int NOT NULL
    , RandomInt int NULL
    , RandomDatetime datetime NULL
    , RandomDecimal decimal(18,2) NULL
    , RandomHex12 char(12) NULL
    );

    IF OBJECT_ID(N'tempdb..#Table2', N'U') IS NOT NULL DROP TABLE #Table2;
    CREATE TABLE #Table2 (
    RowID int NOT NULL
    , RandomInt int NULL
    , RandomDatetime datetime NULL
    , RandomDecimal decimal(18,2) NULL
    , RandomHex12 char(12) NULL
    );

    WITH H2 (N) AS ( SELECT 1 FROM (VALUES
    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ) AS V(N)) -- 100 rows
    , H4 (N) AS (SELECT 1 FROM H2 AS a CROSS JOIN H2 AS b) --10,000 rows
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM H4)
    INSERT INTO #Table1 ( RowID, RandomInt, RandomDatetime, RandomDecimal, RandomHex12 )
    SELECT
    NM.N
    , [RandomInt] = ABS(CHECKSUM(NEWID())%(@MaxInt-@MinInt))+@MinInt
    , [RandomDatetime] = RAND(CHECKSUM(NEWID()))*DATEDIFF(DD,@StartDate,@EndDate)+CONVERT(DATETIME,@StartDate)
    , [RandomDecimal] = CONVERT(DECIMAL(18,2),RAND(CHECKSUM(NEWID()))*(@MaxAmount-@MinAmount)+@MinAmount)
    , [RandomHex12] = RIGHT(NEWID(), 12)
    FROM NUMS AS NM
    OPTION (RECOMPILE);

    INSERT INTO #Table2 ( RowID, RandomInt, RandomDatetime, RandomDecimal, RandomHex12 )
    SELECT t1.RowID
    , RandomInt = CASE WHEN t1.RowID %25 = 7 THEN NULL WHEN t1.RowID %19 = 4 THEN t1.RandomInt +1 ELSE t1.RandomInt END
    , RandomDatetime = CASE WHEN t1.RowID %23 = 7 THEN NULL WHEN t1.RowID %31 = 4 THEN t1.RandomDatetime +1 ELSE t1.RandomDatetime END
    , RandomDecimal = CASE WHEN t1.RowID %21 = 7 THEN NULL WHEN t1.RowID %53 = 4 THEN t1.RandomDecimal +1 ELSE t1.RandomDecimal END
    , RandomHex12 = CASE WHEN t1.RowID %19 = 7 THEN NULL ELSE t1.RandomHex12 END
    FROM #Table1 AS t1
    WHERE t1.RowID %17 < 14;

    ALTER TABLE #Table1 ADD PRIMARY KEY (RowID);
    ALTER TABLE #Table2 ADD PRIMARY KEY (RowID);

    UPDATE the changed data

    UPDATE t2
    SET t2.RandomInt = t1.RandomInt
    , t2.RandomDatetime = t1.RandomDatetime
    , t2.RandomDecimal = t1.RandomDecimal
    , t2.RandomHex12 = t1.RandomHex12
    OUTPUT 'Old Value >>>' AS [Old Value >>>], DELETED.*, 'New Value >>>' AS [New Value >>>], INSERTED.*
    FROM #Table1 AS t1
    INNER JOIN #Table2 AS t2 ON t1.RowID = t2.RowID
    INNER JOIN (
    SELECT t1.RowID, chk = CHECKSUM(*) FROM #Table1 AS t1
    EXCEPT
    SELECT t2.RowID, chk = CHECKSUM(*) FROM #Table2 AS t2
    ) AS t3 ON t1.RowID = t3.RowID;

    INSERT the new data

    INSERT INTO #Table2 ( RowID, RandomInt, RandomDatetime, RandomDecimal, RandomHex12 )
    OUTPUT 'Inserted Value >>>' AS [Inserted Value >>>], INSERTED.*
    SELECT t1.RowID
    , t1.RandomInt
    , t1.RandomDatetime
    , t1.RandomDecimal
    , t1.RandomHex12
    FROM #Table1 AS t1
    WHERE NOT EXISTS (SELECT 1 FROM #Table2 AS dst
    WHERE dst.RowID = t1.RowID
    );
  • If you really want performance, create a wrapper proc that determines if you're doing an INSERT/UPDATE/DELETE and then a proc for each. MERGE generally stinks on performance. Period.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • would a MERGE perform better than an UPDATE/INSERT?

    Probably not: https://www.sqlservercentral.com/articles/performance-of-the-sql-merge-vs-insertupdate

  • Hello SSC,

    This is strange, I am trying to like these posts, but when I click "like" it sets to "unlike(1)". I do not want to unlike any posts, you guys are great.

    Am I doing something wrong?

    The are no problems, only solutions. --John Lennon

  • Think of it as a toggle. Instead of off to on, it's like to unlike. So you hit like, it marks it as a like, but changes it to unlike so you can undo it if you want.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you, Grant! This was helpful. I did not want to unlike everyone, you guys are great and want to express that!

    The are no problems, only solutions. --John Lennon

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply