ETL data load DML SP slowness suggestion

  • Hi All,

    We are using Microsoft SQL Azure SQL 2022 (12.0.2000.8). Some of our ETL SPs are running very longer more than 30 minutes. The reason is it is doing a lot of DML - almost 15 (Sometime 100) million Insert, update and delete. Table size is (196113141 rows and size 120 805 792 KB)

    Here is the sample code: I have tried to match the original code, but have changed the date to be deleted from 90 days to 0 to test it with current date "DATEADD(DAY, -90, SYSDATETIME())"

    Here is the table and sample insert with 15000000 records, I have tried to load all not null columns.

    Test I did with 10 millions for table [TOTAL_RAW] & 1 million for table DLSTYTOTAL. It is working good. It takes more time with 100 millions and log is getting full as well in simple recovery mode.

    I believe this needs to done batch by batch . For example 100 millions into 10 times (10 million each time). Can anyone help and share better code.

    This table does not have any index and tigers. It has only PK.

    use master
    drop database [DB_Load_Test]
    create database [DB_Load_Test]
    go
    use [DB_Load_Test]

    CREATE TABLE [TOTAL_RAW](
    [STYTOTALID] [int] IDENTITY(1,1) NOT NULL,
    [STYLE] [decimal](5, 0) NOT NULL,
    [SYTOT1] [decimal](7, 0) NULL,
    [SYTOT2] [decimal](7, 0) NULL,
    [SYTOT3] [decimal](7, 0) NULL,
    [SYTOT4] [decimal](7, 0) NULL,
    [SYCNT] [decimal](4, 0) NULL,
    [SYSHP$] [decimal](7, 0) NULL,
    [SYREC] [decimal](7, 0) NULL,
    [SYREC$] [decimal](7, 0) NULL,
    [SYFUL] [decimal](7, 0) NULL,
    [SYFUL$] [decimal](7, 0) NULL,
    [SYMKD] [decimal](7, 0) NULL,
    [SYMKD$] [decimal](7, 0) NULL,
    [SYDIS] [decimal](7, 0) NULL,
    [SYDIS$] [decimal](7, 0) NULL,
    [SYSAL] [decimal](7, 0) NULL,
    [SYSAL$] [decimal](7, 0) NULL,
    [SYONH] [decimal](7, 0) NULL,
    [SYSHK] [decimal](7, 0) NULL,
    [SYDMG] [decimal](7, 0) NULL,
    [SYTRN] [decimal](7, 0) NULL,
    [SYTRN$] [decimal](7, 0) NULL,
    [SYOPIK] [decimal](7, 0) NULL,
    [SYINV] [decimal](7, 0) NULL,
    [SYRANK] [decimal](5, 0) NULL,
    [SYCOLR] [decimal](2, 0) NULL,
    [SYSIZE] [decimal](2, 0) NULL,
    [SYSTLE] [decimal](5, 0) NULL,
    [SYSIZS] [decimal](2, 0) NULL,
    [InsertedDateTime] [datetime2](7) NOT NULL,
    [UpdatedDateTime] [datetime2](7) NULL,
    [IsCurrent] [bit] NULL,
    [DELDATTIM] [datetime2](7) NULL,
    [DELFLAG] [char](1) NULL,
    CONSTRAINT [PK_C1810429_TOTAL_RAW] PRIMARY KEY CLUSTERED
    (
    [STYTOTALID] ASC,
    [STYLE] ASC
    )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]

    CREATE TABLE [DLSTYTOTAL](
    [STYLE] [decimal](5, 0) NULL,
    [SYTOT1] [decimal](7, 0) NULL,
    [SYTOT2] [decimal](7, 0) NULL,
    [SYTOT3] [decimal](7, 0) NULL,
    [SYTOT4] [decimal](7, 0) NULL,
    [SYCNT] [decimal](4, 0) NULL,
    [SYSHP$] [decimal](7, 0) NULL,
    [SYREC] [decimal](7, 0) NULL,
    [SYREC$] [decimal](7, 0) NULL,
    [SYFUL] [decimal](7, 0) NULL,
    [SYFUL$] [decimal](7, 0) NULL,
    [SYMKD] [decimal](7, 0) NULL,
    [SYMKD$] [decimal](7, 0) NULL,
    [SYDIS] [decimal](7, 0) NULL,
    [SYDIS$] [decimal](7, 0) NULL,
    [SYSAL] [decimal](7, 0) NULL,
    [SYSAL$] [decimal](7, 0) NULL,
    [SYONH] [decimal](7, 0) NULL,
    [SYSHK] [decimal](7, 0) NULL,
    [SYDMG] [decimal](7, 0) NULL,
    [SYTRN] [decimal](7, 0) NULL,
    [SYTRN$] [decimal](7, 0) NULL,
    [SYOPIK] [decimal](7, 0) NULL,
    [SYINV] [decimal](7, 0) NULL,
    [SYRANK] [decimal](5, 0) NULL,
    [SYCOLR] [decimal](2, 0) NULL,
    [SYSIZE] [decimal](2, 0) NULL,
    [SYSTLE] [decimal](5, 0) NULL,
    [SYSIZS] [decimal](2, 0) NULL,
    [DELDATTIM] [datetime2](7) NULL,
    [DELFLAG] [char](1) NULL
    )
    /*
    -- create index

    CREATE NONCLUSTERED INDEX [TOTAL_IsCurrent_InsertedDateTime_inc_STYLE_DELDATTIM_STYTOTALID] ON [TOTAL_RAW]
    ([IsCurrent],[InsertedDateTime]) include ([STYLE],DELDATTIM,[STYTOTALID]) with (online=on)

    CREATE NONCLUSTERED INDEX [TOTAL_STYLE_DELDATTIM]ON [TOTAL_RAW]
    ([STYLE],[DELDATTIM])
    */-- select count (*) from [TOTAL_RAW] --
    -- select count (*) from DLSTYTOTAL --

    --===== Populate the table.
    INSERT INTO [TOTAL_RAW] (STYLE,InsertedDateTime)
    SELECT TOP (15000000)
    -- SELECT TOP (196113141)
    STYLE = ABS(CHECKSUM(NEWID())%8)+1
    ,InsertedDateTime = DATEADD(mi,-ABS(CHECKSUM(NEWID())%(60*24*365*5)),SYSDATETIME())

    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    ;
    GO

    INSERT INTO DLSTYTOTAL (STYLE)
    SELECT TOP (1000000)
    -- SELECT TOP (50000000)
    STYLE = ABS(CHECKSUM(NEWID())%8)+1

    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    ;
    GO

    update [TOTAL_RAW] set IsCurrent = 0 where STYTOTALID>500000
    -- select top 10 * from [TOTAL_RAW] where IsCurrent = 0

    SP code:

    -- Exec [SP_Merge_TOTAL_RAW_DBA]  
    go
    create PROCEDURE [SP_Merge_TOTAL_RAW_DBA]

    --@ID INT,
    --@RunID nvarchar(50),
    --@RAWSPName nvarchar(50)
    AS
    BEGIN


    BEGIN TRY

    Declare
    @ROWCOUNT int,
    @UpdatedRowCount int,
    @InsertedRowCount int,
    @DeletedRowCount int,
    @LocalID int,
    @LocalRunID nvarchar(50),
    @StageCount int,
    @DeleteCount int,
    @LocalRAWSPName nvarchar(50);

    --SET @LocalID = @ID;
    --SET @LocalRunID = @RunID;
    --SET @LocalRAWSPName = @RAWSPName;

    SET @UpdatedRowCount = 0;
    SET @InsertedRowCount = 0;
    SET @DeletedRowCount= 0;
    SET @StageCount = (Select Count(1) FROM [DLSTYTOTAL]);

    /*Insert SubAuditLog Started Entry*/
    --EXEC [SP_SETSubAuditLog] @LocalID,@LocalRunID,@LocalRAWSPName,NULL,NULL,NULL,'Started'

    /*Check Delete Row Count*/
    SET @DeleteCount = (Select Count(1) FROM [TOTAL_RAW]
    WHERE InsertedDateTime < DATEADD(DAY, -0, SYSDATETIME()) and IsCurrent = 0)

    BEGIN TRANSACTION

    /*Check For Stage Table Count Then Execute*/IF @StageCount >= 1
    BEGIN

    /*INSERT THE RECORDS THAT ARE NEVER EXISTED BEFORE
    ISCURRENT = 1, INSERTEDDATETIME = CURRENT_TIMESTAMP AND UPDATEDDATETIME = NULL*/
    INSERT INTO [TOTAL_RAW] (
    [STYLE]
    ,[SYTOT1]
    ,[SYTOT2]
    ,[SYTOT3]
    ,[SYTOT4]
    ,[SYCNT]
    ,[SYSHP$]
    ,[SYREC]
    ,[SYREC$]
    ,[SYFUL]
    ,[SYFUL$]
    ,[SYMKD]
    ,[SYMKD$]
    ,[SYDIS]
    ,[SYDIS$]
    ,[SYSAL]
    ,[SYSAL$]
    ,[SYONH]
    ,[SYSHK]
    ,[SYDMG]
    ,[SYTRN]
    ,[SYTRN$]
    ,[SYOPIK]
    ,[SYINV]
    ,[SYRANK]
    ,[SYCOLR]
    ,[SYSIZE]
    ,[SYSTLE]
    ,[SYSIZS]
    ,[DELDATTIM]
    ,[DELFLAG]
    ,[InsertedDateTime]
    ,[IsCurrent]
    )

    SELECT
    STG.[STYLE]
    ,STG.[SYTOT1]
    ,STG.[SYTOT2]
    ,STG.[SYTOT3]
    ,STG.[SYTOT4]
    ,STG.[SYCNT]
    ,STG.[SYSHP$]
    ,STG.[SYREC]
    ,STG.[SYREC$]
    ,STG.[SYFUL]
    ,STG.[SYFUL$]
    ,STG.[SYMKD]
    ,STG.[SYMKD$]
    ,STG.[SYDIS]
    ,STG.[SYDIS$]
    ,STG.[SYSAL]
    ,STG.[SYSAL$]
    ,STG.[SYONH]
    ,STG.[SYSHK]
    ,STG.[SYDMG]
    ,STG.[SYTRN]
    ,STG.[SYTRN$]
    ,STG.[SYOPIK]
    ,STG.[SYINV]
    ,STG.[SYRANK]
    ,STG.[SYCOLR]
    ,STG.[SYSIZE]
    ,STG.[SYSTLE]
    ,STG.[SYSIZS]
    ,STG.[DELDATTIM]
    ,STG.[DELFLAG]
    ,DATEADD(MICROSECOND, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), SYSDATETIME())
    ,1
    FROM [DLSTYTOTAL] STG Left Join [TOTAL_RAW] TGT
    on STG.STYLE = TGT.STYLE AND STG.DELDATTIM = TGT.DELDATTIM
    Where TGT.DELDATTIM is null

    SET @InsertedRowCount= @@RowCount;

    /*SET ISCURRENT = 0, UpdatedDateTime TO CURRENTDATE FOR RECORDS THAT ARE NOT ACTIVE BASED ON THE DELDATTIM DESC*/With CTE AS (
    SELECT *,ROW_NUMBER() OVER(PARTITION BY STYLE ORDER BY DELDATTIM desc ) AS ROWNO from [TOTAL_RAW]
    WHERE IsCurrent = 1 )
    UPDATE CTE
    SET ISCURRENT = 0,UpdatedDateTime = SYSDATETIME()
    where ROWNO > 1

    SET @UpdatedRowCount= @@RowCount;

    END

    IF @DeleteCount >= 1
    BEGIN

    /*IF RECORDS ARE OLDER THAN 90 DAYS DELETE FOR RAW TABLE */
    EXEC ('CREATE VIEW [STYTOTAL_VIEW] as SELECT * FROM [TOTAL_RAW]
    WHERE InsertedDateTime < DATEADD(DAY, -0, SYSDATETIME()) and IsCurrent = 0 ')

    EXEC ('DELETE FROM [STYTOTAL_VIEW]')
    SET @DeletedRowCount=@@ROWCOUNT ;

    EXEC ('DROP VIEW [STYTOTAL_VIEW]')

    END

    /*COMMIT AND UPDATE IN THE SUBAUDITLOG TABLE*/
    COMMIT
    --EXECUTE [SP_SETSubAuditLog] @LocalID,@LocalRunID,@LocalRAWSPName,@InsertedRowCount,@UpdatedRowCount,@DeletedRowCount,'Completed';

    END TRY

    BEGIN CATCH

    /********************************************************
    When there is an error, create a record in Audit.ErrorLog
    using another stored procedure & passing in parameter values
    ********************************************************/

    /*Declare any variables that are not already DECLARED above*/
    DECLARE @ProcedureName NVARCHAR(50)
    DECLARE @ProcedureDesc NVARCHAR(255)
    DECLARE @ErrorMessage NVARCHAR(4000)
    DECLARE @ErrorSeverity INT
    DECLARE @ErrorState INT

    /*SET parameters to be passed into the error log stored proc if they are not already SET above*/
    SET @ProcedureName = '[SP_Merge_STYTOTAL_RAW]'
    SET @ProcedureDesc = 'Merge statement'
    SET @ErrorMessage = ERROR_MESSAGE()
    SET @ErrorSeverity = ERROR_SEVERITY()
    SET @ErrorState = ERROR_STATE()

    ROLLBACK
    /*
    EXEC [SP_SETSubAuditErrorLog]
    @LocalID,
    @ProcedureName,
    @ProcedureDesc,
    @ErrorMessage,
    @ErrorSeverity,
    @ErrorState
    */
    --EXECUTE [SP_SETSubAuditLog] @LocalID,@LocalRunID,@LocalRAWSPName,NULL,NULL,NULL,'Failed'

    RAISERROR (@ErrorMessage,@ErrorSeverity,@ErrorState)

    END CATCH;


    END;

    Appreciate your time and effort. Thanks for your help.

    • This topic was modified 1 month, 2 weeks ago by  Saran.
    • This topic was modified 1 month, 2 weeks ago by  Saran.
    • This topic was modified 1 month, 2 weeks ago by  Saran.
    • This topic was modified 1 month, 2 weeks ago by  Saran.
  • "This table does not have any index and tigers. It has only PK."

    The FROM and WHERE clauses of the INSERT:

    FROM [DLSTYTOTAL] STG Left Join [TOTAL_RAW] TGT  
    on STG.STYLE = TGT.STYLE
    AND STG.DELDATTIM = TGT.DELDATTIM
    Where TGT.DELDATTIM is null

    Maybe this is intended to INSERT WHERE NOT EXISTS.  It seems possible to switch to NOT EXISTS and change the LEFT JOIN to an INNER JOIN.  Maybe add an index on both tables on (STYLE, DELDATTIM).  And watch out for the tigers 🙂

    ,DATEADD(MICROSECOND, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), SYSDATETIME())

    What is the intention of this line?  This seems like a bad practice imo.  Imo adding a microsecond to make the InsertedDateTime unique for each row is a waste as the table already has a primary key.

    All of the code having to do with determining if there are 90+ day old rows could be eliminated imo.  JFDI = just freakin' do it.  Also, creating a VIEW using dynamic SQL only to later DROP while inside the transaction is also a waste imo.  JFDI.  Wherever possible try to eliminate counting and/or diagnostic code within transactions like this:

    /*Check Delete Row Count*/ 
    SET @DeleteCount = (Select Count(1) FROM [TOTAL_RAW]
    WHERE InsertedDateTime < DATEADD(DAY, -0, SYSDATETIME()) and IsCurrent = 0)

    Inside the transaction it seems possible there could be only 2 DML statements: 1) INSERT WHERE NOT EXISTS, and 2) DELETE WHERE 90+ days.

    Also, inside the CATCH block it seems like an attempt to re-throw the error.  Just THROW.  Also, it's not necessary to declare and re-type the name of the procedure as there's a system function ERROR_PROCEDURE.  Also, if the intention is for there to be a complete rollback of the DML statements then the procedure needs to SET XACT_ABORT ON (which has performance implications especially in the case where there is an actual ROLLBACK)

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thanks Steve Collins for your response and time.

    I have less development knowledge. I have tested it out with 100 million data load and deleted and update individual ad-hoc statements. update  took 3 minutes and delete took 16 minutes. But from the SP within transaction it take 20 to 25 minutes.

    I have changed datetime2 to datetime and changed MICROSECOND to SECOND.

    The insert always quick it took 1 minute since it's joins and insert 1 million data. Any suggestion to speedup the delete. Like Table partitioning with data compression etc?

     

  • can you give as the ACTUAL explain plan of one of those slow executions - use https://www.brentozar.com/pastetheplan/ to upload the plan and give us the resulting link.

     

    that would give us a true indication of what are the issues (on top of what Steve already mentioned)

    that CTE update is also likely not the best for it.

  • Hi frederico fonseca,

    Here is the original plan for onw of long run SP not the one I mentioned in the post. https://www.brentozar.com/pastetheplan/?id=SycJlfMyJe

    Here is my test with 100M data plan without any parameters and condition https://www.brentozar.com/pastetheplan/?id=rJK3SQz1kg

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply