Cursor in stored procedure taking hours to run

  • I have written a stored procedure that updates data in my database with the data that was received from a third party database.

    I run a script that brings the data that was modified in the last 24 hours from the 3rd party system. This table contains about 500,000 records for each day.

    Then the stored procedure uses a cursor to go through the received data and update the main table in my database. If the data was updated, it deletes the record from the main table and inserts the new one.

    If the data from new table is not found, it simply inserts the row into the main table.

    This has been running without any issues for the past 5 years. I think with recent patching of sql/windows, this script is taking 5+ hours to complete (used to be about 30 minutes). It's taking most of the sql resources. I would like to know whether there is away to optimize it. If using cursor is not recommended, is there any other way that I could update main data from the newdata without using cursors.

    Any help is greatly appreciated.

    DECLARE @Unit varchar(20)
    DECLARE @AName varchar(50)
    DECLARE @RDate datetime
    DECLARE @EDate datetime
    DECLARE @UpdateDateTime datetime
    DECLARE @AMn varchar(50)
    DECLARE @AccNu varchar(30)
    DECLARE @AType varchar(50)
    DECLARE @Rank varchar(10)
    Declare @PName varchar(200)
    DECLARE @VID varchar(25)
    Declare @Result_date datetime
    Declare @Result_At varchar(50)
    Declare @IntID varchar(200)
    Declare @IntAID varchar(200)
    Declare @A_RowUpdateDateTime datetime
    Declare @Result_A_date datetime
    Declare @Result_A_At varchar(50)

    DECLARE cursor1 CURSOR
    FOR
    Select Distinct Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
    IntUID, RowUpdateDateTime, A_RowUpdateDateTime
    from NEWDATA

    OPEN cursor1

    FETCH NEXT FROM cursor1 INTO @Unit , @PName, @AName, @EDate, @AMn, @VID, @AccNu, @IntAID, @IntID, @UpdateDateTime, @A_RowUpdateDateTime

    WHILE @@FETCH_STATUS = 0
    BEGIN
    Set @Result_date = Null
    Set @Result_At = Null
    Set @Result_A_date = Null

    Select top 1 @Result_date= RowUpdateDateTime, @Result_At= AName from [dbo].[DataFin] where Unit= @Unit and PName= @PName
    and IntID = @AMn and AccNu = @AccNu

    Select top 1 @Result_A_date= A_RowUpdateDateTime , @Result_A_At= AName from [dbo].[DataFin] where Unit= @Unit and PName= @PName
    and IntID = @AMn and AccNu = @AccNu

    If @Result_date is NULL --New record. Insert into table
    Begin
    Insert into [dbo].[DataFin]
    (Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
    IntUID, RowUpdateDateTime, A_RowUpdateDateTime)

    Select
    Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
    IntUID, RowUpdateDateTime, A_RowUpdateDateTime
    from NEWDATA
    where Unit= @Unit and VID = @VID and AccNu= @AccNu and IntID = @AMn
    End

    Else If (DateDiff(Second,@Result_date,@UpdateDateTime) >0 and @Result_At = @AName)
    Or (DateDiff(Second,@Result_A_date,@A_RowUpdateDateTime) >0 and @Result_AV_At = @AName)

    Begin
    Delete from [dbo].[DataFin] where Unit= @Unit and PName= @PName and IntID = @AMn and AccNu = @AccNu

    Insert into [dbo].[DataFin]
    (Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
    IntUID, RowUpdateDateTime, A_RowUpdateDateTime)

    Select
    Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
    IntUID, RowUpdateDateTime, A_RowUpdateDateTime
    from NEWDATA
    where Unit= @Unit and VID = @VID and AccNu= @AccNu and IntID = @AMn
    End

    FETCH NEXT FROM cursor1 INTO @Unit , @PName, @AName, @EDate, @AMn, @VID, @AccNu, @IntAID, @IntID, @UpdateDateTime, @A_RowUpdateDateTime
    END
    CLOSE cursor1;
    DEALLOCATE cursor1
  • don075 wrote:

    I run a script that brings the data that was modified in the last 24 hours from the 3rd party system.

    This table contains about 500,000 records for each day.

    Then the stored procedure uses a cursor to go through the received data and update the main table in my database.

    If the data was updated, it deletes the record from the main table and inserts the new one.

    If the data from new table is not found, it simply inserts the row into the main table.

    This does the same thing:

    BEGIN TRY

    BEGIN TRANSACTION

    DELETE d
    FROM [dbo].[DataFin] d
    WHERE EXISTS(SELECT *
    FROM NEWDATA n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu);

    INSERT INTO [dbo].[DataFin]
    (
    Unit,
    PName,
    AName,
    EnteredDateTime,
    IntID,
    VID,
    AccNu,
    IntAID,
    IntUID,
    RowUpdateDateTime,
    A_RowUpdateDateTime
    )
    SELECT Unit,
    PName,
    AName,
    EnteredDateTime,
    IntID,
    VID,
    AccNu,
    IntAID,
    IntUID,
    RowUpdateDateTime,
    A_RowUpdateDateTime
    FROM NEWDATA;

    COMMIT TRANSACTION;

    END TRY

    BEGIN CATCH
    IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION;
    THROW;
    END CATCH
  • I'd be interested to hear how long the new version takes.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Jonathan AC Roberts wrote:

    This does the same thing:


    Are you sure? I struggle to understand cursors ate the best of times, but I think the set based solution might delete unchanged rows and then insert them again. This would have the same end result, but if it was 50 million not 500k it might matter. I would definitely get rid of the cursor, but perhaps only delete rows that have a changed RowUpdateDateTime.

    This is the criteria for deleting rows.

    Else If (DateDiff(Second,@Result_date, @UpdateDateTime) >0 and @Result_At = @AName)
    Or (DateDiff(Second,@Result_A_date,@A_RowUpdateDateTime) >0 and @Result_AV_At = @AName)

    I don't think the name comparison matter as the pairs of variables appear to be assigned with identical logic.

    Does this account for the RowUpdateTimes comparison?  Depending on how much change there is with each batch this could well be slower, but I prefer not to update unchanged rows. (I only took out the transaction to focus on the logic).

    DELETE  d
    FROM dbo.DataFin AS d
    WHERE EXISTS (SELECT 1
    FROM dbo.NEWDATA AS n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu
    AND ( n.RowUpdateDateTime > d.RowUpdateDateTime
    OR
    n.A_RowUpdateDateTime > d.RowUpdateDateTime));

    INSERT INTO dbo.DataFin ( Unit,PName,AName,EnteredDateTime,IntID, VID, AccNu,
    IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime)
    SELECT Unit, PName, AName,EnteredDateTime,IntID, VID,AccNu,
    IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime
    FROM dbo.NEWDATA AS n
    WHERE NOT EXISTS ( SELECT 1
    FROM dbo.NEWDATA AS n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu)

     

  • Given the OP stated "I run a script that brings the data that was modified in the last 24 hours from the 3rd party system" I assumed that the data had been modified and so need updating or inserted into this database.

  • Ed B wrote:

    Jonathan AC Roberts wrote:

    This does the same thing:


    Are you sure? I struggle to understand cursors ate the best of times, but I think the set based solution might delete unchanged rows and then insert them again. This would have the same end result, but if it was 50 million not 500k it might matter. I would definitely get rid of the cursor, but perhaps only delete rows that have a changed RowUpdateDateTime.

    This is the criteria for deleting rows.

    Else If (DateDiff(Second,@Result_date, @UpdateDateTime) >0 and @Result_At = @AName)
    Or (DateDiff(Second,@Result_A_date,@A_RowUpdateDateTime) >0 and @Result_AV_At = @AName)

    I don't think the name comparison matter as the pairs of variables appear to be assigned with identical logic.

    Does this account for the RowUpdateTimes comparison?  Depending on how much change there is with each batch this could well be slower, but I prefer not to update unchanged rows. (I only took out the transaction to focus on the logic).

    DELETE  d
    FROM dbo.DataFin AS d
    WHERE EXISTS (SELECT 1
    FROM dbo.NEWDATA AS n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu
    AND ( n.RowUpdateDateTime > d.RowUpdateDateTime
    OR
    n.A_RowUpdateDateTime > d.RowUpdateDateTime));

    INSERT INTO dbo.DataFin ( Unit,PName,AName,EnteredDateTime,IntID, VID, AccNu,
    IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime)
    SELECT Unit, PName, AName,EnteredDateTime,IntID, VID,AccNu,
    IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime
    FROM dbo.NEWDATA AS n
    WHERE NOT EXISTS ( SELECT 1
    FROM dbo.NEWDATA AS n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu)

    Your insert statement contains:

    FROM    dbo.NEWDATA AS n
    WHERE NOT EXISTS ( SELECT 1
    FROM dbo.NEWDATA AS n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu)

    I  think there is an error as alias "d" does not exist. If it is joining the two dbo.NEWDATA tables together it will never return any rows.

  • Yeah it’s meant to be datafin as d, but you’re right. If the newdata only has changed or new data then there is no need to compare the dates either in the cursor or set approach.

  • Thanks for all the replies. Much appreciated.

    I agree that I don't need to compare data in the new table with the existing one.

    I created a temp table to test the provided sample script. Got a small issue with that.

    Here in my script, I am looking only for the VID field in the where criteria.

    My new table contains only 4 new rows.

    Two rows got data for the same VID 66611.

    Here the latest rowupdatetime for VID 66611, got 2023-01-14 12:23.

    I want only this row inserted to the table.

    The script however, inserts both.

     

    CREATE TABLE #t1([VID] [varchar](100) NULL, [OrdDate] [datetime] NOT NULL,
    [Prg1] [varchar](10) NULL, [Prg2] [varchar](10) NULL, [Prg3] [varchar](10), [Prg4] [varchar](10), [Prg5] [varchar](10), [Prg6] [varchar](10),
    [RowUpdatDateTime] DateTime
    )
    ON [PRIMARY]

    insert into #t1 values('11111','2022-01-10 13:37:06.000','P1','P2','P3','','','','2023-01-13 00:00:00.000')
    insert into #t1 values('2222','2022-01-10 13:37:06.000','P1','','','','','','2023-01-13 01:00:00.000')
    insert into #t1 values('33333','2022-01-10 13:37:06.000','','P2','','','','','2023-01-13 01:10:00.000')
    insert into #t1 values('444','2022-01-10 13:37:06.000','P1','P2','P3','P4','P5','P6','2023-01-13 12:23:00.000')
    insert into #t1 values('55511','2022-01-10 13:37:06.000','','','P3','','','','2023-01-13 15:00:00.000')
    insert into #t1 values('66611','2022-01-10 13:37:06.000','','P2','P3','P4','','','2023-01-13 17:37:00.000')
    insert into #t1 values('77711','2022-01-10 13:37:06.000','','','P3','P4','P5','','2023-01-13 03:03:00.000')
    insert into #t1 values('118811','2022-01-10 13:37:06.000','','','','','','','2023-01-13 04:00:00.000')
    insert into #t1 values('145431','2022-01-10 13:37:06.000','','','','','','P5','2023-01-13 03:00:00.000')

    select * from #t1 order by vid,[RowUpdatDateTime]

    CREATE TABLE #new([VID] [varchar](100) NULL, [OrdDate] [datetime] NOT NULL,
    [Prg1] [varchar](10) NULL, [Prg2] [varchar](10) NULL, [Prg3] [varchar](10), [Prg4] [varchar](10), [Prg5] [varchar](10), [Prg6] [varchar](10),
    [RowUpdatDateTime] DateTime
    )
    ON [PRIMARY]
    insert into #new values('11111','2022-01-10 13:37:06.000','P1','P2','P3','','NEW1','','2023-01-13 00:07:00.000')
    insert into #new values('66611','2022-01-10 13:37:06.000','P1','','','','','NEW2','2023-01-14 01:00:00.000')
    insert into #new values('118811','2022-01-10 13:37:06.000','','P2','','','NEW3','','2023-01-14 01:10:00.000')
    insert into #new values('66611','2022-01-10 13:37:06.000','P1','P2','P3','P4','P5','NEWP6','2023-01-14 12:23:00.000')

    select * from #new order by vid,[RowUpdatDateTime]
    BEGIN TRY
    BEGIN TRANSACTION
    DELETE d
    FROM #t1 d
    WHERE EXISTS(SELECT *
    FROM #new n
    WHERE n.[VID] = d.[VID]);
    INSERT INTO #t1
    (
    [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    )
    SELECT
    [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    FROM #new;
    COMMIT TRANSACTION;
    END TRY

    BEGIN CATCH
    IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION;
    THROW;
    END CATCH

    select * from #t1 order by vid,[RowUpdatDateTime]
    drop table #t1
    drop table #new
  • When you do the insert you just need to select the most recent row for that [VID] on the #new table.

    You can do this with a CTE, for the example you just posted:

    ;WITH CTE AS 
    (
    SELECT DISTINCT [VID]
    FROM #new
    ),
    CTE2 AS
    (
    SELECT x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
    FROM CTE
    CROSS APPLY(SELECT TOP(1)
    x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
    FROM #new x
    WHERE x.VID = CTE.VID
    ORDER BY RowUpdatDateTime DESC) x
    )
    INSERT INTO #t1
    (
    [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    )
    SELECT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    FROM CTE2
    ;

    For the original SQL:

    BEGIN TRY

    BEGIN TRANSACTION

    DELETE d
    FROM [dbo].[DataFin] d
    WHERE EXISTS(SELECT *
    FROM NEWDATA n
    WHERE n.Unit = d.Unit
    AND n.PName = d.PName
    AND n.IntID = d.AMn
    AND n.AccNu = d.AccNu);

    ;WITH CTE AS
    (
    SELECT DISTINCT
    Unit,
    PName,
    AMn,
    AccNu
    FROM NEWDATA n
    ),
    CTE2 AS
    (
    SELECT x.*
    FROM CTE d
    CROSS APPLY(SELECT TOP(1) *
    FROM NEWDATA n
    WHERE d.Unit = n.Unit
    AND d.PName = n.PName
    AND d.AMn = n.AMn
    AND d.AccNu = n.AccNu
    ORDER BY n.RowUpdateDateTime DESC) x
    )
    INSERT INTO [dbo].[DataFin]
    (
    Unit,
    PName,
    AName,
    EnteredDateTime,
    IntID,
    VID,
    AccNu,
    IntAID,
    IntUID,
    RowUpdateDateTime,
    A_RowUpdateDateTime
    )
    SELECT Unit,
    PName,
    AName,
    EnteredDateTime,
    IntID,
    VID,
    AccNu,
    IntAID,
    IntUID,
    RowUpdateDateTime,
    A_RowUpdateDateTime
    FROM CTE2;

    COMMIT TRANSACTION;

    END TRY

    BEGIN CATCH
    IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION;
    THROW;
    END CATCH

    Alternatively you could delete any duplicate rows with older dates on the NEWDATA table before you do the processing.

     

    • This reply was modified 1 year, 8 months ago by  Jonathan AC Roberts. Reason: fixed aliases to make queries work
  • Thanks for the reply. I am getting Incorrect Syntax near ')'

    ;WITH CTE AS
    (
    SELECT DISTINCT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    FROM #new
    ),
    CTE2 AS
    (
    SELECT x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
    FROM CTE x
    CROSS APPLY(SELECT TOP(1)
    x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
    FROM #new
    ORDER BY RowUpdatDateTime DESC)
    )

  • I corrected that in my previous comment.

    ;WITH CTE AS
    (
    SELECT DISTINCT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    FROM #new
    ),
    CTE2 AS
    (
    SELECT x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
    FROM CTE x
    CROSS APPLY(SELECT TOP(1)
    x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
    FROM #new
    ORDER BY RowUpdatDateTime DESC) x
    )
    INSERT INTO #t1
    (
    [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    )
    SELECT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
    FROM CTE2
    ;

    It was missing the "x" on the end of ORDER BY RowUpdatDateTime DESC) x

  • Sorry, my bad. It works. Thanks for all the replies.

  • don075 wrote:

    Sorry, my bad. It works. Thanks for all the replies.

    Please let me know how long the procedure takes with the updated SQL.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply