March 3, 2023 at 11:28 pm
I have written a stored procedure that updates data in my database with the data that was received from a third party database.
I run a script that brings the data that was modified in the last 24 hours from the 3rd party system. This table contains about 500,000 records for each day.
Then the stored procedure uses a cursor to go through the received data and update the main table in my database. If the data was updated, it deletes the record from the main table and inserts the new one.
If the data from new table is not found, it simply inserts the row into the main table.
This has been running without any issues for the past 5 years. I think with recent patching of sql/windows, this script is taking 5+ hours to complete (used to be about 30 minutes). It's taking most of the sql resources. I would like to know whether there is away to optimize it. If using cursor is not recommended, is there any other way that I could update main data from the newdata without using cursors.
Any help is greatly appreciated.
DECLARE @Unit varchar(20)
DECLARE @AName varchar(50)
DECLARE @RDate datetime
DECLARE @EDate datetime
DECLARE @UpdateDateTime datetime
DECLARE @AMn varchar(50)
DECLARE @AccNu varchar(30)
DECLARE @AType varchar(50)
DECLARE @Rank varchar(10)
Declare @PName varchar(200)
DECLARE @VID varchar(25)
Declare @Result_date datetime
Declare @Result_At varchar(50)
Declare @IntID varchar(200)
Declare @IntAID varchar(200)
Declare @A_RowUpdateDateTime datetime
Declare @Result_A_date datetime
Declare @Result_A_At varchar(50)
DECLARE cursor1 CURSOR
FOR
Select Distinct Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
IntUID, RowUpdateDateTime, A_RowUpdateDateTime
from NEWDATA
OPEN cursor1
FETCH NEXT FROM cursor1 INTO @Unit , @PName, @AName, @EDate, @AMn, @VID, @AccNu, @IntAID, @IntID, @UpdateDateTime, @A_RowUpdateDateTime
WHILE @@FETCH_STATUS = 0
BEGIN
Set @Result_date = Null
Set @Result_At = Null
Set @Result_A_date = Null
Select top 1 @Result_date= RowUpdateDateTime, @Result_At= AName from [dbo].[DataFin] where Unit= @Unit and PName= @PName
and IntID = @AMn and AccNu = @AccNu
Select top 1 @Result_A_date= A_RowUpdateDateTime , @Result_A_At= AName from [dbo].[DataFin] where Unit= @Unit and PName= @PName
and IntID = @AMn and AccNu = @AccNu
If @Result_date is NULL --New record. Insert into table
Begin
Insert into [dbo].[DataFin]
(Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
IntUID, RowUpdateDateTime, A_RowUpdateDateTime)
Select
Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
IntUID, RowUpdateDateTime, A_RowUpdateDateTime
from NEWDATA
where Unit= @Unit and VID = @VID and AccNu= @AccNu and IntID = @AMn
End
Else If (DateDiff(Second,@Result_date,@UpdateDateTime) >0 and @Result_At = @AName)
Or (DateDiff(Second,@Result_A_date,@A_RowUpdateDateTime) >0 and @Result_AV_At = @AName)
Begin
Delete from [dbo].[DataFin] where Unit= @Unit and PName= @PName and IntID = @AMn and AccNu = @AccNu
Insert into [dbo].[DataFin]
(Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
IntUID, RowUpdateDateTime, A_RowUpdateDateTime)
Select
Unit, PName, AName, EnteredDateTime, IntID, VID, AccNu, IntAID,
IntUID, RowUpdateDateTime, A_RowUpdateDateTime
from NEWDATA
where Unit= @Unit and VID = @VID and AccNu= @AccNu and IntID = @AMn
End
FETCH NEXT FROM cursor1 INTO @Unit , @PName, @AName, @EDate, @AMn, @VID, @AccNu, @IntAID, @IntID, @UpdateDateTime, @A_RowUpdateDateTime
END
CLOSE cursor1;
DEALLOCATE cursor1
March 4, 2023 at 1:44 am
I run a script that brings the data that was modified in the last 24 hours from the 3rd party system.
This table contains about 500,000 records for each day.
Then the stored procedure uses a cursor to go through the received data and update the main table in my database.
If the data was updated, it deletes the record from the main table and inserts the new one.
If the data from new table is not found, it simply inserts the row into the main table.
This does the same thing:
BEGIN TRY
BEGIN TRANSACTION
DELETE d
FROM [dbo].[DataFin] d
WHERE EXISTS(SELECT *
FROM NEWDATA n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu);
INSERT INTO [dbo].[DataFin]
(
Unit,
PName,
AName,
EnteredDateTime,
IntID,
VID,
AccNu,
IntAID,
IntUID,
RowUpdateDateTime,
A_RowUpdateDateTime
)
SELECT Unit,
PName,
AName,
EnteredDateTime,
IntID,
VID,
AccNu,
IntAID,
IntUID,
RowUpdateDateTime,
A_RowUpdateDateTime
FROM NEWDATA;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH
March 4, 2023 at 10:24 am
I'd be interested to hear how long the new version takes.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 5, 2023 at 12:03 am
This does the same thing:
Are you sure? I struggle to understand cursors ate the best of times, but I think the set based solution might delete unchanged rows and then insert them again. This would have the same end result, but if it was 50 million not 500k it might matter. I would definitely get rid of the cursor, but perhaps only delete rows that have a changed RowUpdateDateTime.
This is the criteria for deleting rows.
Else If (DateDiff(Second,@Result_date, @UpdateDateTime) >0 and @Result_At = @AName)
Or (DateDiff(Second,@Result_A_date,@A_RowUpdateDateTime) >0 and @Result_AV_At = @AName)
I don't think the name comparison matter as the pairs of variables appear to be assigned with identical logic.
Does this account for the RowUpdateTimes comparison? Depending on how much change there is with each batch this could well be slower, but I prefer not to update unchanged rows. (I only took out the transaction to focus on the logic).
DELETE d
FROM dbo.DataFin AS d
WHERE EXISTS (SELECT 1
FROM dbo.NEWDATA AS n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu
AND ( n.RowUpdateDateTime > d.RowUpdateDateTime
OR
n.A_RowUpdateDateTime > d.RowUpdateDateTime));
INSERT INTO dbo.DataFin ( Unit,PName,AName,EnteredDateTime,IntID, VID, AccNu,
IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime)
SELECT Unit, PName, AName,EnteredDateTime,IntID, VID,AccNu,
IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime
FROM dbo.NEWDATA AS n
WHERE NOT EXISTS ( SELECT 1
FROM dbo.NEWDATA AS n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu)
March 5, 2023 at 12:21 am
Given the OP stated "I run a script that brings the data that was modified in the last 24 hours from the 3rd party system" I assumed that the data had been modified and so need updating or inserted into this database.
March 5, 2023 at 12:46 am
Jonathan AC Roberts wrote:This does the same thing:
Are you sure? I struggle to understand cursors ate the best of times, but I think the set based solution might delete unchanged rows and then insert them again. This would have the same end result, but if it was 50 million not 500k it might matter. I would definitely get rid of the cursor, but perhaps only delete rows that have a changed RowUpdateDateTime.
This is the criteria for deleting rows.
Else If (DateDiff(Second,@Result_date, @UpdateDateTime) >0 and @Result_At = @AName)
Or (DateDiff(Second,@Result_A_date,@A_RowUpdateDateTime) >0 and @Result_AV_At = @AName)I don't think the name comparison matter as the pairs of variables appear to be assigned with identical logic.
Does this account for the RowUpdateTimes comparison? Depending on how much change there is with each batch this could well be slower, but I prefer not to update unchanged rows. (I only took out the transaction to focus on the logic).
DELETE d
FROM dbo.DataFin AS d
WHERE EXISTS (SELECT 1
FROM dbo.NEWDATA AS n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu
AND ( n.RowUpdateDateTime > d.RowUpdateDateTime
OR
n.A_RowUpdateDateTime > d.RowUpdateDateTime));
INSERT INTO dbo.DataFin ( Unit,PName,AName,EnteredDateTime,IntID, VID, AccNu,
IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime)
SELECT Unit, PName, AName,EnteredDateTime,IntID, VID,AccNu,
IntAID,IntUID,RowUpdateDateTime,A_RowUpdateDateTime
FROM dbo.NEWDATA AS n
WHERE NOT EXISTS ( SELECT 1
FROM dbo.NEWDATA AS n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu)
Your insert statement contains:
FROM dbo.NEWDATA AS n
WHERE NOT EXISTS ( SELECT 1
FROM dbo.NEWDATA AS n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu)
I think there is an error as alias "d" does not exist. If it is joining the two dbo.NEWDATA tables together it will never return any rows.
March 5, 2023 at 1:23 am
Yeah it’s meant to be datafin as d, but you’re right. If the newdata only has changed or new data then there is no need to compare the dates either in the cursor or set approach.
March 6, 2023 at 4:07 pm
Thanks for all the replies. Much appreciated.
I agree that I don't need to compare data in the new table with the existing one.
I created a temp table to test the provided sample script. Got a small issue with that.
Here in my script, I am looking only for the VID field in the where criteria.
My new table contains only 4 new rows.
Two rows got data for the same VID 66611.
Here the latest rowupdatetime for VID 66611, got 2023-01-14 12:23.
I want only this row inserted to the table.
The script however, inserts both.
CREATE TABLE #t1([VID] [varchar](100) NULL, [OrdDate] [datetime] NOT NULL,
[Prg1] [varchar](10) NULL, [Prg2] [varchar](10) NULL, [Prg3] [varchar](10), [Prg4] [varchar](10), [Prg5] [varchar](10), [Prg6] [varchar](10),
[RowUpdatDateTime] DateTime
)
ON [PRIMARY]
insert into #t1 values('11111','2022-01-10 13:37:06.000','P1','P2','P3','','','','2023-01-13 00:00:00.000')
insert into #t1 values('2222','2022-01-10 13:37:06.000','P1','','','','','','2023-01-13 01:00:00.000')
insert into #t1 values('33333','2022-01-10 13:37:06.000','','P2','','','','','2023-01-13 01:10:00.000')
insert into #t1 values('444','2022-01-10 13:37:06.000','P1','P2','P3','P4','P5','P6','2023-01-13 12:23:00.000')
insert into #t1 values('55511','2022-01-10 13:37:06.000','','','P3','','','','2023-01-13 15:00:00.000')
insert into #t1 values('66611','2022-01-10 13:37:06.000','','P2','P3','P4','','','2023-01-13 17:37:00.000')
insert into #t1 values('77711','2022-01-10 13:37:06.000','','','P3','P4','P5','','2023-01-13 03:03:00.000')
insert into #t1 values('118811','2022-01-10 13:37:06.000','','','','','','','2023-01-13 04:00:00.000')
insert into #t1 values('145431','2022-01-10 13:37:06.000','','','','','','P5','2023-01-13 03:00:00.000')
select * from #t1 order by vid,[RowUpdatDateTime]
CREATE TABLE #new([VID] [varchar](100) NULL, [OrdDate] [datetime] NOT NULL,
[Prg1] [varchar](10) NULL, [Prg2] [varchar](10) NULL, [Prg3] [varchar](10), [Prg4] [varchar](10), [Prg5] [varchar](10), [Prg6] [varchar](10),
[RowUpdatDateTime] DateTime
)
ON [PRIMARY]
insert into #new values('11111','2022-01-10 13:37:06.000','P1','P2','P3','','NEW1','','2023-01-13 00:07:00.000')
insert into #new values('66611','2022-01-10 13:37:06.000','P1','','','','','NEW2','2023-01-14 01:00:00.000')
insert into #new values('118811','2022-01-10 13:37:06.000','','P2','','','NEW3','','2023-01-14 01:10:00.000')
insert into #new values('66611','2022-01-10 13:37:06.000','P1','P2','P3','P4','P5','NEWP6','2023-01-14 12:23:00.000')
select * from #new order by vid,[RowUpdatDateTime]
BEGIN TRY
BEGIN TRANSACTION
DELETE d
FROM #t1 d
WHERE EXISTS(SELECT *
FROM #new n
WHERE n.[VID] = d.[VID]);
INSERT INTO #t1
(
[VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
)
SELECT
[VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
FROM #new;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH
select * from #t1 order by vid,[RowUpdatDateTime]
drop table #t1
drop table #new
March 6, 2023 at 4:41 pm
When you do the insert you just need to select the most recent row for that [VID] on the #new table.
You can do this with a CTE, for the example you just posted:
;WITH CTE AS
(
SELECT DISTINCT [VID]
FROM #new
),
CTE2 AS
(
SELECT x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
FROM CTE
CROSS APPLY(SELECT TOP(1)
x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
FROM #new x
WHERE x.VID = CTE.VID
ORDER BY RowUpdatDateTime DESC) x
)
INSERT INTO #t1
(
[VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
)
SELECT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
FROM CTE2
;
For the original SQL:
BEGIN TRY
BEGIN TRANSACTION
DELETE d
FROM [dbo].[DataFin] d
WHERE EXISTS(SELECT *
FROM NEWDATA n
WHERE n.Unit = d.Unit
AND n.PName = d.PName
AND n.IntID = d.AMn
AND n.AccNu = d.AccNu);
;WITH CTE AS
(
SELECT DISTINCT
Unit,
PName,
AMn,
AccNu
FROM NEWDATA n
),
CTE2 AS
(
SELECT x.*
FROM CTE d
CROSS APPLY(SELECT TOP(1) *
FROM NEWDATA n
WHERE d.Unit = n.Unit
AND d.PName = n.PName
AND d.AMn = n.AMn
AND d.AccNu = n.AccNu
ORDER BY n.RowUpdateDateTime DESC) x
)
INSERT INTO [dbo].[DataFin]
(
Unit,
PName,
AName,
EnteredDateTime,
IntID,
VID,
AccNu,
IntAID,
IntUID,
RowUpdateDateTime,
A_RowUpdateDateTime
)
SELECT Unit,
PName,
AName,
EnteredDateTime,
IntID,
VID,
AccNu,
IntAID,
IntUID,
RowUpdateDateTime,
A_RowUpdateDateTime
FROM CTE2;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH
Alternatively you could delete any duplicate rows with older dates on the NEWDATA table before you do the processing.
March 6, 2023 at 7:31 pm
Thanks for the reply. I am getting Incorrect Syntax near ')'
;WITH CTE AS
(
SELECT DISTINCT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
FROM #new
),
CTE2 AS
(
SELECT x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
FROM CTE x
CROSS APPLY(SELECT TOP(1)
x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
FROM #new
ORDER BY RowUpdatDateTime DESC)
)
March 6, 2023 at 7:41 pm
I corrected that in my previous comment.
;WITH CTE AS
(
SELECT DISTINCT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
FROM #new
),
CTE2 AS
(
SELECT x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
FROM CTE x
CROSS APPLY(SELECT TOP(1)
x.[VID],x.[OrdDate],x.[Prg1],x.[Prg2],x.[Prg3],x.[Prg4],x.[Prg5],x.[Prg6],x.[RowUpdatDateTime]
FROM #new
ORDER BY RowUpdatDateTime DESC) x
)
INSERT INTO #t1
(
[VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
)
SELECT [VID],[OrdDate],[Prg1],[Prg2],[Prg3],[Prg4],[Prg5],[Prg6],[RowUpdatDateTime]
FROM CTE2
;
It was missing the "x" on the end of ORDER BY RowUpdatDateTime DESC) x
March 6, 2023 at 7:51 pm
Sorry, my bad. It works. Thanks for all the replies.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply