March 5, 2014 at 4:49 am
This is my current stored procedure; it is taking too long to finish the operation.
Any suggestions to improve this?
There are around 25 _TESTDATA tables, each with around 8 million records. I want to delete the records that are older than 30 days.
CREATE PROCEDURE sp_delete_by_date
    @pdate AS int,
    @nrp bit = 1
AS
SET DEADLOCK_PRIORITY LOW
SET IMPLICIT_TRANSACTIONS ON

DECLARE @i tinyint
DECLARE @Table_Name varchar(50)
-- @pdate is int, so it must be cast to varchar before string concatenation;
-- EXEC() only accepts variables and literals, not expressions, so cast it up front
DECLARE @pdate_str varchar(12)
SET @pdate_str = CAST(@pdate AS varchar(12))

-- collect the names of the child _TESTDATA tables
CREATE TABLE #MC (
    ID int IDENTITY(1,1) NOT NULL,
    Table_Name varchar(50) NOT NULL)

INSERT INTO #MC
SELECT object_name(fkeyid) FROM sysforeignkeys
WHERE object_name(fkeyid) LIKE '%_TESTDATA'
  AND object_name(fkeyid) NOT LIKE 'TMP_%'

-- delete from each child table, committing per table
SET @i = 1
WHILE @i <= (SELECT MAX(ID) FROM #MC)
BEGIN
    SELECT @Table_Name = Table_Name FROM #MC WHERE ID = @i
    EXEC ('DELETE FROM ' + @Table_Name + ' WHERE RUN_START_TIME <= ' + @pdate_str)
    COMMIT TRAN
    SET @i = @i + 1
END

IF @nrp <> 1
BEGIN
    WAITFOR DELAY '00:00:00.800'
END

DELETE TESTDATA WHERE RUN_START_TIME <= @pdate
COMMIT TRAN

IF @nrp <> 1
BEGIN
    WAITFOR DELAY '00:00:00.200'
END

DELETE TR WHERE RUN_END_TIME <= @pdate
COMMIT TRAN

DROP TABLE #MC
COMMIT TRAN
Do we really need a COMMIT after every DELETE statement, or can we have it as one common statement, something like:
IF @@ERROR != 0
BEGIN
ROLLBACK TRANSACTION
RETURN
END
ELSE
BEGIN
COMMIT TRANSACTION
END
March 5, 2014 at 5:40 am
If you commit after every row, it will run slower, but it also allows you to manage your log size. If it's one big commit, it's one big transaction that must be maintained in the log.
In terms of making this faster, I'd look at deleting batches of rows, perhaps in the 10,000-100,000 size range to speed things up. That often gives you a good balance between one large transaction and 8mm small ones.
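Something along these lines, as a rough sketch (SQL 2005 or later; the 50,000 batch size and the @pdate cutoff are illustrative and worth tuning on your system):
DECLARE @pdate int, @BatchSize int
SET @pdate = 20140203   -- illustrative int cutoff, matching the procedure's @pdate type
SET @BatchSize = 50000  -- tune in the 10,000-100,000 range

WHILE 1 = 1
BEGIN
    -- each DELETE commits on its own, so no single giant transaction builds up in the log
    DELETE TOP (@BatchSize) FROM TESTDATA
    WHERE RUN_START_TIME <= @pdate

    IF @@ROWCOUNT < @BatchSize BREAK -- last partial batch; nothing older remains
END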
March 5, 2014 at 6:02 am
So Steve, you are suggesting something like:
EXEC ('DELETE TOP (10000) FROM ' + @Table_Name + ' WHERE RUN_START_TIME <= ' + @pdate_str)
COMMIT TRAN
DELETE TOP (10000) TESTDATA WHERE RUN_START_TIME <= @pdate
COMMIT TRAN
DELETE TOP (10000) TR WHERE RUN_END_TIME <= @pdate
COMMIT TRAN
In case of a lower version of SQL Server, say SQL Server 2000, how can we manage it?
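Would SET ROWCOUNT be the way to do it there? Something like this sketch (SET ROWCOUNT was the pre-2005 way to cap the rows a DELETE touches):
SET ROWCOUNT 10000 -- cap every following data-modification statement at 10,000 rows

WHILE 1 = 1
BEGIN
    DELETE FROM TESTDATA WHERE RUN_START_TIME <= @pdate
    IF @@ROWCOUNT = 0 BREAK -- nothing left under the cutoff
END

SET ROWCOUNT 0 -- always reset, or later statements stay capped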
March 5, 2014 at 6:36 am
yuvipoy,
we have similar requirements for our real-time systems. We need to delete millions of rows and not cause locking that would stop transactions. I wrote a delete loop that works for us.
--Start by getting the records to delete
SELECT ID
INTO #DeleteKeys
FROM TESTDATA WITH (NOLOCK)
WHERE RUN_START_TIME <= @pdate

--loop thru small batches to avoid excessive locking
CREATE TABLE #DeleteBatch (ID int)
DECLARE @RowCount int
SET @RowCount = 1000

WHILE @RowCount = 1000
BEGIN
    -- pull the next batch of keys out of the work list
    DELETE TOP (1000) FROM #DeleteKeys
    OUTPUT DELETED.ID INTO #DeleteBatch

    SET @RowCount = @@ROWCOUNT

    -- delete that batch from the base table
    DELETE TD
    FROM TESTDATA TD
    INNER JOIN #DeleteBatch DB
        ON TD.ID = DB.ID

    TRUNCATE TABLE #DeleteBatch
END
March 5, 2014 at 11:25 pm
Hi EricEyster,
After having the batch operation, is the delete operation faster?
How much time did you gain after rewriting your query?
In case of a lower version of SQL Server, say SQL Server 2000, how can we manage it?
March 6, 2014 at 6:00 am
yuvipoy (3/5/2014)
Hi EricEyster, after having the batch operation, is the delete operation faster?
How much time did you gain after rewriting your query?
In case of a lower version of SQL Server, say SQL Server 2000, how can we manage it?
You should see at least a 10x performance improvement versus single record delete. It will vary depending on how big you make the batch size for the delete. We have to decrease the size on some systems and that slows the delete. Others, we can go as high as 100,000 and get great performance.
If you are using SQL 2000, then you do not have the OUTPUT clause. I updated the code to work without it. It does require an extra step, but it should not drastically change the performance on a reasonably sized delete:
declare @pdate datetime
set @pdate = getdate() - 7

--Start by getting the records to delete
SELECT ID
INTO #DeleteKeys
FROM TESTDATA WITH (NOLOCK)
WHERE RUN_START_TIME <= @pdate

--loop thru small batches to avoid excessive locking
create table #DeleteBatch (ID int)
declare @RowCount int
SET @RowCount = 1000

WHILE @RowCount = 1000
BEGIN
    -- SQL 2000 syntax: no parentheses after TOP
    INSERT INTO #DeleteBatch
    SELECT TOP 1000 ID
    FROM #DeleteKeys

    SET @RowCount = @@ROWCOUNT

    -- remove this batch's keys from the work list
    DELETE DK
    FROM #DeleteKeys DK
    JOIN #DeleteBatch DB
        ON DK.ID = DB.ID

    -- delete the batch from the base table
    DELETE TD
    FROM TESTDATA TD
    INNER JOIN #DeleteBatch DB
        ON TD.ID = DB.ID

    TRUNCATE TABLE #DeleteBatch
END
March 10, 2014 at 2:57 pm
Sorry for the delay. I typically do what Eric has written here, even back in SQL 2000/7, but I might run 3-4 batches manually at different sizes and time them to understand what the best size might be for my system.
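For example, something as simple as this sketch gives you a per-batch timing to compare candidate sizes (the 25,000 is just one candidate; it assumes @pdate is declared as in the earlier posts):
DECLARE @t datetime
SET @t = GETDATE()

DELETE TOP (25000) FROM TESTDATA
WHERE RUN_START_TIME <= @pdate

-- @@ROWCOUNT here still reflects the DELETE above
SELECT DATEDIFF(ms, @t, GETDATE()) AS batch_ms, @@ROWCOUNT AS rows_deleted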
March 17, 2014 at 6:29 am
declare @pdate datetime
set @pdate = getdate()-7
--Start by getting the records to delete
SELECT ID
INTO #DeleteKeys
FROM TESTDATA WITH (NOLOCK)
WHERE RUN_START_TIME <=@pdate
If I have to refer to another table for the delete, will this query be the better choice:
New Code 1:
declare @pdate datetime
set @pdate = getdate()-7
--Start by getting the records to delete
SELECT a.ID
INTO #DeleteKeys
FROM TESTDATA as a WITH (NOLOCK)
WHERE a.id not in(Select b.id from another_table as b )
and a.RUN_START_TIME <=@pdate
or this query?
New Code 2:
declare @pdate datetime
set @pdate = getdate()-7
--Start by getting the records to delete
SELECT a.ID
INTO #DeleteKeys
FROM TESTDATA as a WITH (NOLOCK)
WHERE not exists(Select b.id from another_table as b where a.id =b.id )
and a.RUN_START_TIME <=@pdate
NOT EXISTS vs. NOT IN.
The id field in both tables is a NOT NULL column.
March 17, 2014 at 6:40 am
yuvipoy (3/17/2014)
NOT EXISTS vs. NOT IN. The id field in both tables is a NOT NULL column.
No difference.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 17, 2014 at 7:12 am
Table A has 100k records.
Table B has 8,000k records.
When I delete from Table B where a Table A record exists, the query takes a long time; that's why I came across NOT EXISTS vs. NOT IN.
How about a LEFT OUTER JOIN, something like the sketch below?
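A sketch of the LEFT JOIN form, with the same tables and assumptions as the two queries above (b.id IS NULL replaces the NOT IN/NOT EXISTS test):
SELECT a.ID
INTO #DeleteKeys
FROM TESTDATA AS a WITH (NOLOCK)
LEFT OUTER JOIN another_table AS b
    ON a.id = b.id
WHERE b.id IS NULL -- keep only rows with no match in another_table
  AND a.RUN_START_TIME <= @pdate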
Thanks!
March 17, 2014 at 7:16 am
yuvipoy (3/17/2014)
When I delete from Table B where a Table A record exists, the query takes a long time; that's why I came across NOT EXISTS vs. NOT IN. How about a LEFT OUTER JOIN?
Very slightly slower than not in/not exists in my tests.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 17, 2014 at 7:37 am
Is there a better way to delete the records while checking against another table?
March 20, 2014 at 7:24 am
EricEyster (3/6/2014)
You should see at least a 10x performance improvement versus single record delete. It will vary depending on how big you make the batch size for the delete. We have to decrease the size on some systems and that slows the delete. Others, we can go as high as 100,000 and get great performance.
If you are using SQL 2000, then you do not have the OUTPUT clause. I updated the code to work without it.
I tried your code, but it is taking more time than the normal delete I am doing at present.
My column is a uniqueidentifier, not an INT as you created in the #Delete tables.
Joining on the uniqueidentifier to delete some 50k records out of 5 million is taking more than 15 minutes and locking the main table.
The uniqueidentifier column is a nonclustered primary key, and the WHERE-condition column is the clustered one.
When I run sp_lock and look at the SPID of the delete transaction's session, it shows many EXT, TAB, and RID locks, some 300 records for the transaction.
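Would indexing the temp tables help? A sketch of what I mean, assuming the key column is a uniqueidentifier named ID:
-- after the SELECT ... INTO #DeleteKeys, index the keys so each batch join can seek
CREATE CLUSTERED INDEX IX_DeleteKeys ON #DeleteKeys (ID)

-- and declare the batch table with the real column type (uniqueidentifier, not int)
CREATE TABLE #DeleteBatch (ID uniqueidentifier NOT NULL PRIMARY KEY)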
March 27, 2014 at 7:55 am
The delete operation takes more or less the same time; I don't see any improvement when I use the batch delete to remove some 1 million records.
Actually, the approach is taking more time than the current one.