July 11, 2018 at 1:09 am
Hey guys,
I can't find anything specific in terms of batch deleting and inserting.
How I personally deal with batches, I hope it's correct. Imagine we have the following scenario.
There is a table with a total of twenty-five million records and you need to delete one million records based on criteria. I will do something like:
--This will return a total of twenty-five million records but I will limit it to 10 loops so I don't have to wait like fifteen minutes per attempt
While AffectedRows > 0
begin transaction
set rowcount 50 000
delete from table where date between 6/11/2018 and 7/11/2018(mm/dd/yyyy)
set AffectedRows = @@rowcount
commit transaction
I'll write down the time for execution of three attempts and calculate the average time, I/O and calculate an average I/O and increase the rowcount by percents. Depending on how it goes I won't play too much with the amount as it usually is around 130 - 150 000. From what I've read if the deletion amount is more than twenty-five percent of the table, I should disable all indexes and rebuild them afterward. I don't monitor the transaction log as the batches are small.
I will happily accept any bits of advice.
Kind Regards,
Charlie
July 11, 2018 at 4:26 am
What is the clustered index on the table?
_____________
Code for TallyGenerator
July 11, 2018 at 4:46 am
The current table on which I'm working on has only a Primary Key. I just started second guessing my self if my methods for batch determination are correct.
July 11, 2018 at 5:46 am
You better make [date] a key for a clustered index on the table.
Otherwise:
DECLARE @FromID int, @ToID int
SELECT @FromID = MIN(ID), @ToID = MAX(ID)
FROM (
SELECT TOP 50000 ID
FROM Table
WHERE date between 6/11/2018 and 7/11/2018
ORDER BY ID
) DT
DELETE T
FROM Table T
WHERE ID BETWEEN @FromID AND @ToID
AND date between 6/11/2018 and 7/11/2018
_____________
Code for TallyGenerator
July 12, 2018 at 12:14 pm
The size of the batch should be a size that allows the process to complete in a minimal time during each iteration to avoid blocking other processes. You could also include a wait in each loop to allow other processes time to execute - and if your database is in simple recovery, perform a checkpoint after each loop.
You don't need the explicit begin/commit transaction since you are only affecting a single table - an implicit transaction will be applied anyways.
You also should not be using SET ROWCOUNT - you can use TOP in the delete statement itself.
DELETE TOP(@batchSize)
FROM Table
WHERE date BETWEEN '2018-06-11' AND '2018-07-11'
If you do not have an index on the 'date' column - you definitely will benefit by creating one. If you can make that a clustered index on that table - with a non-clustered index to support the PK it would probably improve the delete performance quite a bit.
If this is a regular occurrence - say an archive/purge process - you should definitely consider a partitioned table. It would be much easier to manage in the long term.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply