Assistance Regarding Big Batches

  • Hey guys,
    I can't find anything specific in terms of batch deleting and inserting. 
    How I personally deal with batches, I hope it's correct. Imagine we have the following scenario. 
    There is a table with a total of twenty-five million records and you need to delete one million records based on criteria. I will do something like:
    --This will return a total of twenty-five million records but I will limit it to 10 loops so I don't have to wait like fifteen minutes per attempt

    While AffectedRows > 0
    begin transaction 
    set rowcount 50 000
    delete from table where date between 6/11/2018 and 7/11/2018(mm/dd/yyyy)
    set AffectedRows = @@rowcount
    commit transaction 
    I'll write down the time for execution of three attempts and calculate the average time, I/O and calculate an average I/O and increase the rowcount by percents. Depending on how it goes I won't play too much with the amount as it usually is around 130 - 150 000. From what I've read if the deletion amount is more than twenty-five percent of the table, I should disable all indexes and rebuild them afterward. I don't monitor the transaction log as the batches are small.

    I will happily accept any bits of advice.

    Kind Regards,
    Charlie

  • What is the clustered index on the table?

    _____________
    Code for TallyGenerator

  • The current table on which I'm working on has only a Primary Key. I just started second guessing my self if my methods for batch determination are correct.

  • You better make [date] a key for a clustered index on the table.

    Otherwise:
    DECLARE @FromID int, @ToID int
    SELECT @FromID = MIN(ID), @ToID = MAX(ID)
    FROM (
    SELECT TOP 50000 ID
    FROM Table
    WHERE date between 6/11/2018 and 7/11/2018
    ORDER BY ID
    ) DT

    DELETE T
    FROM Table T
    WHERE ID BETWEEN @FromID AND @ToID
    AND date between 6/11/2018 and 7/11/2018

    _____________
    Code for TallyGenerator

  • The size of the batch should be a size that allows the process to complete in a minimal time during each iteration to avoid blocking other processes.  You could also include a wait in each loop to allow other processes time to execute - and if your database is in simple recovery, perform a checkpoint after each loop.

    You don't need the explicit begin/commit transaction since you are only affecting a single table - an implicit transaction will be applied anyways.

    You also should not be using SET ROWCOUNT - you can use TOP in the delete statement itself.

    DELETE TOP(@batchSize)
    FROM Table
    WHERE date BETWEEN '2018-06-11' AND '2018-07-11'

    If you do not have an index on the 'date' column - you definitely will benefit by creating one.  If you can make that a clustered index on that table - with a non-clustered index to support the PK it would probably improve the delete performance quite a bit.

    If this is a regular occurrence - say an archive/purge process - you should definitely consider a partitioned table.  It would be much easier to manage in the long term.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply