How to gradually trim a big table

  • [EDIT - I did not invent this code; I just don't remember where I found it :blush: but it proved useful to me.]

    I did this for a table with about 3 million rows. By default the system did not trim old rows, so I had to bring it down to keeping only the rows for the last 60 or 90 days. Probably not the most efficient approach, but it worked without blocking things up. Hope it helps.

    Before delete:

    Table Name    Row Count
    YourTable       1797867

    -- Turn off the "rows affected" messages.
    SET NOCOUNT ON

    -- Set up variables to store dates.
    DECLARE @StartDate datetime
    DECLARE @EndDate datetime

    -- Set start and end date values.
    SET @StartDate = '01/15/2015'
    SET @EndDate = '01/31/2015'

    -- Loop in batches of 1000 records to break up processing.
    WHILE EXISTS (SELECT * FROM YourTable WHERE YourDateTime BETWEEN @StartDate AND @EndDate)
    BEGIN
        SET ROWCOUNT 1000
        DELETE FROM YourTable WHERE YourDateTime BETWEEN @StartDate AND @EndDate
        SET ROWCOUNT 0
    END

    -- Run a select to make sure the records have been deleted.
    SELECT MIN(YourDateTime) FROM YourTable
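
    On newer SQL Server versions, where using SET ROWCOUNT to limit DELETE statements is deprecated, the same batching idea can be written with DELETE TOP (N) instead. A minimal sketch, reusing the placeholder YourTable and YourDateTime names from the script above:

    -- Batched delete without SET ROWCOUNT (sketch only).
    SET NOCOUNT ON

    DECLARE @StartDate datetime
    DECLARE @EndDate datetime
    DECLARE @BatchSize int

    SET @StartDate = '01/15/2015'
    SET @EndDate = '01/31/2015'
    SET @BatchSize = 1000

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM YourTable
        WHERE YourDateTime BETWEEN @StartDate AND @EndDate

        -- Stop once a batch removes fewer rows than the batch size.
        IF @@ROWCOUNT < @BatchSize
            BREAK
    END

    Each DELETE TOP batch commits as its own transaction here, which keeps the individual log transactions small.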

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • peter.row (5/26/2016)


    Eric M Russell (5/23/2016)


    If you're deleting a total of 100 million rows, then deleting in batches of 10,000 would be too small, because there will be 10,000 iterations of the loop. Each loop is a separate table scan. However many total rows you're intending to delete, I would set the batch size so that you're looping 10 or maybe 100 times at most. Also, if this is a staging database (not a production transactional database), then you may want to consider doing a CHECKPOINT and log truncation for each loop.

    As long as their ADD_DATE_TIME_STAMP field has an index on it, there would not be a table scan, and hence the looping would not be that bad.

    When it comes to bulk deletes and updates, transaction logging is what's truly consuming the bulk of the I/O and processing time, and having indexes in place means you are logging even more data. If the goal is to optimize the mass delete process, then you'll probably get the best result by first disabling or dropping all indexes on the table, which will minimize the volume of transaction logging. Next, perform the mass delete in something like 10 iterations, and then once it's all complete, re-enable the indexes.
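
    A rough sketch of that disable-then-delete sequence, assuming a hypothetical nonclustered index IX_YourTable_AddDate on the ADD_DATE_TIME_STAMP column and a hypothetical cutoff date; only nonclustered indexes should be disabled, because disabling the clustered index makes the table unreadable:

    -- Sketch only: disable a nonclustered index, delete in a handful of
    -- large batches, then rebuild. Index name and cutoff date are hypothetical.
    ALTER INDEX IX_YourTable_AddDate ON YourTable DISABLE

    DECLARE @BatchSize int
    SET @BatchSize = 10000000   -- large batches, so only a few iterations

    WHILE EXISTS (SELECT 1 FROM YourTable WHERE ADD_DATE_TIME_STAMP < '01/01/2016')
    BEGIN
        DELETE TOP (@BatchSize)
        FROM YourTable
        WHERE ADD_DATE_TIME_STAMP < '01/01/2016'

        -- On a staging database in SIMPLE recovery, a CHECKPOINT between
        -- batches lets the log space be reused.
        CHECKPOINT
    END

    -- Rebuilding re-enables the disabled index.
    ALTER INDEX IX_YourTable_AddDate ON YourTable REBUILD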

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/26/2016)


    peter.row (5/26/2016)


    Eric M Russell (5/23/2016)


    If you're deleting a total of 100 million rows, then deleting in batches of 10,000 would be too small, because there will be 10,000 iterations of the loop. Each loop is a separate table scan. However many total rows you're intending to delete, I would set the batch size so that you're looping 10 or maybe 100 times at most. Also, if this is a staging database (not a production transactional database), then you may want to consider doing a CHECKPOINT and log truncation for each loop.

    As long as their ADD_DATE_TIME_STAMP field has an index on it, there would not be a table scan, and hence the looping would not be that bad.

    When it comes to bulk deletes and updates, transaction logging is what's truly consuming the bulk of the I/O and processing time, and having indexes in place means you are logging even more data. If the goal is to optimize the mass delete process, then you'll probably get the best result by first disabling or dropping all indexes on the table, which will minimize the volume of transaction logging. Next, perform the mass delete in something like 10 iterations, and then once it's all complete, re-enable the indexes.

    If you are simply deleting data, as a reread of the OP indicates, and at this time you have 64+ million rows of data and need to retain approximately 5 million or so, here is a possible solution for the first iteration. Please note there are some caveats to this depending on the data involved, so test it in a test environment.

    1) Use SELECT * INTO to create a table in the current database for the data to be kept.

    2) Create all the indexes on this new table that exist on the original table.

    3) Truncate the original table.

    4) Use ALTER TABLE ... SWITCH to switch the new table back into the old table.

    In a test environment you may encounter things that you need to account for to make this work. Once you have this major purge done, then it is a matter of keeping things clean on a daily basis.
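
    A rough sketch of those four steps, using hypothetical names and a hypothetical 90-day retention cutoff; ALTER TABLE ... SWITCH requires the two tables to have matching columns and indexes and to be on the same filegroup, which is part of what needs testing:

    -- 1) Copy the rows to keep into a new table (hypothetical names).
    SELECT *
    INTO dbo.YourTable_Keep
    FROM dbo.YourTable
    WHERE ADD_DATE_TIME_STAMP >= DATEADD(DAY, -90, GETDATE())

    -- 2) Recreate the indexes that exist on the original table
    --    (example only; every index on dbo.YourTable must be matched).
    CREATE CLUSTERED INDEX IX_YourTable_Keep_AddDate
        ON dbo.YourTable_Keep (ADD_DATE_TIME_STAMP)

    -- 3) Empty the original table; TRUNCATE is minimally logged.
    TRUNCATE TABLE dbo.YourTable

    -- 4) Switch the kept rows back into the (now empty) original table.
    ALTER TABLE dbo.YourTable_Keep SWITCH TO dbo.YourTable

    -- Clean up the now-empty helper table.
    DROP TABLE dbo.YourTable_Keep

    The TRUNCATE and SWITCH steps are effectively metadata operations, which is why this approach generates far less transaction log than deleting tens of millions of rows.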
