scrub data in 5 minute intervals

  • I have a situation where I have table with over a billion records and needs to be scrubbed. Table does have a field with date time timestamp. I have been deleting rows from the table using the script below which basically provides me delete statements by date for records older than 90 days.

    But now on each day row count is over 30 million rows and it takes forever to delete by date and transaction log becomes humongous.

    So I would like to scrub it in 5 minute intervals instead of daily for records older than 90 days. Even in 5 minute intervals the record count tends to be around a million. This will keep the delete slice small enough to help not a gigantic transaction log.

    I was wondering if someone has any thoughts on how to accomplish this?

    declare @startdate Datetime

    declare @enddate Datetime

    set @startdate = getdate()-480

    set @enddate = getdate()-90

    --set @vStart = select convert(varchar,@startdate, 102)

    print @startdate

    print @enddate

    WHILE (@startdate < @enddate)

    BEGIN

    print 'delete from vending where DetectedDate < ''' + CONVERT(varchar(10), @startdate, 101) +''''

    set @startdate = @startdate+1

    END

    I am hoping to modify the script above to produce a script with statements like this for a window between last 90 and 120 days:

    delete from vending where DetectedDate <'6/15/2015 8:25:00 PM'

    go

    delete from vending where DetectedDate <'6/15/2015 8:30:00 PM'

    go

    delete from vending where DetectedDate <'6/15/2015 8:35:00 PM'

    go

    thanks in advance

  • It's best to delete by the clustering key.

    Therefore, what is the table clustered on? If it's clustered on identity -- gack! -- but you have an index on datetime, you can do the conversion yourself from datetime to id and then delete by id.

    If possible, put the db in bulk-logged or even simple recovery model while you're doing the deletes.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • sqlstar2011 (11/2/2015)


    I have a situation where I have table with over a billion records and needs to be scrubbed. Table does have a field with date time timestamp. I have been deleting rows from the table using the script below which basically provides me delete statements by date for records older than 90 days.

    But now on each day row count is over 30 million rows and it takes forever to delete by date and transaction log becomes humongous.

    So I would like to scrub it in 5 minute intervals instead of daily for records older than 90 days. Even in 5 minute intervals the record count tends to be around a million. This will keep the delete slice small enough to help not a gigantic transaction log.

    I was wondering if someone has any thoughts on how to accomplish this?

    declare @startdate Datetime

    declare @enddate Datetime

    set @startdate = getdate()-480

    set @enddate = getdate()-90

    --set @vStart = select convert(varchar,@startdate, 102)

    print @startdate

    print @enddate

    WHILE (@startdate < @enddate)

    BEGIN

    print 'delete from vending where DetectedDate < ''' + CONVERT(varchar(10), @startdate, 101) +''''

    set @startdate = @startdate+1

    END

    I am hoping to modify the script above to produce a script with statements like this for a window between last 90 and 120 days:

    delete from vending where DetectedDate <'6/15/2015 8:25:00 PM'

    go

    delete from vending where DetectedDate <'6/15/2015 8:30:00 PM'

    go

    delete from vending where DetectedDate <'6/15/2015 8:35:00 PM'

    go

    thanks in advance

    1. Which Edition of SQL Server do you have?

    2. How much space does the table currently occupy?

    3. How much disk free space do you have?

    4. Can you provide the DDL for the table in question including all constraints, indexes, and FK's that may point to the table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I deal with a lot of large data tables and have found this article by Aaron Bertrand one of the best I have seen. It shows the research and results clearly. I highly recommend his breaking up large deletes methodology.

    https://sqlperformance.com/2013/03/io-subsystem/chunk-deletes

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply