Data archive

  • Hi all,

    I have been assigned a task to remove all the data older then 2008 from around 10 tables from Production.

    records are close to 2-5 million each table.

    I need your expertise to find out the best way to delete such a big amount of data from production without hampering the performance.

    thanks

    Puneet

  • When I've been tasked with doing similar kinds of deletes, I try to spread the pain out over time, and hence spread it out over transaction logs as well so as to avoid unnecessary resource contentions. My way of doing this is to do something like

    SET ROWCOUNT 100000 (or some rowcount you will need to decide that you can delete safely without causing pain)

    DELETE FROM TABLE WHERE ins_dt < '2008-1-1'

    Then, set up a sql server job to kick off this script every 15-30 minutes (adjust time depending on how much time the query takes, and based on how much resources the query absorbs).

    While you first enable your jobs, watch it run a few times while your resources with perfmon and activity monitor, and adjust time interval/row count as necessary to mitigate pain.

    Another advantage of this approach is it batches out the deletes so you dont have any long-running transactions.

    If you provide more specifics about the tables I can provide a more detailed answer for you.

  • Something to keep in mind for those currently using 2008 or moving to it:

    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in SQL Server 2008. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.

  • upstart (10/15/2010)


    Something to keep in mind for those currently using 2008 or moving to it:

    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in SQL Server 2008. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.

    Looks like that info is a direct quote from http://msdn.microsoft.com/en-us/library/ms188774.aspx

    Thanks for pointing that out!

  • If you are using SQL 2005 or above this should work.

    NOTE: depending on how wide your rows are it might be necessary to tweak the amount deleted per iteration.

    WHILE 1=1

    BEGIN

    BEGIN TRAN

    DELETE TOP(20000) FROM dbo.<table_name>

    WHERE <date_column> < MYDATE

    IF @@rowcount < 20000

    BEGIN

    COMMIT TRAN

    BREAK;

    END

    COMMIT TRAN

    END

  • Make sure you have frequent transaction log backups running, or your .LDF will grow quite big.

    I do a similar process to those mentioned above. You can use a loop with your TOP(50000) type processing and add a delay in each loop so you don't hog the machine.

  • Thanks Guys for your reply this will really going to help.

    @getoffmyfoot: below is the detail of one of my biggest table.

    Total number of rows in table:831762784

    Rows that get deleted: 30%

    Table structure:

    PanelistIDint4

    ParentIDint4

    AttributeKeyIDint4

    AttributeValueIDint4

    ProfileTypeIDint4

    OpenAttributeValuenvarchar510

    Deletedbit1

    CreateDatedatetime8

    ModifyDatedatetime8

    ModifyUserIDint4

    i have transactional replication implemented on this, whould it be a good idea to stop it before deleting the records?

  • Do you want the deletes to happen at the subscribe too ? If you stop replication, and want the subscriber to be deleted too, then you'll have to reinitialize the subscription.

  • Tell me the best you think going to help. stopping replication and reinitilizing or just deleting in small chunk and keep replication to do it job?

  • I would try letting replication handle the deletes. If your replication job runs frequently, maybe it won't have a negative impact.

  • Thanks All for your reply, will let you know the result after completing this task.

  • Something nobody's mentioned is foreign key constraints. If another table has a dependency on the table you are deleting from, and your foreign keys don't have cascading deletes, you'll need to plan carefully the order in which you delete from your tables.

    John

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply