October 15, 2010 at 10:23 am
Hi all,
I have been assigned a task to remove all the data older then 2008 from around 10 tables from Production.
records are close to 2-5 million each table.
I need your expertise to find out the best way to delete such a big amount of data from production without hampering the performance.
thanks
Puneet
October 15, 2010 at 11:28 am
When I've been tasked with doing similar kinds of deletes, I try to spread the pain out over time, and hence spread it out over transaction logs as well so as to avoid unnecessary resource contentions. My way of doing this is to do something like
SET ROWCOUNT 100000 (or some rowcount you will need to decide that you can delete safely without causing pain)
DELETE FROM TABLE WHERE ins_dt < '2008-1-1'
Then, set up a sql server job to kick off this script every 15-30 minutes (adjust time depending on how much time the query takes, and based on how much resources the query absorbs).
While you first enable your jobs, watch it run a few times while your resources with perfmon and activity monitor, and adjust time interval/row count as necessary to mitigate pain.
Another advantage of this approach is it batches out the deletes so you dont have any long-running transactions.
If you provide more specifics about the tables I can provide a more detailed answer for you.
October 15, 2010 at 12:55 pm
Something to keep in mind for those currently using 2008 or moving to it:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in SQL Server 2008. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.
October 15, 2010 at 1:12 pm
upstart (10/15/2010)
Something to keep in mind for those currently using 2008 or moving to it:Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in SQL Server 2008. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.
Looks like that info is a direct quote from http://msdn.microsoft.com/en-us/library/ms188774.aspx
Thanks for pointing that out!
October 15, 2010 at 1:19 pm
If you are using SQL 2005 or above this should work.
NOTE: depending on how wide your rows are it might be necessary to tweak the amount deleted per iteration.
WHILE 1=1
BEGIN
BEGIN TRAN
DELETE TOP(20000) FROM dbo.<table_name>
WHERE <date_column> < MYDATE
IF @@rowcount < 20000
BEGIN
COMMIT TRAN
BREAK;
END
COMMIT TRAN
END
October 15, 2010 at 1:57 pm
Make sure you have frequent transaction log backups running, or your .LDF will grow quite big.
I do a similar process to those mentioned above. You can use a loop with your TOP(50000) type processing and add a delay in each loop so you don't hog the machine.
October 15, 2010 at 2:38 pm
Thanks Guys for your reply this will really going to help.
@getoffmyfoot: below is the detail of one of my biggest table.
Total number of rows in table:831762784
Rows that get deleted: 30%
Table structure:
PanelistIDint4
ParentIDint4
AttributeKeyIDint4
AttributeValueIDint4
ProfileTypeIDint4
OpenAttributeValuenvarchar510
Deletedbit1
CreateDatedatetime8
ModifyDatedatetime8
ModifyUserIDint4
i have transactional replication implemented on this, whould it be a good idea to stop it before deleting the records?
October 15, 2010 at 5:56 pm
Do you want the deletes to happen at the subscribe too ? If you stop replication, and want the subscriber to be deleted too, then you'll have to reinitialize the subscription.
October 18, 2010 at 2:55 am
Tell me the best you think going to help. stopping replication and reinitilizing or just deleting in small chunk and keep replication to do it job?
October 18, 2010 at 4:31 am
I would try letting replication handle the deletes. If your replication job runs frequently, maybe it won't have a negative impact.
October 18, 2010 at 4:35 am
Thanks All for your reply, will let you know the result after completing this task.
October 18, 2010 at 4:59 am
Something nobody's mentioned is foreign key constraints. If another table has a dependency on the table you are deleting from, and your foreign keys don't have cascading deletes, you'll need to plan carefully the order in which you delete from your tables.
John
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply