July 1, 2010 at 10:51 am
Performance is the main criterion. Which is the best way to delete 200,000 records daily from a table with 5-7 million records?
1. Directly use a delete statement
2. Create a temp table, copy the 5-7 million records into it, truncate the original table, and insert back the necessary records.
Any other approach?
Note: The ID column is an identity column with a clustered index
July 1, 2010 at 11:04 am
With the clustered index on ID I would propose the following:
Select into a temp table the IDs that qualify for deletion (use NOLOCK if you can).
Delete from the main table by joining to your temp table. Depending upon how the candidate IDs are dispersed throughout the table and/or the size of the rows, you may need to limit the deletes to smaller batches using TOP (n) in a WHILE loop.
The probability of survival is inversely proportional to the angle of arrival.
July 1, 2010 at 11:09 am
sturner,
How do I limit them? Can you please provide an example?
Thanks
July 1, 2010 at 11:30 am
For limiting rows you have two options. If the IDs to be deleted are pretty well grouped together and mostly contiguous, you can divide up the deletes by ID ranges. Or, in the simplest case, you can just limit each delete to a fixed number of rows and keep deleting until they are all gone.
Just as an example:
declare @nrows int

-- capture the IDs of the rows to be deleted
select ID into #deleteList
from BIGTABLE
where ...   -- where clause to find candidates for deletion

set @nrows = @@ROWCOUNT

-- delete in batches of 10,000 until no qualifying rows remain
while @nrows > 0
begin
    delete top (10000) A
    from BIGTABLE A
    join #deleteList B on A.ID = B.ID

    set @nrows = @@ROWCOUNT
end
You could also get the min and max ID from #deleteList and divvy up the deletes evenly based on the dispersion of IDs... The logic would be a bit more complicated, but it would allow you another level of control over the deletes using a WHERE clause versus TOP (n).
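A rough sketch of that approach might look something like this (reusing the table and temp table names from the example above; the 10,000 batch size is just illustrative):

declare @minID int, @maxID int, @batchSize int
set @batchSize = 10000

-- establish the range of candidate IDs
select @minID = min(ID), @maxID = max(ID) from #deleteList

-- walk the ID range in fixed-width slices instead of using top(n)
while @minID <= @maxID
begin
    delete A
    from BIGTABLE A
    join #deleteList B on A.ID = B.ID
    where B.ID >= @minID and B.ID < @minID + @batchSize

    set @minID = @minID + @batchSize
end

Keep in mind that if the candidate IDs are sparse, some slices will delete far fewer than @batchSize rows, so this variant only pays off when the candidates are reasonably well grouped.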
The probability of survival is inversely proportional to the angle of arrival.
July 1, 2010 at 11:33 am
I will definitely try it out. Thanks.
July 1, 2010 at 12:14 pm
I like sturner's code. The one thing I might change is adding a delay between loops, especially if this might take enough time to cross a log backup. That can help limit log growth.
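For example, something like this inside the loop would do it (the five-second delay is just an illustrative value; tune it to your log backup schedule):

-- same batch loop as above, with a pause between iterations
while @nrows > 0
begin
    delete top (10000) A
    from BIGTABLE A
    join #deleteList B on A.ID = B.ID

    set @nrows = @@ROWCOUNT

    if @nrows > 0
        waitfor delay '00:00:05'   -- give log backups and other sessions a breather
end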
July 1, 2010 at 12:44 pm
Good point, Steve; I should have mentioned that as well. If your goal is to minimize the size of the transaction log, then pausing long enough between iterations to allow the log to be backed up will do the trick.
Even if that is not an issue, breaking it up into smaller batches will reduce the size of the footprint and reduce blocking and/or lock escalation.
The probability of survival is inversely proportional to the angle of arrival.