September 14, 2009 at 11:16 pm
Comments posted to this topic are about the item Deleting Large Number of Records
September 15, 2009 at 1:54 am
Nice one Lynn 🙂
I've found that when deleting a sub-section of data from a much larger table (for example, deleting history older than x years from a transactions table), it can help to create a filtered index under SQL Server 2008 for that... You can create the index online, and it significantly reduces the seek time if you're deleting a subset.
Of course, if the column is already indexed - no probs! 😀
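Something like this, for instance (a minimal sketch only; the table, column, and index names are hypothetical):
create nonclustered index IX_TransactionHistory_Archive   -- hypothetical name
on dbo.TransactionHistory (TransactionDate)                -- hypothetical table/column
where TransactionDate < '20050101'                         -- covers only the rows you intend to delete
with (online = on);                                        -- online index builds require Enterprise Edition
Once the purge is finished, the filtered index can simply be dropped.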
September 15, 2009 at 5:28 am
Hi Lynn,
Good article concept; this is a problem I've had to deal with many times, and I suspect most DBAs do at one point or another.
A couple of concerns, though:
1) Would those transaction log backup files really be manageable? If you chose to make the batch size smaller (say 1,000 instead of 10,000) to minimize duration of locks (esp. if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those! :ermm: I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.
2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.
Am I alone in this? Has anyone else had to come up with different ways due to performance issues with the SET ROWCOUNT method?
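For illustration, a minimal sketch of that keyed-loop approach (the table, key, and date names are hypothetical):
declare @minId int, @maxId int, @rangeSize int

set @rangeSize = 10000
select @minId = min(TransactionID), @maxId = max(TransactionID)   -- hypothetical clustered key
from dbo.TransactionHistory                                       -- hypothetical table
where TransactionDate < '20050101'                                -- hypothetical purge condition

while @minId <= @maxId
begin
    delete from dbo.TransactionHistory
    where TransactionID >= @minId
      and TransactionID < @minId + @rangeSize
      and TransactionDate < '20050101'
    set @minId = @minId + @rangeSize
end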
Tao
---
Edited: fixed mistake: PK -> Clustered index
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
September 15, 2009 at 5:43 am
Matt Whitfield (9/15/2009)
Nice one Lynn 🙂 I've found that when deleting a sub-section of data from a much larger table (for example, deleting history older than x years from a transactions table), it can help to create a filtered index under SQL Server 2008 for that... You can create the index online, and it significantly reduces the seek time if you're deleting a subset.
Of course, if the column is already indexed - no probs! 😀
Thank you for the info on SQL Server 2008. At this time I unfortunately haven't had the opportunity to work with 2008, so I'll have to look into that when I do and see what changes it may bring to this method of deleting records.
September 15, 2009 at 6:02 am
Tao Klerks (9/15/2009)
Hi Lynn, good article concept; this is a problem I've had to deal with many times, and I suspect most DBAs do at one point or another.
A couple of concerns, though:
Okay, let's take a look.
1) Would those transaction log backup files really be manageable? If you chose to make the batch size smaller (say 1,000 instead of 10,000) to minimize duration of locks (esp. if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those! :ermm: I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.
There will always be a pain point somewhere. Part of the issue I was trying to deal with here is controlling the size of the transaction log. This is one of the areas we see frequently on the forums: the OP is concerned with the transaction log growing excessively during the deletion process, consuming much of the available disk space regardless of the recovery model in use.
When the database is using the SIMPLE recovery model, all that is needed is to batch the deletes. As the transaction log is checkpointed between batches, the space inside it is reused, keeping the file size under control.
Unfortunately, if the database is using either the BULK-LOGGED or FULL recovery model, you actually need to run periodic transaction log backups during the delete process. This is what my code allows one to do. You have to decide how big the batch should be for each delete, how large the transaction log should be allowed to grow (if necessary), and how many transaction log files you are going to create. It is possible to further modify this code fairly easily to also incorporate periodic differential backups. If you want a differential backup after every 100 transaction log backups, it wouldn't be hard to do that with this code.
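For readers who want to see the shape of that pattern, here is a minimal sketch (illustration only, not the article's exact code; the table, cutoff, database name, and backup path are all hypothetical):
declare @batchsize int, @batchcount int, @rowsdeleted int
declare @logfile nvarchar(260)

set @batchsize = 10000
set @batchcount = 0
set @rowsdeleted = 1

while @rowsdeleted > 0
begin
    delete top (@batchsize) from dbo.TransactionHistory   -- hypothetical table
    where TransactionDate < '20050101'                    -- hypothetical cutoff

    set @rowsdeleted = @@rowcount
    set @batchcount = @batchcount + 1

    if @batchcount % 25 = 0                               -- log backup every 25 batches; tune to taste
    begin
        set @logfile = N'D:\Backups\MyDatabase_'
                     + right(N'000000' + convert(nvarchar(10), @batchcount), 6) + N'.trn'
        backup log MyDatabase to disk = @logfile          -- FULL or BULK-LOGGED recovery assumed
    end
end
A final log backup after the loop would capture the tail of the last few batches.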
2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.
I can't really address this particular issue regarding SET ROWCOUNT in SQL Server 2000, as I haven't really had to do this myself. I'm just trying to provide people with a method of deleting a large number of records while controlling the size of the transaction log and keeping the table relatively open for normal processing, by hopefully keeping SQL Server from taking and holding a table lock. As I have always said on these forums, when you are using anything suggested on any forum, be sure to test, test, and test some more.
This is just one option I am offering; it may not always be the best. It may require some tweaks and modifications depending on the environment it is used in. If there is a suitable index as you suggest, it would be easy to modify this code to force it to use that index.
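For example, something along these lines (the names are hypothetical, and the hint is only worth it if the optimizer isn't already choosing that index):
delete top (10000) t
from dbo.TransactionHistory as t with (index (IX_TransactionHistory_TransactionDate))
where t.TransactionDate < '20050101'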
September 15, 2009 at 7:47 am
I'm not sure I agree that SET ROWCOUNT is the issue in SQL 2000. I used a similar method years ago in 2000 to delete large numbers of records and it seemed to work fine for me. Granted, large is relative, and what is large today might be substantially larger than five years ago, but I'd be interested to know specifically whether it changed the execution plan or where the issue was.
September 15, 2009 at 7:51 am
Thanks, Lynn! Here's a possible minor tweak that I think makes one less trip through the loop (unless the total number of rows we're deleting is an exact multiple of the batchsize):
Instead of:
while @batchsize <> 0
If we say:
declare @orgBatchSize bigint
set @orgBatchSize = @batchsize
while @batchsize = @orgBatchSize
This exits the loop as soon as the number of deleted rows is less than the batchsize (which will only happen when there's nothing left to delete). This could be a time-saver if the Where clause on the Delete is slow.
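Put together, the tweak looks something like this (a sketch only; the table and cutoff are hypothetical):
declare @batchsize bigint, @orgBatchSize bigint
set @batchsize = 10000
set @orgBatchSize = @batchsize

while @batchsize = @orgBatchSize
begin
    delete top (@orgBatchSize) from dbo.TransactionHistory   -- hypothetical table
    where TransactionDate < '20050101'                       -- hypothetical cutoff
    set @batchsize = @@rowcount   -- drops below @orgBatchSize only on the final, partial batch
end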
September 15, 2009 at 7:57 am
Lynn ... great article!
Any thoughts on how (or even if) this approach should be modified for bulk deletions on a SQL 2005 mirror configuration with log-shipping in place?
JC
September 15, 2009 at 7:59 am
Steve
Sorry - would have quoted but the forum was insistent on quoting another post when I clicked quote 😀
I would put money on the fact that when you used SET ROWCOUNT one of the two following possibilities was true:
1) There was no where clause on the statement
2) The where clause was easily satisfied by an index
...
September 15, 2009 at 8:15 am
Just... keep it simple!
If you need to delete a large number of rows and the scenario doesn't require keeping the database log:
In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100,000,000 rows.
delete top (10000) from myTable where colDate between '20090101' and '20091023'
backup log myDatabase with no_log
go 10000
What do you think?
P.S. The recovery model is SIMPLE.
September 15, 2009 at 8:24 am
I use this (it's home-cooked):
create proc temp_slowDelete
(
@table varchar(255),
@where varchar(1024)
)
as
/* Slow, transaction-friendly delete: removes rows 250,000 at a time */
declare @ParmDefinition nvarchar(1024)
declare @deleteStatment nvarchar(2048)
declare @statment nvarchar(2048)
declare @total int
declare @rowcount int
select @deleteStatment = 'delete from ' + @table + ' where ' + @where
set rowcount 250000   -- cap each delete pass at 250,000 rows
set nocount on
RAISERROR ('Counting rows to delete...',0,1) WITH NOWAIT
SET @ParmDefinition = N'@CountOUT int OUTPUT';
select @statment = 'select @CountOUT = count(*) from ' + @table + ' where ' + @where
exec sp_executesql @statment, @ParmDefinition, @CountOUT = @total OUTPUT;
if (@total = 0)
begin
print 'Nothing to do :)'
return
end
RAISERROR ('%d to delete',0,1,@total) WITH NOWAIT
-- first pass, then keep looping until a pass deletes no rows
exec sp_executesql @deleteStatment
set @rowcount = @@ROWCOUNT
while (@rowcount > 0)
begin
select @total = @total - @rowcount
RAISERROR ('Deleted %d, %d left',0,1,@rowcount,@total) WITH NOWAIT
exec sp_executesql @deleteStatment
set @rowcount = @@ROWCOUNT
end
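A hypothetical call, just to show the parameter shapes (the table and filter are made up):
exec temp_slowDelete
    @table = 'dbo.TransactionHistory',
    @where = 'TransactionDate < ''20050101'''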
September 15, 2009 at 8:27 am
Ric Sierra (9/15/2009)
Just... keep it simple! If you need to delete a large number of rows and the scenario doesn't require keeping the database log:
In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100,000,000 rows.
delete top (10000) from myTable where colDate between '20090101' and '20091023'
backup log myDatabase with no_log
go 10000
What do you think?
P.S. The recovery model is SIMPLE.
If your database is using the SIMPLE recovery model you can't do a BACKUP LOG, nor is it needed. The only other issue I'd have with your simple process is that it runs more often than needed: using GO 10000, your delete statement is going to run 10,000 times regardless of how many times it really needs to run.
If your database is using the BULK-LOGGED or FULL recovery model, then you do need the transaction log backups in case a recovery is needed. The code I have provided can be incorporated into a stored procedure where periodic deletes of large numbers of records need to be accomplished.
September 15, 2009 at 8:32 am
Keep in mind that even with the SIMPLE recovery model, the log can grow. If you don't break the work into batches, every deleted row is logged as part of one large transaction, which can be a lot of data. Only after a transaction is committed can its log records be marked as inactive during a checkpoint.
September 15, 2009 at 8:42 am
Thanks, Lynn. Most timely, since I'm in the middle of deleting something like 15 million rows from a production database that is designated high availability in our organization. Very helpful!
September 15, 2009 at 8:47 am
Lynn Pettis (9/15/2009)
Ric Sierra (9/15/2009)
Just... keep it simple! If you need to delete a large number of rows and the scenario doesn't require keeping the database log:
In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100,000,000 rows.
delete top (10000) from myTable where colDate between '20090101' and '20091023'
backup log myDatabase with no_log
go 10000
What do you think?
P.S. The recovery model is SIMPLE.
If your database is using the SIMPLE recovery model you can't do a BACKUP LOG, nor is it needed. The only other issue I'd have with your simple process is that it runs more often than needed: using GO 10000, your delete statement is going to run 10,000 times regardless of how many times it really needs to run.
If your database is using the BULK-LOGGED or FULL recovery model, then you do need the transaction log backups in case a recovery is needed. The code I have provided can be incorporated into a stored procedure where periodic deletes of large numbers of records need to be accomplished.
In this scenario I need to delete 100,000,000 rows; each batch deletes 10,000 rows, so to ensure all the rows are deleted I need to run the batch 10,000 times.
10,000 x 10,000 = 100,000,000