Gradually Deleting Data in SQL Server

If you have a very large table in SQL Server and need to periodically delete tens of millions of rows from it, there are several ways to do it.

If you have a maintenance window (or your database is not required to be available 24 x 7 x 365), you can (and probably should) just delete all of the rows in one shot, using a set-based operation. This is the quickest way to delete a large number of rows, but you will probably see lock escalation to a table lock, which essentially makes the table unavailable for the duration of the delete.
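For comparison, the one-shot approach is a single statement. This is only a minimal sketch, assuming the same dbo.BigLoggingTable table and TransactionId cutoff that the gradual delete script later in this post uses:

```sql
-- One-shot, set-based delete (best suited to a maintenance window).
-- Table name and cutoff value are borrowed from the gradual delete script.
DECLARE @HighWaterMark bigint;
SET @HighWaterMark = 382989078;

DELETE FROM dbo.BigLoggingTable
WHERE TransactionId < @HighWaterMark;
```

Because this runs as one transaction, all of the log activity and locking happens at once, which is exactly the trade-off described above.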

Another issue to consider is whether you have transactional replication on that table, and/or database mirroring in place on the database. Deleting a large number of rows from a table generates a lot of log activity, which may cause transactional replication or database mirroring to fall behind. How far behind depends on your hardware and network infrastructure. You also want to keep an eye on your transaction log, to make sure it is not filling up and triggering auto-grow.
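One way to keep an eye on the log while a large delete runs is sketched below. It assumes you run it in the context of the database that holds the table; how often you check, and what threshold worries you, is up to you:

```sql
-- Log size (MB) and percentage in use for every database on the instance
DBCC SQLPERF(LOGSPACE);

-- What, if anything, is currently preventing log truncation in this database.
-- Values such as REPLICATION or DATABASE_MIRRORING suggest one of those
-- features is falling behind.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();
```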

A safer, but much more time-consuming way to delete millions of rows is to use some sort of looping mechanism, where you delete a fairly small number of rows on each pass, to slowly nibble away at the table. This will take much longer than a set-based operation, but, if done properly, will not cause concurrency problems, and will not overwhelm transactional replication or database mirroring.

At any rate, I recently faced a situation like this, so I decided to show one method to deal with it pretty easily. In this case, we want to delete every row that has a TransactionId lower than a certain number. We are going to delete 500 arbitrary qualifying rows in each delete (TOP without an ORDER BY returns whichever rows SQL Server finds first, not a true random sample), and loop 5000 times, with a slight delay between each delete. This will delete up to 2.5 million rows (500 x 5,000) each time the query is run. You can obviously adjust these numbers and the delay time so that it works best in your environment. You could also wrap this into a stored procedure.

-- Gradual Delete Sample
-- Glenn Berry 
-- August 2011
-- http://sqlserverperformance.wordpress.com/
-- Twitter: GlennAlanBerry
SET NOCOUNT ON;
-- Check space used by table before we begin
EXEC sp_spaceused N'dbo.BigLoggingTable';
-- Declare local variables
DECLARE @NumberOfLoops AS int;
SET @NumberOfLoops = 5000;
DECLARE @CurrentLoop AS int;
SET @CurrentLoop = 0;
DECLARE @DeleteSize bigint;
SET @DeleteSize = 500;
DECLARE @HighWaterMark bigint;
SET @HighWaterMark = 382989078;
WHILE @CurrentLoop < @NumberOfLoops
    BEGIN
        -- Just delete any @DeleteSize rows that are below the HighWaterMark
        DELETE
        FROM dbo.BigLoggingTable
        WHERE TransactionId IN
            (SELECT TOP(@DeleteSize) TransactionId
             FROM dbo.BigLoggingTable WITH (NOLOCK)
             WHERE TransactionId < @HighWaterMark);
        -- Pause 50 ms between deletes (a colon before the fraction means thousandths of a second)
        WAITFOR DELAY '00:00:00:50';
        SET @CurrentLoop = @CurrentLoop + 1;
    END
-- Check space used by table after we are done    
EXEC sp_spaceused N'dbo.BigLoggingTable';
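
As mentioned earlier, the loop could also be wrapped into a stored procedure. The sketch below is one possible way to do that, not a tested production version: the procedure name and its parameters are hypothetical, and the @@ROWCOUNT check is an addition that stops the loop early once no qualifying rows remain, instead of running all of the remaining empty passes.

```sql
-- Hypothetical procedure name and parameters; not part of the original script
CREATE PROCEDURE dbo.GradualDeleteBigLoggingTable
    @HighWaterMark bigint,
    @DeleteSize    int = 500,
    @NumberOfLoops int = 5000
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @CurrentLoop int;
    DECLARE @RowsDeleted int;
    SET @CurrentLoop = 0;

    WHILE @CurrentLoop < @NumberOfLoops
    BEGIN
        -- Same delete pattern as the script above
        DELETE
        FROM dbo.BigLoggingTable
        WHERE TransactionId IN
            (SELECT TOP(@DeleteSize) TransactionId
             FROM dbo.BigLoggingTable WITH (NOLOCK)
             WHERE TransactionId < @HighWaterMark);

        -- Capture the count immediately; later statements reset @@ROWCOUNT
        SET @RowsDeleted = @@ROWCOUNT;

        -- Nothing left below the high water mark, so stop early
        IF @RowsDeleted = 0
            BREAK;

        WAITFOR DELAY '00:00:00:50';
        SET @CurrentLoop = @CurrentLoop + 1;
    END
END;
GO  -- batch separator understood by SSMS and sqlcmd

-- Example call, using the cutoff value from the script above
EXEC dbo.GradualDeleteBigLoggingTable @HighWaterMark = 382989078;
```

Passing the batch size and loop count as parameters with defaults keeps the original behavior while letting you tune them per run.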
