March 3, 2009 at 3:34 pm
I have a 5-column table FSS_Data (fssWellID int, fssIndex int, fssTimeStamp datetime, fssType int, fssValue real) where the primary key is (fssWellID, fssIndex, fssTimeStamp). I have separate ascending indices on fssTimeStamp and fssValue. The table is used to store 30 days' worth of data, approximately 300 million rows for that period. The current delete operation I'm using looks like the following and runs on a configurable time interval (currently every 10 minutes):
DELETE TOP (1000) FROM FSS_Data WHERE fssTimeStamp < @dtPurgeDate
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (1000) FROM FSS_Data WHERE fssTimeStamp < @dtPurgeDate
END
Using one delete statement takes too long and makes the transaction log grow too large. I was thinking about disabling the indices on the table and rebuilding them after the delete statement, but I don't know what the overhead of rebuilding the indices is. Does anyone have any suggestions about how I could improve the performance of these delete statements? I would also like to know how to calculate the overhead of rebuilding table indices.
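To be clear, by disabling and rebuilding I mean something like the following (just a sketch; the index names here are made up since I haven't posted the actual DDL):

-- Sketch only: IX_FSS_Data_fssTimeStamp / IX_FSS_Data_fssValue are assumed names
-- for the two nonclustered indexes. Only the nonclustered indexes get disabled;
-- disabling the clustered PK would make the whole table inaccessible.
ALTER INDEX IX_FSS_Data_fssTimeStamp ON dbo.FSS_Data DISABLE;
ALTER INDEX IX_FSS_Data_fssValue ON dbo.FSS_Data DISABLE;

-- ... run the purge here ...

-- REBUILD re-creates each index from the remaining rows and re-enables it;
-- the rebuild is itself a logged, size-of-data operation.
ALTER INDEX IX_FSS_Data_fssTimeStamp ON dbo.FSS_Data REBUILD;
ALTER INDEX IX_FSS_Data_fssValue ON dbo.FSS_Data REBUILD;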
TLDR: What is the overhead of rebuilding table indices? What are the real advantages of disabling indices during a delete?
March 3, 2009 at 7:31 pm
So you're trying to delete 300/30, or about 10 million rows per day? If you only delete 1,000 rows every 10 minutes, you would need 10,000 ten-minute intervals to delete what you need to... kind of a losing battle there.
Dropping and rebuilding indexes on such a large table would probably be prohibitively long and they're also logged... so nothing gained by doing that. The choice of clustered index (I assume the PK you stated is clustered) isn't going to help a bit. Depending on the "spread", you could be locking a lot of the table making it nearly unusable during the deletes.
My recommendation would be to see how many rows you can delete in a second with TABLOCKX on the table. Then, set your delete row count to that and increase the frequency of the deletes to once every 5 seconds. Of course, you'll need to do a little calculation to see if it will ever catch up to what needs to be deleted at that rate.
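Something like this would do for the measurement (just a rough test harness; the 30-day cutoff and the TOP value are assumptions you'd adjust until the elapsed time is about one second):

DECLARE @dtPurgeDate DATETIME, @StartTime DATETIME;
SET @dtPurgeDate = DATEADD(dd, -30, GETDATE()); -- assumed 30-day retention, as in the original post
SET @StartTime = GETDATE();

-- TABLOCKX takes an exclusive table lock for the duration of the delete
DELETE TOP (10000)
FROM dbo.FSS_Data WITH (TABLOCKX)
WHERE fssTimeStamp < @dtPurgeDate;

SELECT @@ROWCOUNT AS RowsDeleted,
       DATEDIFF(ms, @StartTime, GETDATE()) AS ElapsedMs;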
--Jeff Moden
Change is inevitable... Change for the better is not.
March 3, 2009 at 9:05 pm
You are adding 10,000,000 rows a day, about 116 rows per second, so to keep up you will have to be able to delete at least that fast.
You should try some medium size deletes, say 50,000 at a time, and see how long they take to run. It would be best if you deleted in primary key sequence. If you could delete 50,000 rows at a time, you could delete 10,000,000 rows in 200 iterations of the loop.
I think you are going to run into problems fairly quickly with severe index fragmentation. If you see your deletes starting to take longer and longer, you will need to stop to re-index or defrag.
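Checking fragmentation between batches is easy enough with the 2005 DMVs, something like this (LIMITED mode is the cheapest scan):

SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.FSS_Data'), NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id = ps.index_id;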
You should look into a solution using partitioned tables where you can just remove data in bulk by day, week, or month. It will be easier to manage than trying to keep up with high volume deletes, and will use less resources.
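The sliding-window approach looks roughly like this (Enterprise Edition; just a sketch, the object names and boundary values are made up for illustration):

-- Partition by day on fssTimeStamp (boundary values shown are examples only)
CREATE PARTITION FUNCTION pf_FSS_Data (datetime)
AS RANGE RIGHT FOR VALUES ('20090201', '20090202', '20090203' /* ...one boundary per day... */);

CREATE PARTITION SCHEME ps_FSS_Data
AS PARTITION pf_FSS_Data ALL TO ([PRIMARY]);

-- The table's clustered index would need to be (re)built on ps_FSS_Data(fssTimeStamp),
-- which is a one-time, size-of-data cost.

-- Purging the oldest day then becomes a metadata-only operation:
ALTER TABLE dbo.FSS_Data SWITCH PARTITION 1 TO dbo.FSS_Data_Purge; -- empty table, same structure and filegroup
ALTER PARTITION FUNCTION pf_FSS_Data() MERGE RANGE ('20090201');
TRUNCATE TABLE dbo.FSS_Data_Purge;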
March 3, 2009 at 10:14 pm
Here is some code you may want to try in a test environment. You will need to finish the code as the BACKUP LOG statement is incomplete.
declare @RowsDeleted int,
        @BatchSize int;

set @BatchSize = 10000;
set @RowsDeleted = @BatchSize;

-- @dtPurgeDate is assumed to be declared and set earlier, as in the original post
while @RowsDeleted <> 0
begin
    with FSSDataDelete (
        fssWellId,
        fssIndex,
        fssTimeStamp
    ) as (
        select top (@BatchSize)
            fssWellId,
            fssIndex,
            fssTimeStamp
        from
            dbo.FSS_Data
        where
            fssTimeStamp < @dtPurgeDate
        order by
            fssWellId,
            fssIndex,
            fssTimeStamp
    )
    delete
        fssd
    from
        dbo.FSS_Data fssd
        inner join FSSDataDelete fssdd
            on (fssd.fssWellId = fssdd.fssWellId
                and fssd.fssIndex = fssdd.fssIndex
                and fssd.fssTimeStamp = fssdd.fssTimeStamp);

    set @RowsDeleted = @@rowcount;

    backup log [your_db_name] to disk = 'path to file'; -- path to file should be a variable, with a dynamic filename
end
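For the incomplete BACKUP LOG line, something along these lines would finish it (the database name and path below are just placeholders to replace):

declare @BackupFile nvarchar(260);
set @BackupFile = N'C:\Backups\YourDB_log_'  -- placeholder path
    + replace(replace(replace(convert(char(19), getdate(), 120), '-', ''), ' ', '_'), ':', '')
    + N'.trn';
backup log [YourDB] to disk = @BackupFile;   -- [YourDB] is a placeholder for your database name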
March 5, 2009 at 8:15 pm
Partitioning anybody ???
Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
March 5, 2009 at 8:26 pm
rudy komacsar (3/5/2009)
Partitioning anybody ???
Would be a good way to go, but the how will depend on what edition of SQL Server 2005 is being used. If Enterprise Edition is being used, then yes, definitely move to partitioned tables. If using Standard Edition, then your choice is to use a partitioned view, which would be more difficult to maintain.
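For reference, a partitioned view for this would look roughly like the following (sketch only; the member table names and date ranges are made up):

-- One member table per period, with a CHECK constraint on the partitioning
-- column so the optimizer can eliminate the other members.
CREATE TABLE dbo.FSS_Data_200902 (
    fssWellID    int      NOT NULL,
    fssIndex     int      NOT NULL,
    fssTimeStamp datetime NOT NULL
        CHECK (fssTimeStamp >= '20090201' AND fssTimeStamp < '20090301'),
    fssType      int,
    fssValue     real,
    CONSTRAINT PK_FSS_Data_200902
        PRIMARY KEY (fssWellID, fssIndex, fssTimeStamp)
);
-- ...repeat for each month (or day/week)...
GO
CREATE VIEW dbo.FSS_Data_All
AS
SELECT fssWellID, fssIndex, fssTimeStamp, fssType, fssValue FROM dbo.FSS_Data_200902
UNION ALL
SELECT fssWellID, fssIndex, fssTimeStamp, fssType, fssValue FROM dbo.FSS_Data_200903;
GO
-- Purging then becomes dropping (or truncating) the oldest member table and
-- altering the view to remove it from the UNION ALL.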
March 5, 2009 at 8:32 pm
Point taken, Lynn. However, given the size and scale of the issue, the minor inconveniences of making partitioning work in Standard Edition would probably be far outweighed by the potential user backlash or performance fallout of not doing it.
It all boils down to design, planning, planning, planning and testing - been there and done that already.
Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
March 5, 2009 at 8:40 pm
I'm not knocking using a partitioned view. You're right, it takes a lot more planning and testing but it can be accomplished.
It really comes down to what the OP can sell to "the powers that be".
March 10, 2009 at 12:31 pm
You guys are great. I've been out of town on personal business, but I was very happy to read all of the replies when I got back. I did some reading based on your suggestions and I think I'm going to try to set up partitioning. I'm going to do some experiments this week to determine what the trade-offs are going to be and I'll post them here when I'm done. Thanks again for all of the suggestions.