March 3, 2009 at 3:34 pm
I have a 5-column table FSS_Data (fssWellID int, fssIndex int, fssTimeStamp datetime, fssType int, fssValue real) where the primary key is (fssWellID, fssIndex, fssTimeStamp). I have separate ascending indices on fssTimeStamp and fssValue. The table is used to store 30 days' worth of data, approximately 300 million rows for that period. The current delete operation I'm using looks like the following and runs on a configurable time interval (currently every 10 minutes):
DELETE TOP (1000) FROM FSS_Data WHERE fssTimeStamp < @dtPurgeDate
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (1000) FROM FSS_Data WHERE fssTimeStamp < @dtPurgeDate
END
Using one delete statement takes too long and makes the transaction log grow too large. I was thinking about disabling the indices on the table and rebuilding them after the delete statement, but I don't know what the overhead of rebuilding the indices is. Does anyone have any suggestions about how I could improve the performance of these delete statements? I would also like to know how to calculate the overhead of rebuilding table indices.
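To be clear, by disabling and rebuilding I mean something like the following (just a sketch; the index names here are made up since I haven't posted the actual DDL):

-- Sketch only: IX_FSS_Data_fssTimeStamp / IX_FSS_Data_fssValue are assumed names
-- for the two nonclustered indexes. Only the nonclustered indexes get disabled;
-- disabling the clustered PK would make the whole table inaccessible.
ALTER INDEX IX_FSS_Data_fssTimeStamp ON dbo.FSS_Data DISABLE;
ALTER INDEX IX_FSS_Data_fssValue ON dbo.FSS_Data DISABLE;

-- ... run the purge here ...

-- REBUILD re-creates each index from the remaining rows and re-enables it;
-- the rebuild is itself a logged, size-of-data operation.
ALTER INDEX IX_FSS_Data_fssTimeStamp ON dbo.FSS_Data REBUILD;
ALTER INDEX IX_FSS_Data_fssValue ON dbo.FSS_Data REBUILD;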
TLDR: What is the overhead of rebuilding table indices? What are the real advantages of disabling indices during a delete?
March 3, 2009 at 7:31 pm
So you're trying to delete 300/30, or about 10 million rows per day? If you only delete 1,000 rows every 10 minutes, you would need 10,000 ten-minute intervals to delete what you need to... kind of a losing battle there.
Dropping and rebuilding indexes on such a large table would probably be prohibitively long and they're also logged... so nothing gained by doing that. The choice of clustered index (I assume the PK you stated is clustered) isn't going to help a bit. Depending on the "spread", you could be locking a lot of the table making it nearly unusable during the deletes.
My recommendation would be to see how many rows you can delete in a second with TABLOCKX on the table. Then, set your delete row count to that and increase the frequency of the deletes to once every 5 seconds. Of course, you'll need to do a little calculation to see if it will ever catch up to what needs to be deleted at that rate.
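Something like this would do for the measurement (just a rough test harness; the 30-day cutoff and the TOP value are assumptions you'd adjust until the elapsed time is about one second):

DECLARE @dtPurgeDate DATETIME, @StartTime DATETIME;
SET @dtPurgeDate = DATEADD(dd, -30, GETDATE()); -- assumed 30-day retention, as in the original post
SET @StartTime = GETDATE();

-- TABLOCKX takes an exclusive table lock for the duration of the delete
DELETE TOP (10000)
FROM dbo.FSS_Data WITH (TABLOCKX)
WHERE fssTimeStamp < @dtPurgeDate;

SELECT @@ROWCOUNT AS RowsDeleted,
       DATEDIFF(ms, @StartTime, GETDATE()) AS ElapsedMs;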
--Jeff Moden
Change is inevitable... Change for the better is not.
March 3, 2009 at 9:05 pm
You are adding 10,000,000 rows a day, about 116 rows per second, so to keep up you will have to be able to delete at least that fast.
You should try some medium size deletes, say 50,000 at a time, and see how long they take to run. It would be best if you deleted in primary key sequence. If you could delete 50,000 rows at a time, you could delete 10,000,000 rows in 200 iterations of the loop.
I think you are going to run into problems fairly quickly with severe index fragmentation. If you see your deletes starting to take longer and longer, you will need to stop to re-index or defrag.
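Checking fragmentation between batches is easy enough with the 2005 DMVs, something like this (LIMITED mode is the cheapest scan):

SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.FSS_Data'), NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id = ps.index_id;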
You should look into a solution using partitioned tables where you can just remove data in bulk by day, week, or month. It will be easier to manage than trying to keep up with high volume deletes, and will use less resources.
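The sliding-window approach looks roughly like this (Enterprise Edition; just a sketch, the object names and boundary values are made up for illustration):

-- Partition by day on fssTimeStamp (boundary values shown are examples only)
CREATE PARTITION FUNCTION pf_FSS_Data (datetime)
AS RANGE RIGHT FOR VALUES ('20090201', '20090202', '20090203' /* ...one boundary per day... */);

CREATE PARTITION SCHEME ps_FSS_Data
AS PARTITION pf_FSS_Data ALL TO ([PRIMARY]);

-- The table's clustered index would need to be (re)built on ps_FSS_Data(fssTimeStamp),
-- which is a one-time, size-of-data cost.

-- Purging the oldest day then becomes a metadata-only operation:
ALTER TABLE dbo.FSS_Data SWITCH PARTITION 1 TO dbo.FSS_Data_Purge; -- empty table, same structure and filegroup
ALTER PARTITION FUNCTION pf_FSS_Data() MERGE RANGE ('20090201');
TRUNCATE TABLE dbo.FSS_Data_Purge;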
March 3, 2009 at 10:14 pm
Here is some code you may want to try in a test environment. You will need to finish the code as the BACKUP LOG statement is incomplete.
declare @RowsDeleted int,
        @BatchSize int;

set @BatchSize = 10000;
set @RowsDeleted = @BatchSize;

-- @dtPurgeDate is assumed to be declared and set earlier, as in the original post
while @RowsDeleted <> 0
begin
    with FSSDataDelete (
        fssWellId,
        fssIndex,
        fssTimeStamp
    ) as (
        select top (@BatchSize)
            fssWellId,
            fssIndex,
            fssTimeStamp
        from
            dbo.FSS_Data
        where
            fssTimeStamp < @dtPurgeDate
        order by
            fssWellId,
            fssIndex,
            fssTimeStamp
    )
    delete
        fssd
    from
        dbo.FSS_Data fssd
        inner join FSSDataDelete fssdd
            on (fssd.fssWellId = fssdd.fssWellId
                and fssd.fssIndex = fssdd.fssIndex
                and fssd.fssTimeStamp = fssdd.fssTimeStamp);

    set @RowsDeleted = @@rowcount;

    backup log [your_db_name] to disk = 'path to file'; -- path to file should be a variable, with a dynamic filename
end
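For the incomplete BACKUP LOG line, something along these lines would finish it (the database name and path below are just placeholders to replace):

declare @BackupFile nvarchar(260);
set @BackupFile = N'C:\Backups\YourDB_log_'  -- placeholder path
    + replace(replace(replace(convert(char(19), getdate(), 120), '-', ''), ' ', '_'), ':', '')
    + N'.trn';
backup log [YourDB] to disk = @BackupFile;   -- [YourDB] is a placeholder for your database name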
March 5, 2009 at 8:15 pm
Partitioning anybody ???
Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
March 5, 2009 at 8:26 pm
rudy komacsar (3/5/2009)
Partitioning anybody ???
Would be a good way to go, but the how will depend on what edition of SQL Server 2005 is being used. If Enterprise Edition is being used, then yes, definitely move to partitioned tables. If using Standard Edition, then your choice is to use a partitioned view, which would be more difficult to maintain.
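For reference, a partitioned view for this would look roughly like the following (sketch only; the member table names and date ranges are made up):

-- One member table per period, with a CHECK constraint on the partitioning
-- column so the optimizer can eliminate the other members.
CREATE TABLE dbo.FSS_Data_200902 (
    fssWellID    int      NOT NULL,
    fssIndex     int      NOT NULL,
    fssTimeStamp datetime NOT NULL
        CHECK (fssTimeStamp >= '20090201' AND fssTimeStamp < '20090301'),
    fssType      int,
    fssValue     real,
    CONSTRAINT PK_FSS_Data_200902
        PRIMARY KEY (fssWellID, fssIndex, fssTimeStamp)
);
-- ...repeat for each month (or day/week)...
GO
CREATE VIEW dbo.FSS_Data_All
AS
SELECT fssWellID, fssIndex, fssTimeStamp, fssType, fssValue FROM dbo.FSS_Data_200902
UNION ALL
SELECT fssWellID, fssIndex, fssTimeStamp, fssType, fssValue FROM dbo.FSS_Data_200903;
GO
-- Purging then becomes dropping (or truncating) the oldest member table and
-- altering the view to remove it from the UNION ALL.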
March 5, 2009 at 8:32 pm
Point taken, Lynn. However, given the size and scale of the issue, the minor inconveniences of making partitioning work in Standard Edition would probably be far outweighed by the potential user backlash or performance fallout of not doing it.
It all boils down to design, planning, planning, planning and testing - been there and done that already.
Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
March 5, 2009 at 8:40 pm
I'm not knocking using a partitioned view. You're right, it takes a lot more planning and testing but it can be accomplished.
It really comes down to what the OP can sell to "the powers that be".
March 10, 2009 at 12:31 pm
You guys are great. I've been out of town on personal business, but I was very happy to read all of the replies when I got back. I did some reading based on your suggestions and I think I'm going to try to set up partitioning. I'm going to do some experiments this week to determine what the trade-offs are going to be and I'll post them here when I'm done. Thanks again for all of the suggestions.