Deleting data from large tables.

  • I have a five-column table FSS_Data (fssWellID int, fssIndex int, fssTimeStamp datetime, fssType int, fssValue real) where the primary key is (fssWellID, fssIndex, fssTimeStamp). I have separate ascending indices on fssTimeStamp and fssValue. The table is used to store 30 days' worth of data; there are approximately 300 million rows for that period. The current delete operation that I'm using looks like the following, and it runs on a configurable time interval (currently every 10 minutes):

    DELETE TOP (1000) FROM FSS_Data WHERE fssTimeStamp < @dtPurgeDate

    WHILE @@ROWCOUNT > 0
    BEGIN
        DELETE TOP (1000) FROM FSS_Data WHERE fssTimeStamp < @dtPurgeDate
    END

    Using one delete statement takes too long and makes the transaction log grow too large. I was thinking about disabling the indices on the table and rebuilding them after the delete, but I don't know what the overhead of rebuilding the indices is. Does anyone have any suggestions about how I could improve the performance of these delete statements? I would also like to know how to calculate the overhead of rebuilding table indices.

    TLDR: What is the overhead of rebuilding table indices? What are the real advantages of disabling indices during a delete?
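
    For reference, a query along these lines (using sys.dm_db_index_physical_stats, available from SQL Server 2005 onward) gives a rough idea of how large each index on the table is, and therefore roughly how much I/O a rebuild would have to do. Treat it as a sizing sketch rather than a full answer to the overhead question:

    -- Sizing sketch: page counts and fragmentation for every index on FSS_Data.
    -- A rebuild has to read and rewrite roughly page_count 8 KB pages per index,
    -- so page_count * 8 / 1024 gives an approximate size in MB.
    SELECT  i.name                             AS index_name,
            ips.index_type_desc,
            ips.page_count,
            ips.page_count * 8 / 1024          AS approx_size_mb,
            ips.avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats(
                DB_ID(), OBJECT_ID('dbo.FSS_Data'), NULL, NULL, 'LIMITED') AS ips
            INNER JOIN sys.indexes AS i
              ON  i.object_id = ips.object_id
              AND i.index_id  = ips.index_id;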

  • So you're trying to delete 300/30, or about 10 million rows per day? If you only delete 1,000 rows every 10 minutes, that means you would need 10,000 ten-minute intervals to delete what you need to... kind of a losing battle there.

    Dropping and rebuilding indexes on such a large table would probably take prohibitively long, and those operations are also logged... so nothing is gained by doing that. The choice of clustered index (I assume the PK you stated is clustered) isn't going to help a bit. Depending on the "spread", you could be locking a lot of the table, making it nearly unusable during the deletes.

    My recommendation would be to see how many rows you can delete in a second with TABLOCKX on the table. Then set your delete rowcount to that and increase the frequency of the deletes to once every 5 seconds. Of course, you'll need to do a little calculation to see if it will ever catch up with what needs to be deleted at that rate.
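
    As a rough illustration of that test (the purge date and batch size below are assumptions, not values from this thread), a single timed batch under an exclusive table lock might look like this:

    -- Timing sketch: delete one batch under an exclusive table lock and measure it.
    -- @dtPurgeDate and the batch size of 1000 are placeholders for this test.
    DECLARE @dtPurgeDate datetime, @start datetime;
    SET @dtPurgeDate = DATEADD(day, -30, GETDATE());   -- 30-day retention window
    SET @start = GETDATE();

    DELETE TOP (1000) FROM dbo.FSS_Data WITH (TABLOCKX)
    WHERE fssTimeStamp < @dtPurgeDate;

    SELECT @@ROWCOUNT                      AS rows_deleted,
           DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms;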

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You are adding 10,000,000 rows a day, about 116 rows per second, so to keep up you will have to be able to delete at at least that rate.

    You should try some medium size deletes, say 50,000 at a time, and see how long they take to run. It would be best if you deleted in primary key sequence. If you could delete 50,000 rows at a time, you could delete 10,000,000 rows in 200 iterations of the loop.

    I think you are going to run into problems fairly quickly with severe index fragmentation. If you see your deletes starting to take longer and longer, you will need to stop to re-index or defrag.
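
    A minimal sketch of that maintenance step, assuming the clustered primary key is named PK_FSS_Data (check sys.indexes for the real name):

    -- Maintenance sketch: REORGANIZE is lighter and stays online; REBUILD is more
    -- thorough but, without ONLINE = ON (Enterprise only), takes the index offline.
    ALTER INDEX PK_FSS_Data ON dbo.FSS_Data REORGANIZE;
    -- ALTER INDEX PK_FSS_Data ON dbo.FSS_Data REBUILD;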

    You should look into a solution using partitioned tables, where you can just remove data in bulk by day, week, or month. It will be easier to manage than trying to keep up with high-volume deletes, and it will use fewer resources.
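
    For what it's worth, here is a minimal sliding-window sketch assuming Enterprise Edition and daily boundaries (object names such as pfFssDaily, psFssDaily, and FSS_Data_Staging are invented for illustration):

    -- 1. Partition FSS_Data by day on fssTimeStamp (one boundary value per day);
    --    the table's clustered index must be created or rebuilt on this scheme.
    CREATE PARTITION FUNCTION pfFssDaily (datetime)
        AS RANGE RIGHT FOR VALUES ('20090301', '20090302', '20090303' /* ...one per day... */);
    CREATE PARTITION SCHEME psFssDaily
        AS PARTITION pfFssDaily ALL TO ([PRIMARY]);

    -- 2. To purge the oldest day, switch its partition into an empty staging table
    --    with the same structure on the same filegroup (a metadata-only operation),
    --    then truncate the staging table.
    ALTER TABLE dbo.FSS_Data SWITCH PARTITION 1 TO dbo.FSS_Data_Staging;
    TRUNCATE TABLE dbo.FSS_Data_Staging;

    -- 3. Merge away the now-empty boundary so the partition count stays stable.
    ALTER PARTITION FUNCTION pfFssDaily() MERGE RANGE ('20090301');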

  • Here is some code you may want to try in a test environment. You will need to finish the code as the BACKUP LOG statement is incomplete.

    declare @RowsDeleted int,
            @BatchSize   int,
            @dtPurgeDate datetime;

    set @BatchSize   = 10000;
    set @RowsDeleted = @BatchSize;
    set @dtPurgeDate = dateadd(day, -30, getdate());   -- cutoff for the 30-day retention window

    while @RowsDeleted <> 0
    begin

        with FSSDataDelete (fssWellId, fssIndex, fssTimeStamp) as (
            select top (@BatchSize)
                   fssWellId,
                   fssIndex,
                   fssTimeStamp
            from dbo.FSS_Data
            where fssTimeStamp < @dtPurgeDate
            order by fssWellId, fssIndex, fssTimeStamp
        )
        delete from dbo.FSS_Data
        from dbo.FSS_Data fssd
             inner join FSSDataDelete fssdd
               on (fssd.fssWellId = fssdd.fssWellId
                   and fssd.fssIndex = fssdd.fssIndex
                   and fssd.fssTimeStamp = fssdd.fssTimeStamp);

        set @RowsDeleted = @@rowcount;

        backup log [your_db_name] to disk = 'path to file';   -- path to file should be a variable, with a dynamic filename

    end
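
    Purely as an example of one way to finish that last line (the database name and backup folder are placeholders), the log backup could be given a dynamic file name like this:

    -- Sketch only: [your_db_name] and the backup folder are placeholders.
    declare @backupFile nvarchar(260);
    set @backupFile = N'C:\Backups\your_db_name_log_'
                    + convert(char(8), getdate(), 112)                           -- yyyymmdd
                    + N'_' + replace(convert(char(8), getdate(), 108), ':', '')  -- hhmmss
                    + N'.trn';
    backup log [your_db_name] to disk = @backupFile;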

  • Partitioning anybody ???

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • rudy komacsar (3/5/2009)


    Partitioning anybody ???

    Would be a good way to go, but the how will depend on what edition of SQL Server 2005 is being used. If Enterprise Edition is being used, then yes, definitely move to partitioned tables. If using Standard Edition, then your choice is to use a partitioned view, which would be more difficult to maintain.
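
    A minimal sketch of the partitioned-view approach for Standard Edition (all object names and the monthly split are invented for illustration; daily or weekly member tables would fit the 30-day window better):

    -- One member table per period, each with a CHECK constraint on its date range.
    CREATE TABLE dbo.FSS_Data_200902 (
        fssWellID    int      NOT NULL,
        fssIndex     int      NOT NULL,
        fssTimeStamp datetime NOT NULL
            CHECK (fssTimeStamp >= '20090201' AND fssTimeStamp < '20090301'),
        fssType      int      NOT NULL,
        fssValue     real     NULL,
        CONSTRAINT PK_FSS_Data_200902 PRIMARY KEY (fssWellID, fssIndex, fssTimeStamp)
    );
    -- ...repeat for each period, then union the member tables behind a view:
    GO
    CREATE VIEW dbo.vFSS_Data
    AS
    SELECT fssWellID, fssIndex, fssTimeStamp, fssType, fssValue FROM dbo.FSS_Data_200902;
    -- UNION ALL
    -- SELECT fssWellID, fssIndex, fssTimeStamp, fssType, fssValue FROM dbo.FSS_Data_200903;  -- one branch per member table
    GO
    -- Purging a period then becomes TRUNCATE TABLE on one member table (and eventually
    -- dropping it and altering the view) instead of millions of row-by-row deletes.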

  • Point taken, Lynn. However, given the size and scale of the issue, the minor inconveniences of making partitioning work in Standard Edition would probably be far outweighed by the potential fallout from user backlash or performance problems.

    It all boils down to design, planning, planning, planning and testing - been there and done that already.

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • I'm not knocking using a partitioned view. You're right, it takes a lot more planning and testing, but it can be accomplished.

    It really comes down to what the OP can sell to "the powers that be".

  • You guys are great. I've been out of town on personal business, but I was very happy to read all of the replies when I got back. I did some reading based on your suggestions, and I think I'm going to try to set up partitioning. I'm going to do some experiments this week to determine what the trade-offs are going to be, and I'll post them here when I'm done. Thanks again for all of the suggestions.
