Speeding up deletes on a large table

  • Hi,

    I have a table that contains around 200 million rows.

    It has two indexes: a clustered index and a non-clustered index.

    I have to delete some data from this table based on a "date" column which is not part of either index.

    I have started running the delete command as:

    Delete from <TableName> Where <DateColumn> < '2005-05-01'

    but it is taking forever to complete.


    Is there any way to speed up this process of deletion?

    One way I thought of is dropping the non-clustered index first: when rows are deleted from the table they also have to be removed from the non-clustered index, so SQL Server has to do double work. After performing all the deletes I can re-create the non-clustered index. Apart from this, is there any other way?


    Regards

    RSingh

  • If it were me, I'd be thinking of moving all of the data >= '20050501' into a separate table and then truncating the original; better yet, drop it completely and rename the new table to the old table name.

    I'm assuming the bulk of the rows are from before this cut-off date and you want to get rid of them?

    Adding the date column to an index may also help with the initial SELECT into another table.
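    Just to illustrate, a minimal sketch of that copy-and-rename approach (all object and column names below are made up; SELECT INTO is minimally logged under the SIMPLE or BULK_LOGGED recovery model, which keeps the log manageable):

    -- Hypothetical names throughout; adjust to your schema.
    -- 1. Copy the rows you want to keep into a new table.
    SELECT *
    INTO dbo.BigTable_keep
    FROM dbo.BigTable
    WHERE DateColumn >= '20050501'

    -- 2. Swap the new table in for the old one.
    DROP TABLE dbo.BigTable
    EXEC sp_rename 'dbo.BigTable_keep', 'BigTable'

    -- 3. Re-create the clustered and non-clustered indexes on the renamed table.
    CREATE CLUSTERED INDEX IX_BigTable_Clust ON dbo.BigTable (SomeKeyColumn)
    CREATE NONCLUSTERED INDEX IX_BigTable_NC ON dbo.BigTable (SomeOtherColumn)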

  • Mike,

    Thanks for the reply.

    The count of rows for >= '20050501' is itself 75 million.

    Wouldn't it again be time-consuming to select so many rows into another table and then create both indexes on the new table?

    Regards

  • I would suggest your biggest issue is going to be handling the transaction log. This is going to be a pain whether you are inserting 75 million rows or deleting 125 million.


    Check your recovery model. If it's SIMPLE, then what you should do is batch this up into batches of, say, 300k records. This will allow the log to checkpoint and clear out.

    If it's FULL, then you have two choices: switch it to SIMPLE, or schedule transaction log backups every few minutes while this process runs.
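    For reference, a rough sketch of checking and handling the recovery model (the database name and backup path are placeholders):

    -- Check the current recovery model.
    SELECT DATABASEPROPERTYEX('MyDatabase', 'Recovery')  -- SIMPLE, FULL or BULK_LOGGED

    -- Option 1: switch to SIMPLE for the duration of the purge.
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE

    -- Option 2: stay in FULL and back the log up frequently while the batches run.
    BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.trn'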


    Either way, batching is going to be your only practical way forward.


    good luck...

  • I agree; with that number of records you'd be better off doing it in batches and controlling the transaction log size yourself.

    An index on the date would still be beneficial though.

  • One thing you might consider, if you don't have to worry about ongoing transactions "while U wait", is a variation of Mike's suggestion. Use BCP with the QUERYOUT option to put the rows you want to *keep* in an OS flat file on the server; then truncate the table; then BCP the results back into the table. I would drop the PK and indexes before BCPing the data back into the table, and re-install them once the re-population is complete. Also, set the batch size (e.g., -b1000) on the re-population, so it commits incrementally. You may not (I'm trying to anticipate possible problems) be able to use the -n option (NATIVE), so if that doesn't work, -c (character) ought to work; if -c, the default delimiter is a tab character (ASCII 9), so it might not be a bad idea to specify another delimiter (e.g., ASCII 2) if you want to avoid using a delimiter that might be included in the actual data.
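    To make that concrete, the commands might look something like the sketch below (server, database, table, file path and delimiter are all placeholders; run from a command prompt on the server):

    REM Export the rows to keep: -c = character mode, -t sets the field delimiter, -T = trusted connection.
    bcp "SELECT * FROM MyDB.dbo.BigTable WHERE DateColumn >= '20050501'" queryout C:\export\keep.dat -c -t"|" -S MyServer -T

    REM After truncating the table and dropping the indexes, load the rows back,
    REM committing in batches of 1000 rows (-b1000).
    bcp MyDB.dbo.BigTable in C:\export\keep.dat -c -t"|" -b1000 -S MyServer -T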

    Good luck!

  • The only difficulty there is then the PK rebuild (I am carelessly assuming it's clustered)...

  • Thanks everyone for the reply.

    Here is some additional information in this scenario:

    1. The recovery model of the DB is set to SIMPLE.

    2. The table in question doesn't have a primary key. It has only one clustered index (non-unique) and one non-clustered index.

    3. The column used in the where clause is not part of any of the indexes.

    Here is the summary of steps to take as per your recommendations:

    1. BCP the data to keep (75 million rows in this case) to a flat file.

    2. Truncate the table.

    3. Drop the indexes.

    4. BCP the data in.

    5. Re-create the indexes.

    Please tell me if the above sequence of steps is correct.

    Also, can anyone tell how much time it will take on a DB server with 4 GB RAM (AWE enabled) and 4 Intel Pentium III Xeon processors (701 MHz)?

    Regards

  • I don't know about the sequence... but might I suggest you test this with a smaller load (say 2M / 0.75M rows) to make sure that it works; that would also give you a rough idea of the time needed to accomplish this.

  • One step I forgot to mention:  have a good backup!  🙂  Also, if you have a test server, you should debug your process before you go live.

    Also, before you even start:  Assuming you may not want to restore the *whole* database, you ought to BCP the *entire* table out to a nice secluded folder somewhere, so that if something bad happens, you can return the situation back to the status quo ante.

  • Another possibility:

    You can create a job to delete some 100,000 rows and schedule it to run every few minutes (depending on how long it takes to complete). I recently deleted about 60 million rows from a table this way, during normal operation of the database, without a visible performance decrease. Of course the process took several days to complete, but that wasn't an issue. However, I had a clustered index on the date_column... in fact I indexed this column on purpose before I started the deletion process (the table had no clustered index before).

    The job could be very simple, like this:

    SET ROWCOUNT 100000

    DELETE your_table

    WHERE date_column < '20050101'

    I never tried the BCP method, so I can't compare the effect on performance and the speed of both.

  • I have a table of 832 million rows and I had problems with deletes based on date as well. The approach that produced the best improvement (sketched after this list) is:

    1. Create a clustered index on the date column

    2. Create a batch delete, i.e. DELETE TOP 2 million or 4 million rows based on a WHERE clause on the date column, and truncate the transaction log as a next step. You could make it 1 million or less based on the availability of space for log growth.

    3. Loop through step 2 until @@ROWCOUNT returns 0.

    Deleting 30 million rows from the 832-million-row table takes 45 minutes.
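    A minimal sketch of that loop (names are placeholders; DELETE TOP needs SQL Server 2005 or later, and the CHECKPOINT assumes the SIMPLE recovery model; under FULL you would take a log backup between batches instead):

    -- 1. Index the date column so each batch can seek instead of scanning the whole table.
    CREATE CLUSTERED INDEX IX_BigTable_Date ON dbo.BigTable (DateColumn)

    -- 2 and 3. Delete in batches and loop until there is nothing left to delete.
    DECLARE @rows INT
    SET @rows = 1

    WHILE @rows > 0
    BEGIN
        DELETE TOP (2000000) FROM dbo.BigTable
        WHERE DateColumn < '20050501'

        SET @rows = @@ROWCOUNT

        CHECKPOINT  -- under SIMPLE recovery this lets the log space be reused between batches
    END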
