Deleting data from large table

  • I have a 130 GB table and need to delete some of the data, as it has grown very large.

    The trouble is that every time it deletes just one day's worth of data it uses at least 10 GB, and I am having to reorganize the data and do a shrink after every day.

    If I use bulk logging instead of simple, will this keep the temporary storage that it needs to build before it deletes the data to a smaller level?

  • Do you mean it is using 10 GB of tran log?

    Do this as a crawler delete, something like this, so the tran log can be backed up between batches:

    Declare @num int
    Set @num = 1 -- force it into the loop the first time (or you could count the number of records to be deleted)

    While @num <> 0
    Begin
        begin transaction
        set rowcount 10000 -- stop after 10000 rows, change to the value you want
        delete from yourtable where yourfilter -- substitute your table name and filter
        set @num = @@rowcount -- capture the number of rows deleted; when this is 0, the loop ends
        commit transaction
        --waitfor delay '00:00:01'
    End

    set rowcount 0 -- reset rowcount so later statements in the session are not limited
    if @@trancount > 0
        commit transaction

    Alternatively, use TOP.

    The waitfor is there to stop the delete being so intrusive and to give other processes a chance.

    --copyright 2009 Jeff Moden

    ---------------------------------------------------------------------

  • That is how much transaction log it is filling up with each time it deletes the records.

    I could try the crawler. Will it then only write to the log when you commit it?

  • The database is in simple recovery mode.

    Declare @num int
    Set @num = 1 -- force it into the loop the first time

    While @num <> 0
    Begin
        begin transaction
        set rowcount 10000 -- stop after 10000 rows
        delete from mytable
        where StartDate >= '2009-12-04'
          and EndDate <= '2009-12-05'
        set @num = @@rowcount -- capture the number of rows deleted; when this is 0, the loop ends
        commit transaction
        -- waitfor delay '00:00:01'
    End

    set rowcount 0 -- reset rowcount afterwards
    if @@trancount > 0
        commit transaction

    So is this going to commit each batch and not write so much to the transaction log? At the moment I am finding I am having to do

    DBCC SHRINKFILE (N'mytable' , 87410) -- the 87410 changes for every 1 GB I can get back.

    I need to get down to two weeks of data left; so far I have only got through the December dates,

    because each day deleted causes the transaction log to go up 10 to 15 GB and I have to shrink the database.

  • First, you should not use SET ROWCOUNT to limit the number of rows processed in SQL Server 2005/2008. It has been marked for deprecation in future versions of SQL Server. You should use TOP instead.

    Please check out the following for more information:

    How To Delete a Large Number of Records

    Limiting Deleted Rows by Using TOP

  • Tracey, the code I gave you will commit every 10000 rows because that's the value set for rowcount. As your database is in simple recovery mode, the log will automatically truncate that data out, so your log will not grow.

    10000 is an arbitrary value; you can use whatever suits, and you may well be able to use a higher value without seeing any growth in the log. Use DBCC SQLPERF(LOGSPACE) to track how much log is used.
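    For example, a quick check you can run between batches (it takes no parameters other than LOGSPACE and reports on every database on the instance):

    -- shows log size (MB) and log space used (%) for each database
    DBCC SQLPERF(LOGSPACE)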

    The higher the value for rowcount, the faster the delete should run. The waitfor is also not a necessity and it will run faster without it; it just depends what else is going on on your table.

    If you are deleting most of the data in the table, then you should instead insert the data you want to keep into a new table, drop the original, and rename the new table.
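    A rough sketch of that approach, using the table and date column names seen elsewhere in this thread (the new table name and the cutoff date are placeholders, and indexes, constraints and permissions would need to be recreated on the new table before the swap):

    -- copy only the rows you want to keep into a new table;
    -- in simple recovery this SELECT ... INTO is minimally logged
    SELECT *
    INTO dbo.mytable_keep
    FROM dbo.mytable
    WHERE StartDate >= '2009-12-20'

    -- recreate indexes and constraints on dbo.mytable_keep here, then swap the names
    DROP TABLE dbo.mytable
    EXEC sp_rename 'dbo.mytable_keep', 'mytable'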

    In my first post I mentioned the use of TOP as an alternative because, as Lynn correctly says, SET ROWCOUNT will be deprecated in future versions, so it's up to you which code you want to use in this instance; both do the same thing.

    ---------------------------------------------------------------------

  • george sibbald (3/21/2010)


    ...it's up to you which code you want to use in this instance; both do the same thing.

    Not quite the same, at least not exactly. ROWCOUNT with data modification statements is deprecated for good reason. First, the number of rows set typically cannot be used by the optimizer in the same way that TOP can - potentially resulting in a less optimal execution plan. Second, the value set for ROWCOUNT stays in effect for any trigger code that might be invoked. Unless the trigger is coded to SET ROWCOUNT 0 explicitly, unexpected results may occur.
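    As a hypothetical illustration of the trigger point (the trigger and audit table below are made up, not from this thread): if the calling batch has SET ROWCOUNT 10000 in effect, statements inside a trigger it fires are also capped at 10000 rows unless the trigger resets it.

    -- hypothetical delete trigger that audits removed rows
    CREATE TRIGGER trg_mytable_delete ON dbo.mytable
    AFTER DELETE
    AS
    BEGIN
        -- without this reset, the caller's SET ROWCOUNT 10000 would also limit the INSERT below
        SET ROWCOUNT 0
        INSERT INTO dbo.mytable_audit (id, deleted_at)
        SELECT d.id, GETDATE()
        FROM deleted AS d
    END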

  • Paul White NZ (3/21/2010)


    george sibbald (3/21/2010)


    ...it's up to you which code you want to use in this instance; both do the same thing.

    Not quite the same, at least not exactly. ROWCOUNT with data modification statements is deprecated for good reason. First, the number of rows set typically cannot be used by the optimizer in the same way that TOP can - potentially resulting in a less optimal execution plan. Second, the value set for ROWCOUNT stays in effect for any trigger code that might be invoked. Unless the trigger is coded to SET ROWCOUNT 0 explicitly, unexpected results may occur.

    Thanks Paul.

    Tracey, it sounds to me like, if you haven't started yet, you should go for the TOP method, especially if you have a delete trigger on this table, and get into the habit of using TOP going forward. If you have started, don't worry, the code I gave works. If you want to cancel a run you can, as you only lose the last batch.
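    A TOP-based version of the same crawler, sketched with the table and date column Tracey has been using (the batch size and date range are just examples to adjust):

    Declare @num int
    Set @num = 1 -- force it into the loop the first time

    While @num <> 0
    Begin
        begin transaction
        delete top (10000)
        from mytable
        where StartDate >= '2009-12-04'
          and StartDate < '2009-12-18' -- the whole range; the loop works through it in batches
        set @num = @@rowcount -- 0 once nothing is left to delete
        commit transaction
        -- waitfor delay '00:00:01' -- optional breather for other processes
    End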

    ---------------------------------------------------------------------

  • Right now I have the following:

    There are approx 133095 rows per day.

    DELETE TOP (100000)
    FROM tablename
    WHERE startdate >= '2009-12-17'
      and startdate <= '2009-12-17'

    Then this uses approx 10 GB of log space, so I am doing

    DBCC SHRINKFILE (N'databasename_log' , 0)

    GO

    It takes 10-15 minutes to execute.

    So each time it finishes I am just changing the date in the where clause.

    I got back 50 GB yesterday; still 80 GB to go...

  • TRACEY-320982 (3/21/2010)


    Right now I have the following:

    There are approx 133095 rows per day.

    DELETE TOP (100000)
    FROM tablename
    WHERE startdate >= '2009-12-17'
      and startdate <= '2009-12-17'

    Then this uses approx 10 GB of log space, so I am doing

    DBCC SHRINKFILE (N'databasename_log' , 0)

    GO

    It takes 10-15 minutes to execute.

    So each time it finishes I am just changing the date in the where clause.

    I got back 50 GB yesterday; still 80 GB to go...

    Don't shrink the log after each execution - it is just going to grow again and that is going to slow the whole process down. Manually grow the log file to a size larger than 10GB - and leave it there until this process has completed.

    Once the process has completed - then you can shrink the log file back to its normal working size.

    If you are using a date range to perform the delete, and it averages around 120,000 rows per delete, I would drop the TOP value to 10000 instead of 100000. This will use less log space and might even process faster (maybe).
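    A sketch of pre-growing the log once (the database name is a placeholder; the logical log file name is the one used in the DBCC SHRINKFILE command above, and sys.database_files will show the exact name):

    -- grow the log once, up front, so the delete batches do not trigger repeated autogrow
    ALTER DATABASE mydatabase
    MODIFY FILE (NAME = N'databasename_log', SIZE = 15GB)

    -- when the purge is finished, shrink back to a normal working size (target in MB)
    DBCC SHRINKFILE (N'databasename_log', 15000)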

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • TRACEY-320982 (3/21/2010)


    Right now I have the following:

    There are approx 133095 rows per day.

    DELETE TOP (100000)
    FROM tablename
    WHERE startdate >= '2009-12-17'
      and startdate <= '2009-12-17'

    With these values you are still doing almost a day's worth per transaction, so reduce the value for TOP; I would go for 10000.

    Don't shrink the log between runs; it's just going to grow again.

    Why not startdate >= '2009-12-17' and startdate < '2009-12-18'?

    Even better, specify the full range you want to delete in the while loop and just leave it to run.

    It sounds like you are deleting most of the data in the table, yes? If so, use INSERT INTO to copy the data you want to keep into a new table rather than delete.

    ---------------------------------------------------------------------

  • And another thing: if you stick with the delete, reindex this table when you have finished; the likelihood of internal fragmentation is high.
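    For example (the table name is the placeholder used earlier in the thread; sys.dm_db_index_physical_stats will show how fragmented the indexes actually are):

    -- rebuild every index on the table once the purge is complete
    ALTER INDEX ALL ON dbo.mytable REBUILD

    -- or, if the table must stay available and online rebuilds are not an option:
    -- ALTER INDEX ALL ON dbo.mytable REORGANIZE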

    ---------------------------------------------------------------------

  • Thanks everyone for your advice. I am going through this long task. I will not shrink the log each time, as you all advised, since it just keeps growing again. I will do the loop and sit back.

    Cheers everyone.
