Purging huge table, good approach

  • I need to purge a table containing data older than 2009. There is data starting from 2006. Around 20 million records need to be purged. The database is in the bulk-logged recovery model.

    As this database is a critical one, it requires 24x7 availability.

    Please share with me the best approach to purge this table.

    M&M

  • To add, the database is in the bulk-logged recovery model.

    M&M

  • I do something similar, updating old records to set IMAGE data to NULL to free up space. It's fairly slow, but it can run 24/7 without affecting anyone.

    Add your own loop logic (a sketch follows after the code below). I don't know whether it would be faster to create a temp table, insert blocks of records, and then delete those blocks, or whether that would cause blocking.

    This is simplified code I use:

    -- This gives me 1 valid record to be updated
    declare @primarykey int

    set @primarykey = (select top 1 Unique_ID
        from BigTable with (nolock)
        where OrderType = 'NoSale'
        and ImageField is not null
        and StartDate < dateadd(dd, -365, getdate()))

    -- Now update it
    UPDATE BigTable SET ImageField = NULL
    WHERE Unique_ID = @primarykey
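
    A minimal sketch of one way to wrap that in a loop; the column names and the one-second pause are just assumptions carried over from the snippet above:

    declare @primarykey int

    set @primarykey = (select top 1 Unique_ID
        from BigTable with (nolock)
        where OrderType = 'NoSale'
        and ImageField is not null
        and StartDate < dateadd(dd, -365, getdate()))

    -- Keep going until no qualifying row is left
    while @primarykey is not null
    begin
        UPDATE BigTable SET ImageField = NULL
        WHERE Unique_ID = @primarykey

        waitfor delay '00:00:01' -- optional: give other sessions room

        set @primarykey = (select top 1 Unique_ID
            from BigTable with (nolock)
            where OrderType = 'NoSale'
            and ImageField is not null
            and StartDate < dateadd(dd, -365, getdate()))
    end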

  • I normally do this:

    - Create a table with the exact same structure as the original and give it a new name

    - Copy only the data that you need over to this new table

    - Rename the old table to something else and rename the new table to the original name

    - Truncate the old table and drop it if you don't need it anymore (see the sketch below).
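
    A minimal T-SQL sketch of these steps, assuming a hypothetical BigTable with a StartDate column; script out and reapply indexes, constraints, and permissions separately, since SELECT INTO copies only the column definitions:

    -- Create an empty copy with the same column structure
    SELECT * INTO BigTable_New FROM BigTable WHERE 1 = 0

    -- Copy only the rows you want to keep
    INSERT INTO BigTable_New
    SELECT * FROM BigTable WHERE StartDate >= '20090101'

    -- Swap the tables
    EXEC sp_rename 'BigTable', 'BigTable_Old'
    EXEC sp_rename 'BigTable_New', 'BigTable'

    -- Drop the old data once you are sure you no longer need it
    DROP TABLE BigTable_Old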

    Hope this will help

    Minh

  • mohammed moinudheen (5/13/2010)


    I need to purge a table containing data older than 2009. There is data starting from 2006. Around 20 million records need to be purged. The database is in the bulk-logged recovery model.

    As this database is a critical one, it requires 24x7 availability.

    Please share with me the best approach to purge this table.

    Since the purge strategy is date based, I would suggest partitioning the table by that date and then relying on sliding-window partition purging.

    This is fast and clean.
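
    A minimal sketch of the sliding-window idea, assuming a hypothetical BigTable created on a partition scheme over StartDate with yearly boundaries (SQL Server 2005+ partitioning); the SWITCH is a metadata-only operation, so it completes almost instantly regardless of row counts:

    -- One-time setup: yearly boundaries on the partitioning column
    CREATE PARTITION FUNCTION pfYear (datetime)
    AS RANGE RIGHT FOR VALUES ('20070101', '20080101', '20090101')

    CREATE PARTITION SCHEME psYear
    AS PARTITION pfYear ALL TO ([PRIMARY])
    -- BigTable is assumed to be created ON psYear(StartDate)

    -- Purge: switch the oldest partition into an empty staging table that
    -- already exists with identical structure on the same filegroup
    ALTER TABLE BigTable SWITCH PARTITION 1 TO BigTable_Staging
    DROP TABLE BigTable_Staging

    -- Merge away the now-empty boundary so the window slides forward
    ALTER PARTITION FUNCTION pfYear() MERGE RANGE ('20070101')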

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Thank you all for the wonderful inputs.

    M&M

  • joemai (5/13/2010)


    I normally do this:

    - Create a table with the exact same structure as the original and give it a new name

    - Copy only the data that you need over to this new table

    - Rename the old table to something else and rename the new table to the original name

    - Truncate the old table and drop it if you don't need it anymore.

    Hope this will help

    Minh

    If you use this method, make sure you apply any specific permissions, triggers, etc. to the new table.

  • joemai (5/13/2010)


    I normally do this:

    - Create a table with the exact same structure as the original and give it a new name

    - Copy only the data that you need over to this new table

    - Rename the old table to something else and rename the new table to the original name

    - Truncate the old table and drop it if you don't need it anymore.

    Reverse purging is a fantastic solution, provided that...

    1- The table is static.

    2- The table is not too big.

    3- The percentage of rows that are purge candidates is large.

    Having said that, please note this does not work for a table that gets inserted into/updated/deleted during the "purge" process.

    Anyway, in the reverse purging described, the last step is overkill: there is no need to truncate, just drop the table.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Declare @num int
    Set @num = 1 -- force it into the loop the first time (or could do a count of the number of records to be deleted)

    While @num <> 0
    Begin
        begin transaction

        set rowcount 10000 -- stop after 10000 rows; change to the value you want

        DELETE FROM EMP
        WHERE DateColumn < '20090101' -- substitute your own year-based criteria here

        set @num = @@rowcount -- capture the number of rows deleted; when this is 0, we drop out of the loop

        commit transaction

        -- waitfor delay '00:00:01' -- optional, to let others in
    End

    set rowcount 0 -- reset so later statements in this session are not limited

  • I like Manu's suggestion.

    Doing something like this, deleting a large number of records without affecting uptime or users, is one of the few things that are best done in a loop: do a quick delete of some records, then pause so the loop isn't so tight that it keeps everyone else from getting in. I'd suggest a delete size of about 1000-5000 records, with a 2-3 second pause between delete iterations.

    Note that you don't need to do the SET ROWCOUNT... you can instead do DELETE TOP (1000)
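
    A sketch of the same loop rewritten with DELETE TOP, reusing the EMP table and the hypothetical date criteria from the post above:

    Declare @num int
    Set @num = 1

    While @num <> 0
    Begin
        DELETE TOP (1000) FROM EMP
        WHERE DateColumn < '20090101' -- substitute your own year-based criteria

        set @num = @@rowcount

        waitfor delay '00:00:02' -- pause so other sessions can get in
    End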

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2
