Archiving Data

  • Hello Experts,

    we have a situation he a table grown to 180+ Gigs with several millions of records. this affects Re Indexing - online maintanence job weekly due to load everytime. we are planning to move old data to a archive table.

    Suggest a best method to proceed. we are planning to move based on date which is not in any of the index (and stored as int column )and we cant modify as it is a third party application.

    we have come up with this idea

    plan 1 - Rename the table and move one month data from archive to production table.

    plan 2 - Move one month data in several chunks to archive table.

    post this we are planning to schedule weekly archive of the table.

    if you have better plan also please suggest.

    Regards
    Durai Nagarajan

  • I have lots of suggestions but "It Depends" on what's what.

    1. How many months/GB of data do you want to keep active at any time.

    2. What is the format of the date in the INT column? Is it (hopefully) YYYYMMDD?

    3. Is the table referenced by FKs from any other table?

    4. Does the table have an indexed view on it?

    5. What is the PK of this table and does it really have to be unique or is this more like an audit table (which you seem to indicate)? Yeah, I know. Strange question but it's actually very important.

    6. Are there any other unique indexes that actually have to be unique?

    7. Which edition of SQL Server do you have? Standard Edition or Enterprise Edition?

    8. How long does it take to delete a month of information from the table or haven't you tried that yet?

    9. How many rows are in the largest month, so far?

    10. Do the older months need to be updateable or is this truly and audit or history table of sorts?

    11. Can we change the indexes on the table?

    I need answers to all of those questions before I can make a recommendation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 1. How many months/GB of data do you want to keep active at any time.

    1 month

    2. What is the format of the date in the INT column? Is it (hopefully) YYYYMMDD?

    like this DATEDIFF(SECOND,{d '1970-01-01'}, '04/1/2013') with '04/1/2013' beign today's date

    3. Is the table referenced by FKs from any other table?

    No

    4. Does the table have an indexed view on it?

    No

    5. What is the PK of this table and does it really have to be unique or is this more like an audit table (which you seem to indicate)? Yeah, I know. Strange question but it's actually very important.

    ID an int column but not identity

    6. Are there any other unique indexes that actually have to be unique?

    No

    7. Which edition of SQL Server do you have? Standard Edition or Enterprise Edition?

    Enterprise

    8. How long does it take to delete a month of information from the table or haven't you tried that yet?

    Haven't tried yet , we have requested for test server.

    9. How many rows are in the largest month, so far?

    more than 1.8 million

    10. Do the older months need to be updateable or is this truly and audit or history table of sorts?

    This is a mail audit

    11. Can we change the indexes on the table?

    No

    Regards
    Durai Nagarajan

  • Based upon your answers to Jeff's (excellent) questions, your first option sounds sufficient: Rename the table and move 1 month of data into the production table. Once complete. You can drop all the non clustered indexes from the archive table, making sure you've created them on the new table first.

    Once you've verified you have the proper records in the production table, you can remove them from the archive table, then create a simple script to run every hour/day/week/etc - as an example:DECLARE @ArchiveDate datetime,

    @BatchSize int,

    @Error int = 0,

    @Err varchar(15),

    @ErrMsg varchar(250),

    @Env varchar(65),

    @EmailBody nvarchar(750),

    @EmailSubject nvarchar(140)

    BEGIN

    SELECT @BatchSize = 1000, @Env = @@SERVERNAME, @ArchiveDate = (SELECT DATEADD(MONTH , 1,GETDATE()))

    END

    BEGIN

    BEGIN TRY

    BEGIN TRANSACTION

    /* Create holding Table */

    IF EXISTS(SELECT * FROM tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'tempdb..#Temp'))

    BEGIN

    DROP TABLE [dbo].[#Temp]

    END

    /* Get X number of records to Archive */

    CREATE TABLE #Temp (ID bigint)

    INSERT INTO #Temp

    SELECT TOP (@BatchSize) ID

    FROM [ProductionDatabase].[schema].[TableName] WITH(READUNCOMMITTED) --In OLTP Environment, avoid potential table lock

    WHERE <<Column to filter on>> < @ArchiveDate

    /* Archive the data to the storage table */

    INSERT INTO [ArchiveDatabase].[schema].[TableName] (col1, col2, col3...)

    SELECT TOP (@BatchSize) col1, col2, col3...

    FROM [ProductionDatabase].[schema].[TableName] MyData WITH(READUNCOMMITTED) --In OLTP Environment, avoid potential table lock

    INNER JOIN #Temp T ON MyData.ID = T.ID

    SELECT @Error = @Error + @@ERROR

    /* Delete the data from the Production table */

    DELETE MyData FROM [ProductionDatabase].[schema].[TableName] MyData

    INNER JOIN #Temp T ON MyData.ID = T.ID

    SELECT @Error = @Error + @@ERROR

    IF @Error = 0

    BEGIN

    COMMIT TRANSACTION

    END

    ELSE

    BEGIN

    ROLLBACK TRANSACTION

    END

    END TRY

    BEGIN CATCH

    SELECT @Err = ERROR_NUMBER(), @ErrMsg = ERROR_MESSAGE()

    BEGIN

    ROLLBACK TRANSACTION

    SET @EmailSubject = '' + @Env + ' :: DAILY DATA PRUNING FAILURE (DATABASE) dbo.MyTable : '+ CONVERT(VARCHAR(19),GETDATE(),121)

    SET @EmailBody =

    '********************************************************************** ' + CHAR(13)

    + 'ERROR ' + @ErrMsg + CHAR(13)

    + '**********************************************************************' + CHAR(13)

    /* Send Email */

    EXECUTE msdb.dbo.sp_send_dbmail @recipients = 'someonewhoneedstoknow@company.com'

    ,@subject=@EmailSubject

    ,@body = @emailBody

    ,@body_format = 'TEXT'

    ,@sensitivity = 'Confidential'

    ,@importance='High'

    END

    END CATCH

    END

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Note: I threw that code together as a quick example only and most like will need tweaking so I don't advise running it "as-is" without verifying it beforehand

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • durai nagarajan (2/3/2014)


    1. How many months/GB of data do you want to keep active at any time.

    1 month

    2. What is the format of the date in the INT column? Is it (hopefully) YYYYMMDD?

    like this DATEDIFF(SECOND,{d '1970-01-01'}, '04/1/2013') with '04/1/2013' beign today's date

    3. Is the table referenced by FKs from any other table?

    No

    4. Does the table have an indexed view on it?

    No

    5. What is the PK of this table and does it really have to be unique or is this more like an audit table (which you seem to indicate)? Yeah, I know. Strange question but it's actually very important.

    ID an int column but not identity

    6. Are there any other unique indexes that actually have to be unique?

    No

    7. Which edition of SQL Server do you have? Standard Edition or Enterprise Edition?

    Enterprise

    8. How long does it take to delete a month of information from the table or haven't you tried that yet?

    Haven't tried yet , we have requested for test server.

    9. How many rows are in the largest month, so far?

    more than 1.8 million

    10. Do the older months need to be updateable or is this truly and audit or history table of sorts?

    This is a mail audit

    11. Can we change the indexes on the table?

    No

    Based (especially) on the first question, I don't see any need for special tricks here.

    For the first take, I'd just copy the last 30 days of data into another table (should take less than a minute especially if you use SELECT/INTO to both create and populate the new table), rename the original table as an archive table and rename the new table as the original name. The SELECT/INTO should have an ORDER BY on it for the date column and use OPTION(MAXDOP 1) if you have no intention of adding a clustered index by date.

    Once a day, run a job that copies the rows that are 30 days or older to your archive table (making sure you don't insert dupes and then delete those same rows from the active table. That won't take long because, even in a HEAP, the rows to be deleted will mostly all be together at the logical beginning of the table and shouldn't cause a full table lock. That means that inserts will be able to continue at the logical end of the table. On the average, you'd only be deleting something less than 60,000 rows per day. That should take about 6 seconds because you have no FK's and, I assume, relatively few indexes on the table.

    Some will suggest the ardure of partitioning for this particular table but I don't believe it'll be worth all that because you're only going to keep 1.8 million rows, which is comparatively small for today's machines and databases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks guys for your input

    They have decided to keep 3 months data so we are planning to move the current 3 months data to new table and drop the existing and rename the new table.

    sorrry for the delay.

    Regards
    Durai Nagarajan

  • Jeff, love to read the questioner you have asked? This reflect the experience you are carrying. Simply superb!!!

    Cheers,

    FM

    ---------------------------------------------------
    "Thare are only 10 types of people in the world:
    Those who understand binary, and those who don't."

  • durai nagarajan (2/11/2014)


    Thanks guys for your input

    They have decided to keep 3 months data so we are planning to move the current 3 months data to new table and drop the existing and rename the new table.

    sorrry for the delay.

    Believe it or not, that's should be just the first step. If you're always going to keep 3 months worth of data online, you have to start asking questions like whether or not that means just the last 90 so days with nightly rolloffs (archive/delete) data older than that or if you want to keep 3 months previous plus the current active month, etc. If you want to rolloff entire months instead of just a day at a time because of "Plan B", then you might want to consider Partitioned Tables and the "3 month sliding" window if your DELETEs take too long.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • free_mascot (2/12/2014)


    Jeff, love to read the questioner you have asked? This reflect the experience you are carrying. Simply superb!!!

    Cheers,

    FM

    Thanks for the feedback, FM. I'm actually smack dab in the middle of setting up partitioning on a 27GB table and a 300GB table and just completed building a presentation on the subject for my local PASS group. The questions I asked are based on all the workarounds I had to do to do it all the right way instead of just throwing partitioning at the table. LOTS of caveats if you want to do it the "right way" for Audit tables of that size especially trying to figure out how to NOT make a whole bunch of file space "Read Only" for each partition.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply