How do you archive data?

  • Hey everyone, thanks for your help. I got the green light to use partitioning on the production database. Basically, we have a database where we keep 31 days of logs (pure inserts) across 2 tables:

    Log_Table (holds current day's data)

    Log_Table_Arch (holds previous 30 days)

    During the purge process, we have to move data from Log_Table to Log_Table_Arch. At the tail end, the 32nd day gets purged according to my original post.

    My plan is to use the sliding window methodology by putting Log_Table into 3 partitions, Log_Table_Arch into 31 partitions, and creating a new table, Log_Table_Purge. That final table would be used to switch the 32nd day out into a separate table, which is then copied to the archive server and truncated/dropped. The partition key would be the insert date, so I will have to change the clustered index from the primary key to insert_date and create all indexes aligned with the partition scheme.
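
    Roughly speaking, the setup would look something like this (the column list, filegroup, and boundary dates below are simplified placeholders, not our real schema):

    -- Simplified sketch only; the real function has 31+ boundary dates
    CREATE PARTITION FUNCTION pf_Day (datetime)
    AS RANGE RIGHT FOR VALUES ('12/16/2008', '12/17/2008', '12/18/2008');

    CREATE PARTITION SCHEME ps_Day
    AS PARTITION pf_Day ALL TO ([FG1]);  -- all partitions on the same filegroup

    CREATE TABLE dbo.Log_Table_Arch
    (
        log_id      INT          NOT NULL,   -- placeholder columns
        insert_date DATETIME     NOT NULL,
        log_text    VARCHAR(500) NULL
    ) ON ps_Day (insert_date);

    -- Clustered index on the partitioning key; any other indexes are also
    -- created on ps_Day so they stay aligned with the partition scheme
    CREATE CLUSTERED INDEX CIX_Log_Table_Arch
        ON dbo.Log_Table_Arch (insert_date) ON ps_Day (insert_date);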

  • Interesting idea... and, if you really do that, there's no reason for a purge process. Just create the new table on a daily basis and have a script that alters the view to include the new table and drop the oldest. It'll take about 65 milliseconds to rebuild the view, which is much shorter than any DELETE type of purge.
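
    Something along these lines is all the nightly script would need to do (the view and daily table names here are made up just for illustration):

    -- Illustration only... one physical table per day, and the view is
    -- rewritten each night to include the newest day and drop the oldest
    ALTER VIEW dbo.Log_Table_Window
    AS
         SELECT log_id, insert_date, log_text FROM dbo.Log_20081119
    UNION ALL
         SELECT log_id, insert_date, log_text FROM dbo.Log_20081120
    -- ...one SELECT per day in the 31-day window...
    UNION ALL
         SELECT log_id, insert_date, log_text FROM dbo.Log_20081219;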

    --Jeff Moden



  • Oh yeah... almost forgot... plan for it now... if you leave the long term archive as daily tables, you'll have a hell of a time searching for a given bit of data across anything over 31 days.

    --Jeff Moden



  • Gabe (12/15/2008)


    My plan is to use the sliding window methodology by putting Log_Table into 3 partitions, Log_Table_Arch into 31 partitions, and creating a new table, Log_Table_Purge. That final table would be used to switch the 32nd day out into a separate table, which is then copied to the archive server and truncated/dropped. The partition key would be the insert date, so I will have to change the clustered index from the primary key to insert_date and create all indexes aligned with the partition scheme.

    When you say you plan to "use the sliding window methodology", are you talking about the same thing as found in the following SQL Server 2005 Help URL:

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/e8bcefac-19a9-44dd-a350-dfb4b593e21d.htm

    Heh... I was just looking for the differences between partitioned "views" and partitioned "tables" and found that little piece of computational heaven. If it actually works the way they say it does, am I gonna have some fun playing with archive scenarios... the goal will be just a wee bit more sophisticated than their 2-table example... I'm gonna do the ol' short term, near term, long term, and offline archive scenario. Might even write up an article on it if someone hasn't already... looking for that now.

    --Jeff Moden



  • Yes, although my situation is a little different since I have the two tables.

    Some obstacles I've come across that will save you time if you plan ahead:

    Bear in mind that speed and minimal logging during the purge are my goals; in order to achieve this:

    1) All partitions must use the same filegroup

    2) All indexes must be aligned (see the 2nd link below for more detail, and the quick sketch at the end of this post)

    3) The clustered index must also be the partition key

    4) If you select records that cross multiple partitions, SQL Server 2005 will only use 1 thread per partition. Thus, if read performance is a high priority and you have a large data set, you should use many partitions and make sure each partition holds roughly the same amount of data. For example, say you partition by day and your Fridays have 5 times more activity. If you select data for Mon-Wed it may take 30 seconds, but if you select Wed-Fri it will take 150 seconds - Wed and Thu will be done after around 30 seconds, but the single thread working on Friday will take much longer. (Of course, I'm oversimplifying the math for the sake of getting the point across.) This is fixed in SQL Server 2008.

    Lastly, remember that partitioning is only available in Enterprise Edition.

    Also, these links give more detail on what scripts you can run if you implement it:

    http://technet.microsoft.com/en-us/library/aa964122(SQL.90).aspx

    http://msdn.microsoft.com/en-us/library/ms345146.aspx
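
    To illustrate points 1-3 above, here's roughly the difference between an aligned and an unaligned index (index and column names are just examples):

    -- Aligned: built on the same partition scheme as the table, so its rows
    -- travel with the table's partitions during a SWITCH
    CREATE NONCLUSTERED INDEX IX_Log_Table_Arch_log_id
        ON dbo.Log_Table_Arch (log_id)
        ON ps_Day (insert_date);

    -- NOT aligned: built on a single filegroup; this blocks partition switching
    -- CREATE NONCLUSTERED INDEX IX_Log_Table_Arch_log_id
    --     ON dbo.Log_Table_Arch (log_id)
    --     ON [PRIMARY];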

  • Gabe (12/18/2008)


    3) The clustered index must also be the partition key.

    I would say that's only true for things that are temporal in nature. If the partitions are based on something less finite than Date/Time, such as Location, then I can't see wasting a valuable thing like a Clustered Index on something that may only have 10 or 12 values in it overall. Of course, I could be wrong and, if I am, I sure would like to know why, because I'd like to use the tool in the very near future for just such a thing... obviously, I'm very new to partitioned tables and I haven't set up any performance experiments yet. Thanks.

    --Jeff Moden



  • No argument from me - but I mentioned that only because, during the sliding window, you need to split a partition in order to create a new empty one. If there's no index on the partitioning key, the database engine has no way of knowing whether any of the data in the existing partition will fall into the new partition range, so it has to do a full scan (of the affected partition).

    Example:

    I have a table partitioned by day on insert_date, which is populated by GetDate(). I need to create a new partition tomorrow, so I alter the partition function (after altering the partition scheme to set the next used filegroup):

    ALTER PARTITION SCHEME ps_Day NEXT USED [FG1];

    ALTER PARTITION FUNCTION pf_Day() SPLIT RANGE ('12/19/2008');

    If I do not have an index on the partitioning key (insert_date), you and I know that we're creating an empty partition, but there's no way for the database engine to know whether the previous partition (RIGHT '12/18/2008') has any records that need to be moved into the new partition, since its boundary is currently open ended - thus a full scan of all data in the 12/18/2008 partition.
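
    (As a quick sanity check before splitting, you can look at the row counts per partition through the catalog views - this is generic, not specific to our setup:)

    -- Rows per partition for the archive table (heap or clustered index)
    SELECT p.partition_number, p.rows
    FROM sys.partitions AS p
    WHERE p.object_id = OBJECT_ID('dbo.Log_Table_Arch')
      AND p.index_id IN (0, 1)
    ORDER BY p.partition_number;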

    Honestly, you can probably use a nonclustered index on the partitioning key with the same results (at the moment I can't think of any reason why it wouldn't work, but I haven't tested it).
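
    For completeness, the other half of the daily cycle looks roughly like this (the partition number and boundary date are illustrative, and I'd test it before trusting it):

    -- Switch the oldest day's partition out of the archive into the (empty,
    -- identically structured, same-filegroup) purge table - a metadata-only
    -- operation. Partition 2 here is illustrative; $PARTITION.pf_Day() will
    -- tell you which partition actually holds a given date.
    ALTER TABLE dbo.Log_Table_Arch
        SWITCH PARTITION 2 TO dbo.Log_Table_Purge;

    -- Then remove the now-empty boundary at the old end of the function
    ALTER PARTITION FUNCTION pf_Day()
        MERGE RANGE ('11/18/2008');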

  • Just an FYI - I want to let you know the partitioning has been successfully deployed. The 3-hour purge/archive process now takes 40 minutes with partition switching. Transaction log backup files were between 2-4 GB every 10 minutes during the old process; now they stay at the average off-hours size (50 MB) during the same time frame (as if the process isn't even happening).

  • Thanks for the success story, Gabriel. Like I said, I've not used partitioned tables in production yet. From all the hoopla in BOL, I have to say I'm disappointed with the amount of time it takes to do the sliding window thing... I would have expected a 3-hour archive to become just several minutes... maybe even less. I'll definitely have to do some serious testing before we try such a deployment.

    Thanks again.

    --Jeff Moden



  • I apologize, I didn't clarify. The 40 minutes is really just the time it takes to copy the data over to the archive server (purging & archiving were all done in one job). The sliding window switching step in that job takes less than a second to run. It lives up to the hoopla! 😀

  • AH! Now that's more like it! Long live "hoopla"! 😛

    --Jeff Moden


