Archiving and Deleting Tables

  • Hi,

    My database has a table that grows continuously as the application is used, because it records the users' actions.

    In order to prevent the database from getting too large, I have a requirement to clear out the table whenever it reaches a certain size. Before clearing out the table, though, I must save the data into some sort of yet-to-be-determined archive file.

    The table has no relationships with any other tables in the database, so I figure this should be fairly straightforward to do.

    That said, I am struggling to figure out how to do it. I am a newbie to SQL Server (and databases in general), so I don't even know where to start looking. Any direction that anybody can provide will be much appreciated.

    Questions:

    1. How can I set up SQL Server 2008 to export the table and clear it when it reaches the size limit?

    2. Any recommendations for the archived file format? (I was thinking a CSV file would be sufficient.)

    Thanks,

    -jeremy

  • Exporting to a file is easily done with BCP.
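
    For example, here is a rough sketch of a BCP export to a comma-delimited file (the server, database, table, and path names are all made-up placeholders):

        bcp "SELECT * FROM MyDatabase.dbo.AuditLog" queryout "C:\Archive\AuditLog_20090501.csv" -c -t, -T -S MyServer

    One caveat: BCP does not add text qualifiers (quotes) around fields, so if any column can contain an embedded comma you may want a different delimiter, such as a pipe or a tab.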

    An important question here, though, is: what benefits do you (or your boss/company) hope to derive by making some of the data unavailable to the application? Likewise, why do you need to still keep it somewhere else? (What circumstances/conditions would require you to retrieve it, and with what tool and/or for what purpose?)

    -- RBarryYoung, (302)375-0451
    blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Thanks for your reply RBarryYoung. You have brought up some good questions.

    The thought (or at least what my boss is telling me) is that if we allow this table to grow without periodically wiping it out, the size of the database will grow (obviously), which will slow down the performance of the database (I'm not so convinced about this one) and will eventually lead to running out of disk space.

    The data is being stored simply to maintain an audit trail of who did what and when. The application I am writing will be used to support the engineering of various safety-related systems, and (for reasons I have not yet fully grasped) keeping track of this information is therefore important. I think the main purpose of maintaining the audit trail is more or less to provide conclusive evidence that the engineering activities that were supposed to have been done were actually done. Regardless, it is a requirement, so I must comply.

    There are no requirements for the application to use any of the information stored in the audit tables. The tables simply need to be created and maintained so that, in the off chance we actually have to go back and look at the data, it is possible to open it up in, say, Excel to take a look.

    Each time the table contents are archived and deleted, I am thinking that the archive file will simply be compressed and stuffed into some directory for long-term storage, thus saving disk space and keeping the database smaller and faster.

    Does this reasoning make sense to you?

    Thanks,

    -jeremy

    I think I'd look closer at the requirements. Is there a need to have a certain amount online for ready access, say the previous month plus the current one? If so, then you just need a monthly process to archive the appropriate data from the audit table(s). You'd also want to be sure that there is a datetime field in each record to use for archiving purposes in this case.

    This would also make the archiving process easier to develop and maintain than one driven dynamically by size (by which I assume you mean a certain number of rows).

  • Thanks for your reply, Lynn.

    I'm with you as far as archiving based upon time rather than table size. Makes sense and sounds easier. Plus, I think I could easily get that requirement changed without too much fuss.

    I also want this process to be handled automatically by SQL Server.

    Conceptually, I think I know what I want to do, but I am having trouble figuring out the terminology to search for information on the various features in SQL Server that are required to make it happen.

    Essentially, I want SQL Server to automatically execute a script on a periodic basis. That script should export the information from a database table to a CSV file, compress the CSV file (nice, but not necessary), and delete all data from the table.

    If anyone can suggest the SQL Server features that I should use to do this, that would be great. I just need to know what I need to learn more about.

    Thanks,

    -jeremy

  • You would use a scheduled job to perform the archive process. Start by looking up SQL Server Agent in Books Online.
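
    As a rough illustration (every name here is an invented placeholder, including the usp_ArchiveAuditLog procedure), a minimal Agent job can be created in T-SQL with the msdb stored procedures:

        USE msdb;
        GO
        -- Create the job itself
        EXEC dbo.sp_add_job @job_name = N'Archive Audit Table';

        -- Add a T-SQL step that runs the (hypothetical) archive procedure
        EXEC dbo.sp_add_jobstep
            @job_name      = N'Archive Audit Table',
            @step_name     = N'Archive and purge',
            @subsystem     = N'TSQL',
            @database_name = N'MyDatabase',
            @command       = N'EXEC dbo.usp_ArchiveAuditLog;';

        -- Run monthly, shortly after midnight on the 1st
        EXEC dbo.sp_add_schedule
            @schedule_name         = N'Monthly 00:30',
            @freq_type             = 16,    -- monthly
            @freq_interval         = 1,     -- on day 1 of the month
            @freq_recurrence_factor = 1,    -- every 1 month
            @active_start_time     = 3000;  -- 00:30:00

        EXEC dbo.sp_attach_schedule
            @job_name      = N'Archive Audit Table',
            @schedule_name = N'Monthly 00:30';

        -- Target the local server so the job actually runs
        EXEC dbo.sp_add_jobserver @job_name = N'Archive Audit Table';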

    Sure, it's relatively straightforward. Some folks would rather schedule and drive this from DOS than SQL, but either way works.

    The SOP approach would be a SQL Agent Job scheduled to run every 30 days (or whatever) with the following steps:

    1. Operating System (CmdExec) step that runs BCP to extract all rows older than today - XX days to a CSV file named after the extraction date.

    2. Operating System (CmdExec) step that looks for the CSV above, using the calculated extraction date to build the file name, then uses a command-line ZIP utility or other means to compress it (or copy it into a compression file/library), then deletes the original file.

    3. SQL step that deletes all rows older than XX days from the table.

    Error detection is important here: any errors in any step should cause the Job to abort and an Alert to be sent.

    Also, the date math and comparisons should be written to strip or ignore time-of-day (see the sketch below). Finally, make sure that this runs after midnight (and not just before) so that no date change occurs in the middle of execution.
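
    To make steps 1 and 3 concrete, here is a sketch only (the database, table, column, and path names are placeholders, and in practice you would build the file name from the extraction date as described above, rather than hard-coding it). The cutoff is computed as a whole date so time-of-day is ignored:

        REM Step 1 (CmdExec): extract rows older than 30 days to a dated CSV
        bcp "SELECT * FROM MyDatabase.dbo.AuditLog WHERE EventDate < DATEADD(DAY, -30, CONVERT(date, GETDATE()))" queryout "C:\Archive\AuditLog_20090501.csv" -c -t, -T

        -- Step 3 (T-SQL): delete the same rows, using the same whole-date cutoff
        DELETE FROM dbo.AuditLog
        WHERE EventDate < DATEADD(DAY, -30, CONVERT(date, GETDATE()));

    Because both steps compute the cutoff from the current date with the time stripped (the CONVERT to date), running the whole job after midnight guarantees they agree on which rows are "older than XX days".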

    -- RBarryYoung, (302)375-0451
    blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

    OR, using the OUTPUT clause of the delete, you could delete from the audit tables and load the deleted records into stage tables, which would make exporting the data with BCP easier. Just be sure to truncate the stage tables prior to loading, as sketched below.
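
    A minimal sketch of that pattern (the table and column names are invented for the example):

        -- Empty the stage table first so the export only picks up this run's rows
        TRUNCATE TABLE dbo.AuditLogStage;

        -- Delete old rows and capture them in the stage table in one statement
        DELETE FROM dbo.AuditLog
        OUTPUT deleted.* INTO dbo.AuditLogStage
        WHERE EventDate < DATEADD(DAY, -30, CONVERT(date, GETDATE()));

    The stage table can then be exported with a plain bcp ... out instead of a queryout, since it holds exactly the rows that were just deleted.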

    You still need to be sure of the date-checking criteria and such, like Barry stated.

    You can do it either way; however, I prefer to put the DELETEs at the end because I find that it makes the failure modes much safer, and the recovery actions much more "4 AM"-proof.

    That is, the recovery action (as well as the test action) is always just to rerun it, because re-running it is always safe and will always work once the failure causes have been corrected.

    -- RBarryYoung, (302)375-0451
    blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

    It's worth mentioning that, since you're building a repeatable and well-defined process around moving data out of the system, I'd suggest looking at partitioning the table. Instead of deleting data, partitioning would allow you to simply switch the partition out of the table, move it where you need it to go, and create a new partition for your table. It's radically faster, and it reduces downtime and contention; see the rough sketch below.

    I haven't done this myself, but Andy Novick has shown how to do this in several presentations. You might check his web site for some of the details.
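
    As a very rough sketch of the switching idea (all of the names below are invented, and a real setup needs the switch-target table's structure and indexes to match the partitioned table exactly, on the same filegroup):

        -- Partition the audit table by month on its datetime column
        CREATE PARTITION FUNCTION pfAuditByMonth (datetime)
            AS RANGE RIGHT FOR VALUES ('2009-05-01', '2009-06-01');
        CREATE PARTITION SCHEME psAuditByMonth
            AS PARTITION pfAuditByMonth ALL TO ([PRIMARY]);
        -- dbo.AuditLog is created ON psAuditByMonth (EventDate);
        -- dbo.AuditLogSwitchTarget has an identical structure.

        -- Instantly move the oldest month's rows out of the live table
        ALTER TABLE dbo.AuditLog
            SWITCH PARTITION 1 TO dbo.AuditLogSwitchTarget;

        -- Remove the now-empty boundary and add a new one for the next month
        ALTER PARTITION FUNCTION pfAuditByMonth() MERGE RANGE ('2009-05-01');
        ALTER PARTITION SCHEME psAuditByMonth NEXT USED [PRIMARY];
        ALTER PARTITION FUNCTION pfAuditByMonth() SPLIT RANGE ('2009-07-01');

    The SWITCH is a metadata-only operation, which is why it's so much faster than a large DELETE.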

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    Correct me if I am wrong, but isn't table partitioning only available in Enterprise Edition? I thought I had read that somewhere, but I haven't had a chance to check. I can when I get to work, which is where I am heading in a few minutes.

  • It is. I always forget to mention that part.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Much thanks to everyone for their input. I am confident that you have given me the information I need to dig into this problem and figure out how to make it work. This is truly a great forum. You guys rawk! 😀

    -jeremy
