Data archival method - viable?

  • Context

    Looking to implement a database archival process that could be used across different product databases.

    SQL Server 2008 R2 - Standard Edition

    Product database schema are subject to minor but regular change

    Main product databases keep data for the legally required time.

    No sensitive data (e.g. PCI) is stored.

    Unlikely that archived data will be queried - a restore to a new database can be used for this purpose.

    Previously used methods of archiving data

    Have looked at other archival methods in past

    - Product_Archive Database

    - Partitioned table (Std Edt limitation)

    Have used the following method

    - Backup full database using SQL Server backup

    - Delete from live database the database that is no longer required (older than x date)

    - Advantages - Simple

    - Disadvantages - Manual, full backup is large, will backup format be available in newer server versions in 3,5,7,9 years time?

    Alternative method

    PowerShell archive script

    Configuration file for product database listing tables to be archived and the archive criteria (Records < DateColumn field). Table order within the configuration file respects the referential integrity constraints

    For each table

    - Use the SQL Server Management Object (SMO) api to script out the table schema

    - Use bcp to create a format file for the table being archived

    - Use bcp to extract archive data to comma separated value format

    - Run TSQL to delete archived records from the database

    - Zip all output to an archive zip

    A parallel archive restore script would be used to load archived data to a new database if required.

    Powershell restore script

    Unzip archive zip

    For each table in the archived zip

    - Use SQL Server Management Object (SMO) api to create tables based on output schema creation script created from the archive script.

    - Use bcp to reload data to newly created table from archive csv file using the bcp format file created from the archive script.

    Advantages

    - Scripted

    - Data archive format is basic (comma separated value)

    - Schema is stored along with the archive data

    - Data archived is targeted to specific tables

    Disadvantages

    - Slightly more complex

    Any thoughts on this proposed method? Viable? Other advantages/disadvantages?

    Thanks

  • Any reason to not just back up the database and then clean out the old data from the live copy?

    A backup has the advantage of maintaining point-in-time consistency between tables, where a data extract won't do that.

    The other advantage is simplicity of restoration.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Definitely more complex, but has the advantage of not being dependent on tools in the future. It's got more places things could break, but they could also be recovered.

    However, there's one thing I'd note. Sometimes you don't want data to be recovered 9 years. Tax law is 7, IIRC, and other industries might want less (or more). I'd get a legal opinion for the company and then decide. These days I wouldn't keep the data beyond what I had to, specifically for lawsuits. I once had to restore a 9 year old database, and management wasn't happy that we could.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply