Data Archiving

  • Hi All

    Our company is planning to archive historical data from our production database, which holds 15+ years of data, and keep only the last 2 years. Here are the preliminary steps:

    1. Back up the production database and store the backup somewhere safe.

    2. Delete old data from the production database, keeping only the last 2 years of data.

    I have created a script for deleting old data. To keep it simple, let me show my script for just two of the tables.

    We are on SQL Server 2014 Enterprise Edition, and the database is part of an AlwaysOn Availability Group (AOAG).

    Please see attached Arch.sql

    We have 24/7 operations. My questions are:

    1. Is this a good way to delete old data (my script)?

    2. Since I am using a date range for Trip_Log and Trip_Id for Trip, what kind of lock will SQL Server put on these tables?

     

    Thank you in advance!

     

  • Uploading failed.

    Here is my script

     

     

    CREATE TABLE [TRIP](
        [Trip_Id]   [int] NOT NULL,
        [Trip_Date] [datetime] NOT NULL,
        CONSTRAINT [PK_TRIP] PRIMARY KEY NONCLUSTERED ([Trip_Id] ASC) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    CREATE TABLE [TRIP_LOG](
        [Log_ID]   [int] NOT NULL,
        [Log_Time] [datetime] NOT NULL
    ) ON [PRIMARY]
    GO

    DECLARE @DateFrom date = (SELECT MIN(Trip_Date) FROM TRIP WITH (NOLOCK));
    DECLARE @DateTo   date = DATEADD(day, 30, @DateFrom);
    DECLARE @T        table (TRIP_ID int NOT NULL);
    DECLARE @print    varchar(50);

    WHILE @DateFrom < @DateTo
    BEGIN
        -- Collect the Trip_Ids for the current one-day slice.
        DELETE FROM @T;
        INSERT INTO @T
        SELECT Trip_Id
        FROM TRIP WITH (NOLOCK)
        WHERE Trip_Date >= @DateFrom
          AND Trip_Date <  DATEADD(day, 1, @DateFrom);

        -- Delete one day at a time so each transaction stays small.
        DELETE FROM TRIP_LOG
        WHERE Log_Time >= @DateFrom
          AND Log_Time <  DATEADD(day, 1, @DateFrom);

        DELETE FROM TRIP
        WHERE Trip_Id IN (SELECT TRIP_ID FROM @T);

        SET @DateFrom = DATEADD(day, 1, @DateFrom);

        SET @print = 'Deleted through ' + CONVERT(varchar(10), @DateFrom, 120);
        RAISERROR(@print, 0, 1) WITH NOWAIT;
    END
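
    One refinement worth considering for the loop above (not in the original script, just a sketch): SQL Server attempts to escalate to a table lock once a single statement holds roughly 5,000 locks on an object, so capping each DELETE with TOP keeps every batch below that threshold. Table and column names come from the script above; the batch size and example date are assumptions.

    ```sql
    -- Delete one day's TRIP_LOG rows in capped batches so each statement
    -- stays below the ~5,000-lock escalation threshold (batch size assumed).
    DECLARE @BatchSize int  = 4000;
    DECLARE @DateFrom  date = '2009-01-01';  -- example; in the loop above this
                                             -- advances one day per iteration
    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) FROM TRIP_LOG
        WHERE Log_Time >= @DateFrom
          AND Log_Time <  DATEADD(day, 1, @DateFrom);

        IF @@ROWCOUNT < @BatchSize BREAK;    -- the day's slice is done
    END
    ```

    Keeping each statement under the escalation threshold means the deletes hold only key/page locks, so concurrent inserts into other ranges of the table are not blocked.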

  • I would do this the other way around.  That is, rather than deleting 15 years' worth of data, copy the current two years' worth of data to another db.  Then, delete those two years' worth from the original db.  That db will then hold only the archive data.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • As I mentioned above, we are a 24/7 operation.

    Deleting the old data in batches will not cause any downtime. There are over 1 million daily inserts/updates in just these 2 tables (out of 10).

  • Do any of the tables that you want to delete from have IDENTITY columns in them? I don't see IDENTITY anywhere in your CREATE TABLE scripts, but I just need to be absolutely sure.

    Also... you're only showing two columns in each table. While that may be good enough for an analysis for doing DELETEs, deletes are the farthest thing from my mind on this, and I need to know all the columns in the two tables.

    Also, how many rows are in each of the tables to be deleted from, and how big in GB are the clustered indexes on those tables?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There are no IDENTITY columns.

    Rows to be deleted from Trip_Log: 1.2 billion

    Trip_Log clustered index size: 150 GB

    Trip_Log is a standalone table, no FKs

    Rows to be deleted from Trip: 30 million

    Trip clustered index size: 15 GB

    The Trip table has over 30 columns, and about 10 tables depend on Trip. But for the delete, Trip_Id and Trip_Date are the only columns we want/need to use.

     

  • Then this is fairly easy, even in a 24/7 environment (although you didn't list the CREATE TABLEs in full, so I'm not 100% sure).

    To be sure, the first thing that I'd likely do is turn off AOAG, but that's just me.  I don't use it; I use Clustered Instances.

    In the following, there's obviously a little more "assumed knowledge detail" in each step that I've not laid out.  This is the general "gist" of things I've done in the past with great success.  One example of this is really knowing how to get "Minimal Logging", including the mandatory use of WITH (TABLOCK), the correct ORDER BY, and using OPTION (MAXDOP X), where "X" is no greater than half the number of cores in your box, to a max of 8.  Using more than 8 for this has some seriously diminishing returns that will slow down your 24/7 activities.

    1. Create a new table for each table to be deleted from.  Include only the Clustered Index at this point in time.
    2. Take a full backup of the database.
    3. If in the FULL Recovery Model, shift to the BULK LOGGED Recovery Model.
    4. Use proper "Minimal Logging" techniques to copy only the data to keep to the new tables.
    5. Rename the original tables to include the suffix of "_Old". ("Outage starts now")
    6. Rename the new tables to be the same as the original tables. ("Outage ends now"... should be measured in milliseconds).
    7. If you have any other indexes that need to be built, do them now while in the BULK LOGGED Recovery Model, which will build them in a minimally logged fashion (fast) if you cannot use WITH (ONLINE = ON) in the rebuilds.  This will also be the end of the outages, which may be extremely short for each NCI.
    8. Do a final true-up based on the ID columns of the tables.
    9. If you were previously in the FULL Recovery Model, switch back to the FULL Recovery Model.
    10. Take another FULL backup.
    11. After a week or so of checking and other assurance, just drop the old tables.
    12. If it's essential to recover the space that's been freed up, start a loop for DBCC SHRINKFILE with the understanding that some indexes will be inverted and may need to be rebuilt, which will cause some wasted space.  If you don't need to recover the space, then don't do the shrink.

    Done correctly, total downtime should be measured in milliseconds.
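
    Steps 3 through 6 and 9 above might look like the sketch below. The names TRIP_New and TRIP_Old, the database name, the two-year cutoff, and using Trip_Id as the new clustered key are all assumptions for illustration; only TRIP, Trip_Id, and Trip_Date come from the earlier posts.

    ```sql
    -- Sketch of the minimally logged copy and the rename swap (names assumed).
    ALTER DATABASE [YourDb] SET RECOVERY BULK_LOGGED;           -- step 3

    DECLARE @Cutoff datetime = DATEADD(year, -2, GETDATE());    -- keep last 2 years

    INSERT INTO dbo.TRIP_New WITH (TABLOCK)                     -- TABLOCK is required
    SELECT *                                                    --   for minimal logging
    FROM   dbo.TRIP
    WHERE  Trip_Date >= @Cutoff
    ORDER  BY Trip_Id                                           -- match the clustered key
    OPTION (MAXDOP 8);                                          -- cap parallelism

    -- Steps 5 and 6: the "millisecond outage" rename swap.
    BEGIN TRAN;
        EXEC sp_rename 'dbo.TRIP',     'TRIP_Old';
        EXEC sp_rename 'dbo.TRIP_New', 'TRIP';
    COMMIT;

    ALTER DATABASE [YourDb] SET RECOVERY FULL;                  -- step 9
    ```

    The outage window is only the rename transaction itself; the expensive INSERT...SELECT runs beforehand against the new, empty table without blocking the live one.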

     


  • To minimize the impact on concurrent operations, consider:

    Breaking up the deletion process into smaller batches to reduce the duration of locks.

    Choosing an appropriate time window for executing the deletion script, such as during off-peak hours.

    Monitoring the performance and resource usage of your SQL Server instance during the deletion process to identify any issues.

    Communicating with stakeholders about the planned maintenance activity and potential downtime.
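
    To put the monitoring suggestion into practice, one quick way to watch lock counts while the deletes run is the standard sys.dm_tran_locks DMV (a sketch, not tied to the poster's script):

    ```sql
    -- Count locks per session in the current database while the deletes run;
    -- a large KEY/PAGE count on one session signals imminent lock escalation.
    SELECT request_session_id,
           resource_type,
           request_mode,
           COUNT(*) AS lock_count
    FROM   sys.dm_tran_locks
    WHERE  resource_database_id = DB_ID()
    GROUP  BY request_session_id, resource_type, request_mode
    ORDER  BY lock_count DESC;
    ```

    Running this from a second session every few seconds during a trial batch gives a concrete picture of how close each DELETE comes to escalating to a table lock.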
