Data Archiving

  • Hi All

    Our company is planning to archive historical data from our production database, which holds 15+ years of data, and keep only the last 2 years. Here are the preliminary steps:

    1. Back up the production database and store the backup somewhere safe.

    2. Delete old data from the production database, keeping only the last 2 years of data.

    I have created a script for deleting old data. To keep it simple, let me show my script for just two of the tables.

    We are on SQL Server 2014 Enterprise Edition, and the database is part of an AlwaysOn Availability Group (AOAG).

    Please see attached Arch.sql

    We have 24/7 operations. My questions are:

    1. Is this a good way to delete old data (my script)?

    2. Since I am using a date range for Trip_Log and Trip_Id for Trip, what kind of lock will SQL Server put on these tables?

     

    Thank you in advance!

     

  • Uploading failed.

    Here is my script

     

     

    CREATE TABLE [TRIP](
        [Trip_Id]   [int] NOT NULL,
        [Trip_Date] [datetime] NOT NULL,
        CONSTRAINT [PK_TRIP] PRIMARY KEY NONCLUSTERED ([Trip_Id] ASC) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    CREATE TABLE [TRIP_LOG](
        [Log_ID]   [int] NOT NULL,
        [Log_Time] [datetime] NOT NULL
    ) ON [PRIMARY]
    GO

    DECLARE @DateFrom date = (SELECT MIN(Trip_Date) FROM TRIP WITH (NOLOCK));
    DECLARE @DateTo   date = DATEADD(day, 30, @DateFrom);
    DECLARE @T        table (TRIP_ID int NOT NULL);
    DECLARE @print    varchar(50);

    WHILE @DateFrom < @DateTo
    BEGIN
        -- Collect the Trip_Ids for the current one-day slice.
        DELETE FROM @T;
        INSERT INTO @T
        SELECT Trip_Id
        FROM TRIP WITH (NOLOCK)
        WHERE Trip_Date >= @DateFrom
          AND Trip_Date <  DATEADD(day, 1, @DateFrom);

        -- Delete one day at a time so each transaction stays small.
        DELETE FROM TRIP_LOG
        WHERE Log_Time >= @DateFrom
          AND Log_Time <  DATEADD(day, 1, @DateFrom);

        DELETE FROM TRIP
        WHERE Trip_Id IN (SELECT TRIP_ID FROM @T);

        SET @DateFrom = DATEADD(day, 1, @DateFrom);

        SET @print = 'Deleted through ' + CONVERT(varchar(10), @DateFrom, 120);
        RAISERROR(@print, 0, 1) WITH NOWAIT;
    END
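
    One refinement worth considering for the loop above (not in the original script, just a sketch): SQL Server attempts to escalate to a table lock once a single statement holds roughly 5,000 locks on an object, so capping each DELETE with TOP keeps every batch below that threshold. Table and column names come from the script above; the batch size and example date are assumptions.

    ```sql
    -- Delete one day's TRIP_LOG rows in capped batches so each statement
    -- stays below the ~5,000-lock escalation threshold (batch size assumed).
    DECLARE @BatchSize int  = 4000;
    DECLARE @DateFrom  date = '2009-01-01';  -- example; in the loop above this
                                             -- advances one day per iteration
    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) FROM TRIP_LOG
        WHERE Log_Time >= @DateFrom
          AND Log_Time <  DATEADD(day, 1, @DateFrom);

        IF @@ROWCOUNT < @BatchSize BREAK;    -- the day's slice is done
    END
    ```

    Keeping each statement under the escalation threshold means the deletes hold only key/page locks, so concurrent inserts into other ranges of the table are not blocked.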

  • I would do this the other way around.  That is, rather than deleting 15 years' worth of data, copy the current two years' worth of data to another db.  Then, delete those two years' worth from the original db.  That db will then hold only the archive data.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • As I mentioned above, we are a 24/7 operation.

    Deleting the old data in batches will not cause any downtime. There are over 1 million daily inserts/updates in just these 2 tables (out of 10).

  • Do any of the tables that you want to delete from have IDENTITY columns in them? I don't see IDENTITY anywhere in your CREATE TABLE scripts, but I just need to be absolutely sure.

    Also... you're only showing two columns in each table. While that may be good enough for an analysis for doing DELETEs, deletes are the farthest thing from my mind on this, and I need to know all the columns in the two tables.

    Also, how many rows are in each of the tables to be deleted from, and how big in GB are the clustered indexes on those tables?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There are no IDENTITY columns.

    Rows to be deleted from Trip_Log: 1.2 billion

    Trip_Log clustered index size: 150 GB

    Trip_Log is a standalone table, no FKs

    Rows to be deleted from Trip: 30 million

    Trip clustered index size: 15 GB

    The Trip table has over 30 columns, and about 10 tables depend on Trip. But for the delete, Trip_Id and Trip_Date are the only columns we want/need to use.

     

  • Then this is fairly easy, even in a 24/7 environment (although you didn't list the CREATE TABLEs in full, so I'm not 100% sure).

    To be sure, the first thing that I'd likely do is turn off AOAG, but that's just me.  I don't use it; I use Clustered Instances.

    In the following, there's obviously a little more "assumed knowledge detail" in each step that I've not laid out.  This is the general "gist" of things I've done in the past with great success.  One example of this is really knowing how to get "Minimal Logging", including the mandatory use of WITH (TABLOCK), the correct ORDER BY, and using OPTION (MAXDOP X), where "X" is no greater than half the number of cores in your box, to a max of 8.  Using more than 8 for this has some seriously diminishing returns that will slow down your 24/7 activities.

    1. Create a new table for each table to be deleted from.  Include only the Clustered Index at this point in time.
    2. Take a full backup of the database.
    3. If in the FULL Recovery Model, shift to the BULK LOGGED Recovery Model.
    4. Use proper "Minimal Logging" techniques to copy only the data to keep to the new tables.
    5. Rename the original tables to include the suffix of "_Old". ("Outage starts now")
    6. Rename the new tables to be the same as the original tables. ("Outage ends now"... should be measured in milliseconds).
    7. If you have any other indexes that need to be built, do them now while in the BULK LOGGED Recovery Model, which will build them in a minimally logged fashion (fast) if you cannot use WITH (ONLINE = ON) in the rebuilds.  This will also be the end of the outages, which may be extremely short for each NCI.
    8. Do a final true-up based on the ID columns of the tables.
    9. If you were previously in the FULL Recovery Model, switch back to the FULL Recovery Model.
    10. Take another FULL backup.
    11. After a week or so of checking and other assurance, just drop the old tables.
    12. If it's essential to recover the space that's been freed up, start a loop for DBCC SHRINKFILE with the understanding that some indexes will be inverted and may need to be rebuilt, which will cause some wasted space.  If you don't need to recover the space, then don't do the shrink.

    Done correctly, total downtime should be measured in milliseconds.
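
    Steps 3 through 6 and 9 above might look like the sketch below. The names TRIP_New and TRIP_Old, the database name, the two-year cutoff, and using Trip_Id as the new clustered key are all assumptions for illustration; only TRIP, Trip_Id, and Trip_Date come from the earlier posts.

    ```sql
    -- Sketch of the minimally logged copy and the rename swap (names assumed).
    ALTER DATABASE [YourDb] SET RECOVERY BULK_LOGGED;           -- step 3

    DECLARE @Cutoff datetime = DATEADD(year, -2, GETDATE());    -- keep last 2 years

    INSERT INTO dbo.TRIP_New WITH (TABLOCK)                     -- TABLOCK is required
    SELECT *                                                    --   for minimal logging
    FROM   dbo.TRIP
    WHERE  Trip_Date >= @Cutoff
    ORDER  BY Trip_Id                                           -- match the clustered key
    OPTION (MAXDOP 8);                                          -- cap parallelism

    -- Steps 5 and 6: the "millisecond outage" rename swap.
    BEGIN TRAN;
        EXEC sp_rename 'dbo.TRIP',     'TRIP_Old';
        EXEC sp_rename 'dbo.TRIP_New', 'TRIP';
    COMMIT;

    ALTER DATABASE [YourDb] SET RECOVERY FULL;                  -- step 9
    ```

    The outage window is only the rename transaction itself; the expensive INSERT...SELECT runs beforehand against the new, empty table without blocking the live one.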

     


  • To minimize the impact on concurrent operations, consider:

    Breaking up the deletion process into smaller batches to reduce the duration of locks.

    Choosing an appropriate time window for executing the deletion script, such as during off-peak hours.

    Monitoring the performance and resource usage of your SQL Server instance during the deletion process to identify any issues.

    Communicating with stakeholders about the planned maintenance activity and potential downtime.
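
    To put the monitoring suggestion into practice, one quick way to watch lock counts while the deletes run is the standard sys.dm_tran_locks DMV (a sketch, not tied to the poster's script):

    ```sql
    -- Count locks per session in the current database while the deletes run;
    -- a large KEY/PAGE count on one session signals imminent lock escalation.
    SELECT request_session_id,
           resource_type,
           request_mode,
           COUNT(*) AS lock_count
    FROM   sys.dm_tran_locks
    WHERE  resource_database_id = DB_ID()
    GROUP  BY request_session_id, resource_type, request_mode
    ORDER  BY lock_count DESC;
    ```

    Running this from a second session every few seconds during a trial batch gives a concrete picture of how close each DELETE comes to escalating to a table lock.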
