February 2, 2024 at 6:30 pm
Hi All
Our company is planning to archive historical data from our production database, which has 15+ years of data, and keep only the last 2 years. Here are the preliminary steps:
1. Back up the production database and save it somewhere safe (see the sketch after this list).
2. Delete old data from the production database, keeping only the last 2 years of data.
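For step 1, a minimal sketch of the backup (the database name and backup path are placeholders; COPY_ONLY keeps it from disturbing the regular backup chain):

BACKUP DATABASE [YourProdDb]
    TO DISK = N'\\SafeShare\Backups\YourProdDb_PreArchive.bak'
    WITH COPY_ONLY, COMPRESSION, CHECKSUM, STATS = 10;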
I have created a script for deleting old data. To keep it simple, let me show my script for just two tables.
We have SQL Server 2014 Enterprise Edition, and the database is part of an AOAG.
Please see attached Arch.sql
We have 24/7 operations. My questions are:
1. Is this a good way to delete old data (my script)?
2. Since I am using a date range for Trip_Log and Trip_Id for Trip, what kind of locks will SQL Server put on these tables?
Thank you in advance!
February 2, 2024 at 6:31 pm
Uploading failed.
Here is my script
CREATE TABLE [TRIP](
    [Trip_Id] [int] NOT NULL,
    [Trip_Date] [datetime] NOT NULL,
    CONSTRAINT [PK_TRIP] PRIMARY KEY NONCLUSTERED ([Trip_Id] ASC) ON [PRIMARY]
) ON [PRIMARY]
go
CREATE TABLE [TRIP_LOG](
    [Log_ID] [int] NOT NULL,
    [Log_Time] [datetime] NOT NULL
) ON [PRIMARY]
go
-- Delete the oldest 30 days of data, one day per loop pass.
declare @DateFrom date = (select min(Trip_Date) from TRIP with (nolock));
declare @DateTo   date = dateadd(day, 30, @DateFrom);
declare @NextDay  date;
declare @T table (TRIP_ID int not null);
declare @print varchar(50);

while @DateFrom < @DateTo
begin
    set @NextDay = dateadd(day, 1, @DateFrom);

    -- Collect the Trip_Ids for just the current day.
    delete from @T;
    insert into @T
    select Trip_Id from TRIP with (nolock)
    where Trip_Date >= @DateFrom and Trip_Date < @NextDay;

    -- Delete the day's TRIP_LOG rows by date range...
    delete from TRIP_LOG
    where Log_Time >= @DateFrom and Log_Time < @NextDay;

    -- ...and the day's TRIP rows by Trip_Id.
    delete from TRIP
    where Trip_Id in (select TRIP_ID from @T);

    -- Progress message, flushed immediately.
    set @print = 'Deleted through ' + convert(varchar(10), @DateFrom, 120);
    raiserror (@print, 0, 1) with nowait;

    set @DateFrom = @NextDay;
end
February 2, 2024 at 6:38 pm
I would do this the other way around. That is, rather than deleting 15 years' worth of data, copy the current 2 years' worth of data to another db. Then, delete those 2 years' worth from the original db. That db will then hold only the archive data.
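A rough sketch of that approach, assuming the new database is called Trip_Current (the name and the 2-year cutoff are illustrative; verify row counts before deleting anything from the original):

-- Copy the most recent 2 years into the new database...
DECLARE @Cutoff datetime = DATEADD(year, -2, GETDATE());

SELECT t.*  INTO Trip_Current.dbo.TRIP     FROM dbo.TRIP     AS t  WHERE t.Trip_Date >= @Cutoff;
SELECT tl.* INTO Trip_Current.dbo.TRIP_LOG FROM dbo.TRIP_LOG AS tl WHERE tl.Log_Time  >= @Cutoff;

-- ...then, once the copies are verified, delete those same rows from the original db,
-- which becomes the archive.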
SQL DBA, SQL Server MVP (07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 2, 2024 at 6:48 pm
As I mentioned above, we are a 24/7 operation.
Deleting old data in batches will not cause any downtime. There are over 1 million daily inserts/updates in just these 2 tables (out of 10).
February 4, 2024 at 12:42 am
Do any of the tables that you want to delete from have IDENTITY columns in them? I don't see IDENTITY anywhere in your CREATE TABLE scripts but just need to be absolutely sure.
Also... you're only showing two columns in each table... While that may be good enough for an analysis for doing DELETES, deletes are the farthest thing from my mind on this and I need to know all the columns in the two tables.
Also, how many rows are in each of the tables to be deleted from, and how big in GB are the Clustered Indexes on those tables?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 4, 2024 at 8:42 pm
There are no IDENTITY columns.
Rows to be deleted from Trip_Log: 1.2 billion
Trip_Log clustered index size: 150 GB
Trip_Log is a standalone table, no FKs
Rows to be deleted from Trip: 30 million
Trip clustered index size: 15 GB
The Trip table has over 30 columns, plus about 10 tables that depend on Trip. But for the delete, Trip_Id and Trip_Date are the columns we want/need to use.
February 5, 2024 at 3:12 am
Then, this is fairly easy even in a 24/7 environment (although you didn't list the CREATE TABLEs in full, so I'm not 100% sure).
To be sure, the first thing that I'd likely do is turn off AOAG, but that's just me. I don't use it; I use Clustered Instances.
In the following, there's obviously a little more "assumed knowledge detail" in each step that I've not laid out. This is the general "gist" of things I've done in the past with great success. One example of this is really knowing how to get "Minimal Logging", including the mandatory use of WITH (TABLOCK), the correct ORDER BY, and using OPTION (MAXDOP X), where "X" is no greater than half the number of cores in your box, up to a max of 8. Using more than 8 for this has some seriously diminishing returns that will slow down your 24/7 activities.
Done correctly, total downtime should be measured in milliseconds.
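To make the minimal-logging point concrete, a sketch along these lines (the archive database/table names are placeholders; it assumes the target table is empty, the target database is in the SIMPLE or BULK_LOGGED recovery model, and the ORDER BY matches the target's clustered index):

DECLARE @CutoffDate datetime = DATEADD(year, -2, GETDATE());   -- start of the 2 years being kept

INSERT INTO ArchiveDB.dbo.TRIP_LOG WITH (TABLOCK)   -- TABLOCK is mandatory for minimal logging
       (Log_ID, Log_Time)
SELECT Log_ID, Log_Time
  FROM dbo.TRIP_LOG
 WHERE Log_Time < @CutoffDate
 ORDER BY Log_ID                                     -- match the clustered index of the target
OPTION (MAXDOP 8);                                   -- no more than half the cores, max 8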
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2024 at 5:20 pm
To minimize the impact on concurrent operations, consider:
Breaking up the deletion process into smaller batches to reduce the duration of locks (see the sketch after this list).
Choosing an appropriate time window for executing the deletion script, such as during off-peak hours.
Monitoring the performance and resource usage of your SQL Server instance during the deletion process to identify any issues.
Communicating with stakeholders about the planned maintenance activity and potential downtime.
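A minimal sketch of that batching approach, assuming a 50,000-row batch size and a one-second pause between batches (both values are illustrative and should be tuned):

DECLARE @CutoffDate datetime = DATEADD(year, -2, GETDATE());   -- keep the last 2 years
DECLARE @BatchSize  int = 50000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
      FROM dbo.TRIP_LOG
     WHERE Log_Time < @CutoffDate;

    IF @@ROWCOUNT < @BatchSize
        BREAK;                        -- last (partial) batch done

    WAITFOR DELAY '00:00:01';         -- give concurrent activity room to breathe
END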