April 25, 2022 at 12:25 pm
Our company has a database containing a single table that is used to log actions in our Mailer process. This database can grow larger than I would prefer in a short period of time, and I am constantly having to shrink it and trim the other tables used in the Mailer process. The logging table currently has 156,787,290 records in it, and the database has grown to 30GB with the log file at 2GB.
I am perplexed by how large the transaction log grows if I attempt to delete most of the records in the table. I attempted to delete 151,757,051 of the records from the table yesterday, on a dev server, to determine how long it would take and how large the database transaction log file would grow. I let the delete command run for 45 minutes, but I had to cancel it due to time constraints and because the log file grew to 84GB.
Why would the log file grow to almost three times the size of the original database? I also noticed that most of the records in the table did NOT get deleted.
I realize that this is not the best way to clear this table. I would normally stop the Mailer process, copy one month of data to another table in another database, truncate the original table and then restore the copied records. This process gets more tedious and difficult when I have to delete the records from the multiple databases used in the entire Mailer process. Those databases contain numerous tables and get very large if I do not delete the records on a monthly basis.
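For reference, the monthly cleanup looks roughly like this (a simplified sketch; the archive database and table names are just placeholders):
-- 1. Stop the Mailer process, then copy the rows to keep out of the log table
select *
into ArchiveDB.dbo.tbNewMatchBatchLog_Keep
from NewMatchesLog.dbo.tbNewMatchBatchLog
where StartDateTime >= dateadd(Day, -30, getdate())
-- 2. Truncate the log table (only page deallocations are logged, so the LDF stays small)
truncate table NewMatchesLog.dbo.tbNewMatchBatchLog
-- 3. Put the retained rows back (keeping their original identity values), then restart the Mailer
set identity_insert NewMatchesLog.dbo.tbNewMatchBatchLog on
insert into NewMatchesLog.dbo.tbNewMatchBatchLog
    (NewMatchBatchLogID, BatchType, NewMatchBatchID, NewMatchBatchRunTimeID,
     AgentSavedCriteriaID, ProcedureName, Action, StartDateTime, EndDateTime, Records)
select NewMatchBatchLogID, BatchType, NewMatchBatchID, NewMatchBatchRunTimeID,
       AgentSavedCriteriaID, ProcedureName, Action, StartDateTime, EndDateTime, Records
from ArchiveDB.dbo.tbNewMatchBatchLog_Keep
set identity_insert NewMatchesLog.dbo.tbNewMatchBatchLog off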
Thank you in advance,
Brian
Part-Time DBA
April 25, 2022 at 1:31 pm
The log can grow larger than the data (it also has to log index activity) because it must keep enough information to make a rollback possible.
Delete in small batches to minimize duration and locking (the delete may also be blocked by other transactions).
Perhaps you can copy the keys you want to delete into a temporary table and join from there (instead of a complex CTE), as in the sketch below.
Is there an index that helps with the deletes? (So you don't need to scan the whole log table to find which rows to delete.)
It is easier to help when there is sample data / DDL information.
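Something along these lines, just as a sketch (table and column names are examples only, since I don't know your schema):
-- capture the keys to delete
SELECT LogID
INTO #ToDelete
FROM dbo.MailerLog
WHERE LogDate < DATEADD(DAY, -30, GETDATE())

CREATE UNIQUE CLUSTERED INDEX IX_ToDelete ON #ToDelete (LogID)

-- delete in batches by joining to the captured keys
WHILE 1 = 1
BEGIN
    DELETE TOP (100000) t
    FROM dbo.MailerLog AS t
    INNER JOIN #ToDelete AS d ON d.LogID = t.LogID

    IF @@ROWCOUNT = 0 BREAK
END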
April 25, 2022 at 2:30 pm
You can make a loop to delete X number of records, and include a transaction log backup in each pass.
You will need disk space for the backups, but your LDF should not get huge.
If you do this monthly, maybe schedule it to run during a quiet time.
NextDelete:

waitfor delay '00:00:03' --- Don't hog the server

-- back up the transaction log so the space just used can be reused
-- (database name and backup path are placeholders)
BACKUP LOG MyDatabase TO DISK = N'X:\LogBackups\MyDatabase_log.trn'

delete top (100000)
from My_Table
where ArchiveDate < dateadd(day, -30, getdate())

if @@rowcount > 0 goto NextDelete
April 25, 2022 at 2:42 pm
homebrew01,
That is an interesting suggestion that I may consider for deleting both the log records and the process records. I would like to use cascading deletes to remove the process records, but record locking has been an issue in the past. Your method should help eliminate that locking, which would allow the Mailer processes to keep running while the deletes run.
Thank you very much!
April 25, 2022 at 2:49 pm
Jo,
Below is the info you requested:
-- DDL for Log Table:
CREATE TABLE [dbo].[tbNewMatchBatchLog](
[NewMatchBatchLogID] [bigint] IDENTITY(1,1) NOT NULL,
[BatchType] [char](1) NOT NULL,
[NewMatchBatchID] [int] NULL,
[NewMatchBatchRunTimeID] [int] NULL,
[AgentSavedCriteriaID] [int] NULL,
[ProcedureName] [varchar](100) NOT NULL,
[Action] [varchar](100) NOT NULL,
[StartDateTime] [datetime2](7) NOT NULL,
[EndDateTime] [datetime2](7) NULL,
[Records] [int] NULL,
CONSTRAINT [PK_tbNewMatchBatchLog] PRIMARY KEY CLUSTERED
(
[NewMatchBatchLogID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
-- SQL to delete every record except the last 30 days

-- Step 1: find the highest key older than 30 days
select top 1 NewMatchBatchLogID
from NewMatchesLog.dbo.tbNewMatchBatchLog
where StartDateTime < dateadd(Day, -30, getdate())
order by NewMatchBatchLogID desc

-- Step 2: delete everything up to and including that key
-- (686438997 is the value returned by the query above)
delete
from NewMatchesLog.dbo.tbNewMatchBatchLog
where NewMatchBatchLogID <= 686438997
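A batched version of the same delete, driven by that boundary key, might look something like this (untested sketch; the batch size is arbitrary):
declare @MaxID bigint

select top 1 @MaxID = NewMatchBatchLogID
from NewMatchesLog.dbo.tbNewMatchBatchLog
where StartDateTime < dateadd(Day, -30, getdate())
order by NewMatchBatchLogID desc

while 1 = 1
begin
    delete top (100000)
    from NewMatchesLog.dbo.tbNewMatchBatchLog
    where NewMatchBatchLogID <= @MaxID

    if @@rowcount = 0 break
end
Each pass could also back up the transaction log, as homebrew01 suggested, to keep the LDF from growing.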
April 25, 2022 at 2:55 pm
For best performance, make sure you have the log table properly clustered. Specifically, if you intend to delete by log time, as is quite typical, then cluster the table that way. If an identity is available on the table, add it to the clustering key to make the key unique.
CREATE TABLE <log_table> ( log_date datetime2 NOT NULL, id int IDENTITY(1, 1) NOT NULL, ... )
CREATE CLUSTERED INDEX ... ON ... ( log_date, id ) WITH ...
Do the deletes in batches/loops as shown above, but based on the MIN(log_date) remaining in the table, working forward. That is, delete the rows in clustering key order. This ensures a very fast start to each delete.
Also, be sure to do a CHECKPOINT as part of your process. The log can't be trimmed until a CHECKPOINT occurs, even if it's already been backed up.
So, the sequence would be:
DELETE TOP (nnnnnn) rows
CHECKPOINT
WAITFOR
BACKUP log
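Put together, the loop would look roughly like this (sketch only; the table, database name, backup path, batch size, and delay are placeholders):
DECLARE @cutoff datetime2 = DATEADD(DAY, -30, GETDATE())

WHILE EXISTS ( SELECT 1 FROM dbo.log_table WHERE log_date < @cutoff )
BEGIN
    DELETE TOP (100000)
    FROM dbo.log_table
    WHERE log_date < @cutoff  -- clustered on (log_date, id), so this starts with the oldest rows

    CHECKPOINT

    WAITFOR DELAY '00:00:03'

    BACKUP LOG MyDb TO DISK = N'X:\LogBackups\MyDb_log.trn'
END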
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".