Every DBA knows how important it is to have appropriate monitoring and alerting on production databases. Knowing the instant something is wrong, or is about to go wrong, is critical. One of the things monitored is the growth of transaction log files (for both user databases and tempdb), assuming auto-grow is permitted on the log file. If the log file grows, and continues to grow unchecked, it can consume the entire drive and halt any further transactions.
So, what happens when I get an SMS alert or email message in the middle of the night (since that is typically when a log growth event happens)? I must find out why the log grew. I must also document that the log grew and when it happened, so that I can correlate the events and isolate a pattern. Knowing the timeframe of the events makes it much easier to troubleshoot and prevent this type of activity in the future.
Two Methods...
There are two methods I use to find why my log grew unexpectedly. The first is provided by Microsoft through DBCC commands. The second requires some setup prior to the event (this is where trending and forecasting also help). This second method is one I created to meet some auditing needs, which I then repurposed to help troubleshoot these types of scenarios.
Immediately upon being notified of a log growth event, run the following command:
DBCC OpenTran(mydb)
Running this command will provide information for an open transaction in the affected database, should a transaction exist. A critical piece of information that this command returns is the SPID. This SPID can then be used in the following command to help determine what might be causing the log growth.
DBCC InputBuffer(SPID)
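As a side note (this is not one of the two methods covered in this article), the transaction DMVs introduced in SQL Server 2005 offer a useful cross-check, since DBCC OpenTran reports only the oldest active transaction in the database. A minimal sketch, joining sys.dm_tran_session_transactions to sys.dm_tran_active_transactions:

-- List every session that currently has an open transaction, oldest first
SELECT st.session_id,
       tx.transaction_id,
       tx.name AS transaction_name,
       tx.transaction_begin_time
FROM sys.dm_tran_session_transactions st
INNER JOIN sys.dm_tran_active_transactions tx
    ON st.transaction_id = tx.transaction_id
ORDER BY tx.transaction_begin_time;

The session_id returned here is the same SPID that can be passed to DBCC InputBuffer.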
To demonstrate the DBCC approach, I created a scenario that leaves a transaction open and uncommitted. To stay specific to the topic of the article, I created a new database using the default file size settings of my model database. The database is created small so I can force a log and data file growth to occur. To see that the log and data files have grown, I also need a logging table and a procedure that inserts into that table. Here are the scripts to set up those preliminary environment requirements.
Database
Use master
Go
Create Database TestLogGrowth
Go
I have created the database in this manner accepting the default values already established in the model database. Some settings of note that are pertinent to the success of testing the methods outlined in this article are: 1) Filegrowth (1MB for Data and 10% for Log), 2) Size (2MB for Data and 1MB for Log), 3) AutoGrowth (On), and 4) Recovery Model (Full).
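If your model database has different defaults, the settings can be checked and made explicit. This is a minimal sketch, assuming the default logical file names TestLogGrowth and TestLogGrowth_log; verify the names against sys.database_files before running it.

-- Check the current size and growth settings (size is reported in 8 KB pages)
SELECT name, size/128 AS SizeMB, growth, is_percent_growth
FROM TestLogGrowth.sys.database_files;

-- Make the growth and recovery settings explicit (logical file names assumed)
ALTER DATABASE TestLogGrowth MODIFY FILE (NAME = TestLogGrowth, FILEGROWTH = 1MB);
ALTER DATABASE TestLogGrowth MODIFY FILE (NAME = TestLogGrowth_log, FILEGROWTH = 10%);
ALTER DATABASE TestLogGrowth SET RECOVERY FULL;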
LogTable
Use TestLogGrowth
Go
Create Table dbo.DBFileSizeLog (
LogID int identity primary key, DBName varchar(25), DBSizeMB decimal(15,2), UnallocatedSizeMB decimal(15,2)
,DataSizeMB decimal(15,2), LogSizeMB decimal(15,2), LogUsedMB decimal(15,2), LogUsedPercent decimal(15,2)
,LogGrowths decimal(15,2), CheckDate datetime
)
Go
Procedure to Log database size information
Create procedure dbo.SpaceUsed_perfmon
as
Begin
Set nocount on

Declare @InstanceName Varchar(25)
Set @InstanceName = 'TestLogGrowth'
;
WITH LogData (database_name, measure, value) AS
(
SELECT instance_name,
CASE counter_name
WHEN N'Data File(s) Size (KB)' THEN N'DSize'
WHEN N'Log File(s) Size (KB)' THEN N'LSize'
WHEN N'Log File(s) Used Size (KB)' THEN N'Used'
WHEN N'Log Growths' THEN N'Grow'
END,
cntr_value
FROM sys.dm_os_performance_counters PC
WHERE counter_name IN (N'Data File(s) Size (KB)', N'Log File(s) Size (KB)'
, N'Log File(s) Used Size (KB)', N'Log Growths')
AND instance_name = @InstanceName
),
LogUsage AS
(
SELECT database_name,
DataSizeMB = CONVERT(DEC(9,2), SUM(CASE WHEN measure = N'DSize' THEN value ELSE 0 END) / 1024.0),
LogSizeMB = CONVERT(DEC(9,2), SUM(CASE WHEN measure = N'LSize' THEN value ELSE 0 END) / 1024.0),
LogUsedMB = CONVERT(DEC(9,2), SUM(CASE WHEN measure = N'Used' THEN value ELSE 0 END) / 1024.0),
LogGrowths = CONVERT(DEC(9,2), SUM(CASE WHEN measure = N'Grow' THEN value ELSE 0 END))
FROM LogData
GROUP BY database_name
)
Insert into DBFileSizeLog (DBName,DBSizeMB,DataSizeMB,LogSizeMB,LogUsedMB,LogUsedPercent,LogGrowths,CheckDate)
SELECT database_name,DataSizeMB+LogSizeMB,DataSizeMB,LogSizeMB,LogUsedMB
,[LogUsedPercent] = CONVERT(DEC(9,2), ((LogUsedMB * 100.0) / LogSizeMB))
,LogGrowths, convert(varchar(19),GETDATE(),120)
FROM LogUsage
ORDER BY database_name ASC;
End
This procedure queries the performance monitor counters exposed through sys.dm_os_performance_counters in SQL Server 2005 and 2008. I am using just the counters I feel are necessary to monitor, and inserting their values into the log table. Using this method allows me to run the stored procedure repeatedly during the open transaction without needing to alter any other configuration settings.
With those pieces in place, I now have a choice: set up an SQL Agent job, or invoke the procedure manually before and after each of the statements (which will leave a transaction in an open state) that I will execute. To test the first method, I will invoke the procedure manually. To test the second method, I will create an SQL Agent job that calls the procedure every 10 seconds (for the purposes of my testing only - the interval can be adjusted to suit the needs of the environment).
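For the manual route, the usage is simple: run the procedure once before the test statement and once after, then compare the two rows.

Use TestLogGrowth
Go
-- Baseline capture before the test statement
Exec dbo.SpaceUsed_perfmon
-- ... run the statement that is expected to grow the log ...
-- Capture again afterwards and compare the rows
Exec dbo.SpaceUsed_perfmon
Select * From dbo.DBFileSizeLog Order By CheckDate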
All I have left to do now is create a table to host my test data. That table is rather simple, and it is created and populated as follows:
SELECT TOP (1000000)
LogID = IDENTITY(INT,1,1),
SomeDate = getdate(),
LogUID12 = RIGHT(NEWID(),12),
LogUID = NewID()
INTO dbo.GrowThatLog
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
In my TestLogGrowth database, this table caused the log and data file to grow as displayed in the below image.
To populate the table and demonstrate the first method for determining what is causing a log growth, I wrapped the population query in a transaction and did not commit it.
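Here is a minimal sketch of that step, assuming dbo.GrowThatLog was dropped (or a fresh copy of the database was used) before this run; the attached scripts may differ in the details.

Use TestLogGrowth
Go
Begin Transaction
SELECT TOP (1000000)
LogID = IDENTITY(INT,1,1),
SomeDate = getdate(),
LogUID12 = RIGHT(NEWID(),12),
LogUID = NewID()
INTO dbo.GrowThatLog
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
-- Intentionally no COMMIT or ROLLBACK yet, so the transaction remains open
-- and DBCC OpenTran can report it.

With that transaction still open, when I execute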
DBCC OpenTran(TestLogGrowth)
I will receive output like the following:
Armed with the information highlighted in blue, I can now execute the next query:
DBCC InputBuffer(52)
This query will produce the following information (provided the transaction is still open).
If the transaction has terminated before either of these steps is executed, then the opportunity to find the open transaction causing the log growth using these queries has been lost. For DBCC OpenTran you would get a message stating "No active open transactions." For DBCC InputBuffer, you might even receive information about a different statement (if the SPID in question has been reused by another session), or you would receive "Invalid SPID XX specified."
Let us now turn to the second method. As alluded to earlier, I have a few tasks left to perform before this is ready to test. First, I need to reset the database from the previous testing. Second, I need one more table and one more procedure, this time to capture the currently executing SQL. (These log tables and procedures would typically live in an admin database; I have included them in the testing database for ease of demonstration.) The final step is to create an SQL Agent job that runs every 10 seconds and calls both logging procedures (again, the short interval is just for testing purposes and to ensure I capture the data I need; the time value can be adjusted to suit the environment). There will be two logs in this scenario: one for the performance monitor counters (already created for the first method) and one for the "Currently Executing SQL."
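As one possible shape for that reset step (the exact commands are not shown here), the leftover transaction is closed out, the test table is dropped, and the files are shrunk back toward their starting sizes; the logical file names are assumed.

Use TestLogGrowth
Go
-- Run this in the session that still holds the open transaction from the first test
If @@TRANCOUNT > 0 Rollback Transaction
Go
Drop Table dbo.GrowThatLog
Go
-- Shrink the files back down (in FULL recovery, a log backup may be needed
-- before the log file will actually release space)
DBCC ShrinkFile (N'TestLogGrowth', 2)
DBCC ShrinkFile (N'TestLogGrowth_log', 1)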
ExecutingSQL Table
CREATE TABLE [dbo].[AuditTSQL](
[TSqlID] [int] IDENTITY(1,1) NOT NULL,
[LoginName] [varchar](30) NULL,
[Spid] [int] NULL,
[DBName] [varchar](25) NULL,
[ParentQuery] [varchar](max) NULL,
[Wait] [varchar](50) NULL,
[Program] [varchar](75) NULL,
[Hostname] [varchar](25) NULL,
[StartTime] [datetime] NULL,
[InsertedTime] [datetime] NULL,
CONSTRAINT [PK_AuditTSQL] PRIMARY KEY CLUSTERED
(
[TSqlID] ASC
))
ExecutingSQL Proc
Create Procedure RunningSQLStatements
as
Begin
Insert Into AuditTSQL (LoginName,SPID,DBName,ParentQuery,Wait,Program,HostName,StartTime,InsertedTime)
SELECT sp.loginame as [UserName],sp.spid as SPID
, DB_NAME(sp.dbid)
,qt.text
,wait_type
,sp.program_name
,Hostname
,start_time
,convert(varchar(19),GETDATE(),120)
FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp
ON er.session_id = sp.spid
Inner Join sys.dm_exec_sessions es
On er.session_id = es.session_id
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) as qt
WHERE es.is_user_process = 1 -- Ignore system spids.
AND er.session_Id NOT IN (@@SPID) -- Ignore this current statement.
ORDER BY sp.dbid, sp.loginame
End
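Before wiring this procedure into the job, a quick manual smoke test (run while a query is active in another session) confirms that rows land in the audit table:

Use TestLogGrowth
Go
Exec RunningSQLStatements
Select top 10 * From AuditTSQL Order By InsertedTime Desc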
Agent Job
USE [msdb]
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Monitor Log Growth and Executing SQL',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Monitor Log Growth',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'Exec SpaceUsed_perfmon
',
@database_name=N'TestLogGrowth',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Monitor Executing TSQL',
@step_id=2,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'exec RunningSQLStatements
',
@database_name=N'TestLogGrowth',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 10 Seconds',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=2,
@freq_subday_interval=10,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20100127,
@active_end_date=99991231,
@active_start_time=163800,
@active_end_time=165959,
@schedule_uid=N'67de68dd-fe6b-40b0-9b3e-ab89bd610174'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
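Once the job exists, it can be kicked off immediately (rather than waiting for the schedule window) to verify that both steps write rows to their log tables:

Exec msdb.dbo.sp_start_job @job_name = N'Monitor Log Growth and Executing SQL'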
Now to run some tests again. I will use the same query as before to populate a table, which I know will force a log growth. Then I can examine my log tables for entries that correlate according to their timestamps. In this first trial with this method, I am explicitly wrapping the query in a transaction and waiting to commit it. Since both logging procedures are in the same job and run at nearly the same time (both record their timestamps truncated to the second via convert(varchar(19), GETDATE(), 120), which is why an equality join works), I use the following query to find the correlation.
Select a.DBName, a.ParentQuery as RunningSQL, a.InsertedTime as SQLInsertedTime, d.CheckDate as LogGrowTime
, d.DBSizeMB, d.DataSizeMB, d.LogSizeMB, d.LogUsedMB, d.LogGrowths
From AuditTSQL a
Inner Join DBFileSizeLog d On a.InsertedTime = d.CheckDate
This will yield results that can help me pinpoint the query that caused the log to grow. See the below image for a sample of the output.
If the offending query is not wrapped in an explicit transaction, this method will still capture it, as shown in the following image.
Conclusion
I have now demonstrated two methods for finding out what caused a transaction log to grow. The very same principle is easily applied to tempdb. The process can also be enhanced to send email alerts when a log growth occurs, and there is opportunity to fine-tune it for a specific environment. The second method I shared is one I implemented, attached to a logon trigger, in an environment with 1.8 million users; its side effects were minimal and went unnoticed by end users. The scripts used in this article are attached.
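As one possible shape for that email enhancement (a sketch under assumptions, not part of the attached scripts: Database Mail must be configured, and the profile name and recipient below are placeholders), a final job step could compare the two most recent Log Growths values and send a message when the counter increases.

Declare @lastGrowths decimal(9,2), @prevGrowths decimal(9,2)

-- Most recent Log Growths value captured by SpaceUsed_perfmon
Select top 1 @lastGrowths = LogGrowths
From dbo.DBFileSizeLog Order By CheckDate Desc

-- The value captured immediately before it
Select top 1 @prevGrowths = LogGrowths
From dbo.DBFileSizeLog
Where CheckDate < (Select Max(CheckDate) From dbo.DBFileSizeLog)
Order By CheckDate Desc

If @lastGrowths > IsNull(@prevGrowths, @lastGrowths)
Begin
Exec msdb.dbo.sp_send_dbmail
@profile_name = N'DBA Mail',        -- placeholder profile name
@recipients = N'dba@example.com',   -- placeholder recipient
@subject = N'Log growth detected in TestLogGrowth',
@body = N'The Log Growths counter increased; check dbo.AuditTSQL for the SQL that was executing at the time.'
End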
These are just two of the available methods for troubleshooting growing logs. As of this writing, I have begun exploring another method, one that uses Trace Events, and it will soon be found on my blog.