Building, maintaining, and decommissioning a replication topology of any size or complexity in a production environment can exhaust even the most seasoned DBA. Related issues can easily turn a morning task into a full-day recovery project. Risk can be mitigated with established processes and procedural workflows, but not all DBAs have the resources to wrap each project in a thoroughly tested process. Deadlines may translate into ad-hoc demands that flow into a sensitive production environment as untested tasks, and a general lack of knowledge of SQL Server's replication technology can result in missed steps and unexplainable behavior. A ballooning log file, for example, may not initially be associated with recently made replication changes. Time is crucial: the man-hours spent uncovering the origins of such an issue can dwarf the simplicity of its resolution. There is no shortage of online articles covering real-life SQL Server replication scenarios. Of course, that does not protect against unexpected results that may place your infrastructure at risk.
Our Scenario
Company A has just updated the application infrastructure that manages the distribution of Human Resources data to its reporting databases. In the past, a transactional publication synchronized data between the company's Kronos database and its reporting databases. The upgrade has rendered that publication obsolete. The decommissioning of the publication was approved by the HRIS manager, and the company's lone DBA is tasked with removing it by the end of the day. Only one publication with one subscription exists, and (to keep it simple) no DR options have been configured.
Prepare Replication Test Environment
Each code snippet is written as a SQLCMD script. Feel free to alter the values of the variables and properties below to match your testing environment. The following scripts assume a configured Distributor exists and that the SQL Server instance has been assigned to that Distributor.
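To verify that assignment before running the scripts below, sp_helpdistributor provides a quick check when run from the publisher instance:

-- Lists the Distributor, distribution database, and related properties
EXEC sp_helpdistributor
GO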
First, enable replication:
:SetVar varMainPubDatabaseName "Som_Database"

USE [master]
GO

-- Enable the database for transactional and merge publishing
EXEC sp_replicationdboption
      @dbname = N'$(varMainPubDatabaseName)'
    , @optname = N'publish'
    , @value = N'true'
GO
EXEC sp_replicationdboption
      @dbname = N'$(varMainPubDatabaseName)'
    , @optname = N'merge publish'
    , @value = N'true'
GO

-- Create the Log Reader and Queue Reader agents
EXEC $(varMainPubDatabaseName).sys.sp_addlogreader_agent
      @job_login = N'AAA\DomainUser'
    , @job_password = '*password*'
    , @publisher_security_mode = 1
GO
EXEC $(varMainPubDatabaseName).sys.sp_addqreader_agent
      @job_login = N'AAA\DomainUser'
    , @job_password = '*password*'
    , @frompublisher = 1
GO
Next, create a new publication:
:SetVar varMainPubDatabaseName "Som_Database"
:SetVar varMainPublicationName "Trans_Pub_Test"

USE $(varMainPubDatabaseName)
GO

-- Create the publication
EXEC sp_addpublication
      @publication = N'$(varMainPublicationName)'
    , @allow_push = N'true'
    , @allow_pull = N'true'
    , @independent_agent = N'true'
GO

-- Create the Snapshot Agent for the publication
EXEC sp_addpublication_snapshot
      @publication = N'$(varMainPublicationName)'
    , @job_login = N'AAA\DomainUser'
    , @job_password = '*password*'
    , @publisher_security_mode = 1
GO

-- Add the article
EXEC sp_addarticle
      @publication = N'$(varMainPublicationName)'
    , @article = N'SomeTables'
    , @source_owner = N'dbo'
    , @source_object = N'SomeTables'
    , @type = N'logbased'
    , @schema_option = 0x0000000008035CDF
    , @destination_table = N'SomeTables'
    , @destination_owner = N'dbo'
    , @vertical_partition = N'false'
GO
Now, add a new push subscription to the publication:
:SetVar varMainPubDatabaseName "Som_Database"
:SetVar varMainPublicationName "Trans_Pub_Test"
:SetVar varSyncServerName "DestinationServer"
:SetVar varSyncDatabaseName "SubscriberDb"

USE $(varMainPubDatabaseName)
GO

-- Register the push subscription
EXEC sp_addsubscription
      @publication = N'$(varMainPublicationName)'
    , @subscriber = N'$(varSyncServerName)'
    , @destination_db = N'$(varSyncDatabaseName)'
    , @subscription_type = N'Push'

-- Create the Distribution Agent job for the push subscription
EXEC sp_addpushsubscription_agent
      @publication = N'$(varMainPublicationName)'
    , @subscriber = N'$(varSyncServerName)'
    , @subscriber_db = N'$(varSyncDatabaseName)'
    , @job_login = N'AAA\DomainUser'
    , @job_password = '*password*'
    , @subscriber_security_mode = 1
GO
We can start the snapshot agent to generate the snapshot and build the subscription database.
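A minimal sketch of starting the agent via T-SQL, assuming the publication created above (the agent can also be started from Replication Monitor or SQL Server Agent):

:SetVar varMainPubDatabaseName "Som_Database"
:SetVar varMainPublicationName "Trans_Pub_Test"

USE $(varMainPubDatabaseName)
GO

-- Queue the Snapshot Agent job for the publication
EXEC sp_startpublication_snapshot @publication = N'$(varMainPublicationName)'
GO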
Task: Decommission Publication
Now that we have built our test environment for the scenario above, the following code snippets will break it back down. Please ensure that the values used below match those used to build the environment. Also note that our DBA completes this task in an active environment during production hours; transactional volume can vary from environment to environment.
The DBA removes the subscription from the publication:
:SetVar varMainPubDatabaseName "Som_Database"
:SetVar varMainPublicationName "Trans_Pub_Test"
:SetVar varSyncServerName "DestinationServer"
:SetVar varSyncDatabaseName "SubscriberDb"

USE $(varMainPubDatabaseName)
GO

-- Drop the push subscription for all articles
EXEC sp_dropsubscription
      @publication = N'$(varMainPublicationName)'
    , @subscriber = N'$(varSyncServerName)'
    , @destination_db = N'$(varSyncDatabaseName)'
    , @article = N'all'
GO
Now the DBA drops the publication:
:SetVar varMainPubDatabaseName "Som_Database"
:SetVar varMainPublicationName "Trans_Pub_Test"

USE $(varMainPubDatabaseName)
GO

-- Drop the article
EXEC sp_dropsubscription
      @publication = N'$(varMainPublicationName)'
    , @article = N'SomeTables'
    , @subscriber = N'all'
    , @destination_db = N'all'
GO
EXEC sp_droparticle
      @publication = N'$(varMainPublicationName)'
    , @article = N'SomeTables'
    , @force_invalidate_snapshot = 1
GO

-- Drop the publication
EXEC sp_droppublication @publication = N'$(varMainPublicationName)'
GO
Our DBA checks Replication Monitor to verify that the publication has been removed from the Kronos database. The absence of the publication is misinterpreted as the completion of the assigned task. Two days later, an e-mail arrives stating that a drive on the Kronos server is nearing capacity and that the issue must be addressed immediately. After remotely logging into the server, the DBA finds that the drive has less than 100 MB remaining and that its storage is largely consumed by the transaction log file.
Troubleshooting the Problem
The steps a DBA takes to troubleshoot this issue will vary with his or her experience with SQL Server, and there is no single correct path to the conclusion. There are, however, efficient ways of collecting enough information to determine that SQL Server is not truncating the log file. The Disk Usage report, found among the standard reports included in SQL Server Management Studio, will identify how much of the physical log file is consumed and reserved. In this case, the entire log file is being consumed, and the physical file will keep growing, provided autogrowth is enabled.
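If you prefer a query over the report, DBCC SQLPERF(LOGSPACE) returns the log size and the percentage of log space used for every database on the instance:

-- Log size (MB) and log space used (%) for all databases on the instance
DBCC SQLPERF(LOGSPACE)
GO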
Since our database is configured with the Full Recovery model, the next step is to ensure that our backup operations have not been interrupted. The backup history will show that a transaction log backup completed recently. We may take the additional time to run another transaction log backup to confirm that the growth has not occurred since the last backup. A minimal BACKUP LOG statement might look like the following (the backup path here is an assumption for this environment):
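-- Back up the transaction log; the destination path is an assumption
BACKUP LOG [Som_Database]
TO DISK = N'X:\SQLBackups\Som_Database_log.trn'
GO

Following execution, a warning appears in the message window: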
“The log was not truncated because records at the beginning of the log are pending replication or Change Data Capture. Ensure the Log Reader Agent or capture job is running or use sp_repldone to mark transactions as distributed or captured.”
Running a transaction log backup will reveal the cause of the failure, and it is entirely reasonable for a DBA to take this additional step when troubleshooting the issue. If a correlation with the recent replication change can be made early, however, a simple result set from the sys.databases view will save time:
USE [Som_Database]
GO

SELECT name [Database]
     , state_desc [State]
     , recovery_model_desc [Recovery_Model]
     , is_published [Is_Published]
     , is_merge_published [Is_Merge_Published]
     , log_reuse_wait_desc [Log_Waiting_On]
FROM sys.databases
WHERE database_id = DB_ID()
Results:
| Database | State | Recovery_Model | Is_Published | Is_Merge_Published | Log_Waiting_On |
| --- | --- | --- | --- | --- | --- |
| Som_Database | ONLINE | FULL | 1 | 1 | REPLICATION |
We have the name of our database, current state, and the configured recovery model of Full. But wait, the database is still being published. Furthermore, the log file has pending transactions awaiting replication. Now that we know what is preventing the truncation of the log file, we can move towards resolving the issue.
Before we get to the fix, let's say we choose to switch the database to the Simple Recovery model rather than take the time to run a transaction log backup. Switching recovery models does not help in this situation: the transaction log will continue to grow, the DBCC SHRINKFILE command will fail to shrink the physical file, and a new Disk Usage report will show that the log is still full. A quick sketch of this dead end (the logical log file name is assumed):
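-- Switching to SIMPLE does not release log records still pending replication
ALTER DATABASE [Som_Database] SET RECOVERY SIMPLE
GO
-- The shrink cannot reclaim the space; the log remains full
DBCC SHRINKFILE([Som_Database_log], 1024)
GO

At this point, it is best to get a full picture of how the database is configured: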
EXEC sys.sp_helpdb 'Som_Database'
Results:
| name | db_size | owner | dbid | created | status | compatibility_level |
| --- | --- | --- | --- | --- | --- | --- |
| Som_Database | 118175.01 MB | sa | 10 | Dec 10 2001 | Status=ONLINE, Updateability=READ_WRITE, UserAccess=MULTI_USER, Recovery=SIMPLE, Version=655, Collation=SQL_Latin1_General_CP1_CI_AS, SQLSortOrder=52, IsAutoCreateStatistics, IsAutoUpdateStatistics, IsMergePublished, IsPublished | 10 |
“Status=ONLINE, Updateability=READ_WRITE, UserAccess=MULTI_USER, Recovery=SIMPLE, Version=655, Collation=SQL_Latin1_General_CP1_CI_AS, SQLSortOrder=52, IsAutoCreateStatistics, IsAutoUpdateStatistics, **IsMergePublished**, **IsPublished**”
Executing sp_helpdb provides a condensed view of the database options enabled under the status field. Highlighted above are two options that ought to raise a red flag when troubleshooting a ballooning log file: IsMergePublished and IsPublished. The next step is to run the earlier query against sys.databases to gather a more granular view of the database configuration.
Why Does Replication Prevent Log Truncation?
To maintain synchronization between the Publisher and its Subscribers, transactions must be marked by the Distributor's Log Reader Agent as having been replicated. Only then can those committed transactions be truncated from the log file. SQL Server withholds truncation that would otherwise occur: after a transaction log backup in the Full Recovery model, or after an automatic (or manual) CHECKPOINT in the Simple Recovery model. After all, you wouldn't want to lose a batch of transactions because your scheduled transaction log backup happened to complete just before the Log Reader Agent's run. The consequences would be unsavory.
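To see the hold directly, DBCC OPENTRAN reports the oldest active transaction in the current database along with replication information, including the oldest non-distributed LSN:

USE [Som_Database]
GO
-- Reports the oldest active transaction and, for published databases,
-- the oldest distributed and non-distributed replicated transactions
DBCC OPENTRAN
GO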
The Fix
Removing replication from the database will permit truncation of the log file following a log backup in the Full Recovery model or a checkpoint in the Simple Recovery model. Execute the following system procedure to remove replication from the database:
EXEC sys.sp_removedbreplication @dbname = N'Som_Database'
GO
The physical transaction log file will remain the same size on disk, as SQL Server keeps the storage reserved. To reduce the physical size of the file, run the following Database Console Command (DBCC) statement with the logical name of the log file:
-- Shrink the log file down to a 1,024 MB target
DBCC SHRINKFILE([Som_Database_log], 1024)
GO
In this scenario, the database had one publication with a single subscription. The DBA was tasked with decommissioning the publication. All but the final step in removing the database from the replication environment had been followed. Unfortunately, it is often a seemingly minor step that results in extensive troubleshooting and downtime.
The Real Fix
Avoid unexpected issues by taking advantage of the plethora of tools available to SQL Server Database Administrators. A good DBA will know what to do to effectively identify and resolve an issue that crops up. A great DBA will have the tools in place that will help proactively identify issues before they affect the core infrastructure.
For this particular issue, SQL Server alerts can identify full transaction logs and excessive growth. Assuming Database Mail has been configured for your environment, look at creating an alert for Log Usage under the SQL Server performance condition alert type. Should you choose not to set an alert on a Log Usage condition, then certainly consider an alert for error 9002:
“The transaction log for database '%.*ls' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases”.
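A minimal sketch of such an alert using msdb's sp_add_alert and sp_add_notification; the alert name and the 'DBA_Team' operator are assumptions for this environment:

USE msdb
GO

-- Alert on error 9002 for the publication database; the name is an assumption
EXEC dbo.sp_add_alert
      @name = N'Som_Database - Transaction Log Full'
    , @message_id = 9002
    , @severity = 0
    , @enabled = 1
    , @database_name = N'Som_Database'
    , @delay_between_responses = 300
    , @include_event_description_in = 1
GO

-- Route the alert to an existing operator; 'DBA_Team' is an assumption
EXEC dbo.sp_add_notification
      @alert_name = N'Som_Database - Transaction Log Full'
    , @operator_name = N'DBA_Team'
    , @notification_method = 1 -- e-mail
GO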
Once the necessary alerts are in place, you can manage them in a number of ways. A simple, straightforward method is to assign these alerts to a database operator for ease of communication. Windows Server offers the ability to attach a task to an event through Event Viewer or Windows Task Scheduler. Then there are third-party applications, such as SQL Sentry and Red Gate, that offer not only notification management but also alert configuration via a user-friendly UI.
None of these suggestions replaces proper file-growth monitoring and analysis. Keep track of physical file growth. Utilize the Management Data Warehouse and Extended Events. Data collection and built-in reporting will help you better plan for resource utilization and infrastructure growth. As one starting point, a query like the one below can be scheduled to snapshot file sizes over time.
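A minimal sketch using sys.master_files; storing and scheduling the results is left to you, and nothing here is specific to this scenario:

-- Snapshot current physical file sizes for trend analysis
SELECT d.name             AS [database_name]
     , mf.name            AS [logical_file_name]
     , mf.type_desc       AS [file_type]
     , mf.size * 8 / 1024 AS [size_mb]  -- size is stored in 8 KB pages
FROM sys.master_files AS mf
JOIN sys.databases    AS d
  ON d.database_id = mf.database_id
ORDER BY d.name, mf.type_desc;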
And finally, build a process around every mid- to high-impact change. Work out the steps needed to complete the task you have been assigned; document those steps, asking questions along the way; and test the process in a non-production environment. Proper planning will help define roadmaps, leading to a well-managed and stable infrastructure. Remember to deploy the process, not the task.