Problem
Your production SQL Server transactional replication just failed on your largest article and the business impact is critical. How do you get replication restored quickly without the time consuming reinitializing snapshot? In this article we look at how you can quickly restore SQL Server transactional replication.
Solution
Reliable replication is a critical component of many SQL Server shops, so DBAs need a tool to help them quickly restore transactional replication. Many DBAs have found replication failures to be complex and frustrating taking hours or days to resolve. There are several methods available to restore your system quickly, but my favorite technique is to rebuild the publication/subscription from a proven template that requires no time consuming snapshot. You can accomplish this at your shop using five easy steps.
Background
In this example the publisher is named Pubr. Publications are Pub1 and Pub2. Subscribers are Sub1 and Sub2 as shown in figure 1. Pub1 has three articles. Pub2 has a hundred articles. Most articles have modest row counts but one contains 300 million rows. Pub1 is still running but Pub2 with its 300 million row article has failed.
PubrPub1 ------> Sub1 Status=active
Article 1
Article 2
Article 3
Pub2 ------> Sub2 Status=
down
, replication errors queuingArticle 1 300 million rows
Article 2 10 thousand rows
Article 3 100 thousand rows figure 1
In this case articles 2 and 3 are small enough to reinitialize but not the 300 million row article 1. In my shop it would take many hours to reinitialize article 1 and during that time the applications are down. Figure 2 shows that shortly after the failure of article 1 only a very small percentage of the total rows are not replicated, so why reinitialize the entire 300 M row article? What we need is to modify the replication definition on article 1 to avoid reinitialize and then clear out the invalid commands.
PubrPub2 Rows Not Replicated
Article 1 300 M Rows 100 K Rows
Article 2 10 K Rows 0 K Rows
Article 3 100 K Rows 1 K Rows
figure 2
Five Easy Steps
There are five easy steps that you need to follow.
First
Script out your existing publication. The easiest method is via Management Studio. Expand Replication, then expand Local Publications and right click Pub1. Click Generate Script. At the bottom of the pop-up click the drop down and select Open in New Query Window.
In this case we are only interested in sp_addaraticle for ART1 and the sp_addsubsciption. Here are the scripts:
-- Adding the transactional articles use [ReplTest]
exec sp_addarticle
@publication = N'Pub2',
@article = N'Art1',
@source_owner = N'dbo',
@source_object = N'Art1',
@type = N'logbased',
@description = N'',
@creation_script = N'',
@pre_creation_cmd = N'drop',
@schema_option = 0x00000000080350DF,
@identityrangemanagementoption = N'manual',
@destination_table = N'Art1',
@destination_owner = N'dbo',
@status = 24,
@vertical_partition = N'false',
@ins_cmd = N'CALL [dbo].[sp_MSins_dboArt1]',
@del_cmd = N'CALL [dbo].[sp_MSdel_dboArt1]',
@upd_cmd = N'CALL [dbo].[sp_MSupd_dboArt1]'
Notice pre_creation_cmd = ‘drop’ which means truncate the subscriber table something we do not want on large tables. .
exec sp_addsubscription
@publication = N'Pub2',
@subscriber = N'SQLREMOTE',
@destination_db = N'ReplTest',
@subscription_type = N'Push',
@sync_type = N'automatic
',
@article = N'all',
@update_mode = N'read only',
@subscriber_type = 0
Notice sync_type='automatic' which means schema and data must be transferred to the subscriber before replication starts, something we do not want on large tables. Ensure this script is saved.
Second
Modify the add article script for Art1 for quick starting by setting the @pre_creation_cmd=’none’
. This avoids the snapshot for this article and is the first key to bringing replication online quickly. Use the sp_addarticle pre_creation_cmd to control reinitialization at the article level for any article. Avoid using sp_changearticle as this only bypasses the snapshot but does not clear out the bad replication commands.
use [ReplTest]
exec sp_addarticle
@publication = N'Pub1',
@article = N'Art1',
@source_owner = N'dbo',
@source_object = N'Repl1',
@type = N'logbased',
@description = N'',
@creation_script = N'',
@pre_creation_cmd = N'none',
@schema_option = 0x000000000803509F,
@identityrangemanagementoption = N'manual',
@destination_table = N'Repl1',
@destination_owner = N'dbo',
@status = 24,
@vertical_partition = N'false',
@ins_cmd = N'CALL [sp_MSins_dboRepl1]',
@del_cmd = N'CALL [sp_MSdel_dboRepl1]',
@upd_cmd = N'SCALL [sp_MSupd_dboRepl1]'
Next modify the add subscription for quick starting by setting the @sync_type=n'replication support only'. This allows replication to start immediately and is the second key to bringing replication online quickly. Expect the 'subscription not active' message if this parm is 'automatic'.
exec sp_addsubscription @publication = N'Pub2',
@subscriber = N'SQLREMOTE',
@destination_db = N'ReplTest',
@subscription_type = N'Push',
@sync_type = N'replication support only'
,
@article = N'all',
@update_mode = N'read only',
@subscriber_type = 0
Third
Now we need to drop the publication. The easiest method is via SSMS. Expand replication and then expand publication. Select Pub2, right click, and select delete.
This also cleans up the unprocessed commands for this publication that caused the initial problem and is the thrid key to bringing replication up quickly. This isn’t as risky as it seems since you have saved the entire create script in step 1 and recreating a publication via commands takes less than 1 minute.
Fourth
Synchronize the data. This step is recommended since we have removed the snapshot for some articles. This step can be delayed if your goal is a quick restoration of replication and you understand the impact of the missing data. If you are going to synchronize I have found that Redgate’s Data Compare is a great tool to synchronize the subscriber. Microsoft's BCP utility used with a where clause selecting only the missing rows is another option. In this case we only have to synchronize Art 1’s 100K missing rows. Data Compare typically completes this in minutes.
Fifth
We are almost done, we only needed to run the modified script from Step 2 in Management Studio, which will create a new publisher and subscriber. Since pub 1 is running, comment out the first three exec as they aren't needed.
-- Enabling quick restart of replication use master /* exec sp_replicationdboption @dbname = N'ReplTest', @optname = N'publish', @value = N'true' GO exec [ReplTest].sys.sp_addlogreader_agent @job_login = null, @job_password = null, @publisher_security_mode = 1 GO exec [ReplTest].sys.sp_addqreader_agent @job_login = null, @job_password = null, @frompublisher = 1 GO */
Acknowledgement:
Reviewers are critical to writers, thanks to Dale Gundersen @ Reply.com for his advice.