Introduction
When a table is filtered vertically in transactional replication and a new column is added to the article, a committed transaction updating a new column can arrive at the subscriber even if the column was not replicated when that transaction was committed. In this article, I will discuss the mechanism for this behavior and present one workaround using replication tracer tokens.
Testbed Setup
In order to test this scenario, I created two databases, ReplicationTestbed_Publisher and ReplicationTestbed_Subscriber. Each of these databases contains a single table, "Table1". The code is below in Listing 1.
use master go drop table ReplicationTestbed_Publisher.dbo.Table1 go drop table ReplicationTestbed_Subscriber.dbo.Table1 go -- USE ReplicationTestbed_Publisher create table ReplicationTestbed_Publisher.dbo.Table1 ( col1 int identity(1,1) Primary Key clustered , col2 int ) -- USE ReplicationTestbed_Subscriber create table ReplicationTestbed_Subscriber.dbo.Table1 ( col1 int Primary Key clustered , col2 int )
Listing 1. Setup of table structure for both Publisher and Subscriber in Replication Testbed
To set up replication, the new publication and new subscription wizards were used. Both columns were explicitly published via the sp_articlecolumn stored procedure. For the sake of this demonstration, the SQL Server Agent account was used for security, acknowledging that this is not the best practice. In addition, sp_changepublication was run to turn off the replicate_ddl properly. The code to set up replication is in Listing 2.
--Setup Replication of Table1, publishing table1, col1 and col2 use [ReplicationTestbed_Publisher] exec sp_replicationdboption @dbname = N'ReplicationTestbed_Publisher', @optname = N'publish', @value = N'true' GO -- Adding the transactional publication use [ReplicationTestbed_Publisher] exec sp_addpublication @publication = N'ReplicationTestbed_Repl', @description = N'Transactional publication of database ''ReplicationTestbed_Publisher'' from Publisher ''SERVER1''.', @sync_method = N'concurrent', @retention = 0, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @allow_subscription_copy = N'false', @add_to_active_directory = N'false', @repl_freq = N'continuous', @status = N'active', @independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'false', @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1, @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false', @enabled_for_het_sub = N'false' GO use [ReplicationTestbed_Publisher] exec sp_addarticle @publication = N'ReplicationTestbed_Repl', @article = N'Table1', @source_owner = N'dbo', @source_object = N'Table1', @type = N'logbased', @description = N'', @creation_script = null, @pre_creation_cmd = N'none', @schema_option = 0x000000000803509F, @identityrangemanagementoption = N'manual', @destination_table = N'Table1', @destination_owner = N'dbo', @vertical_partition = N'true', @ins_cmd = N'CALL sp_MSins_dboTable1', @del_cmd = N'CALL sp_MSdel_dboTable1', @upd_cmd = N'SCALL sp_MSupd_dboTable1' -- Adding the article's partition column(s) exec sp_articlecolumn @publication = N'ReplicationTestbed_Repl', @article = N'Table1', @column = N'col1', @operation = N'add', @force_invalidate_snapshot = 1, @force_reinit_subscription = 1 exec sp_articlecolumn @publication = N'ReplicationTestbed_Repl', @article = N'Table1', @column = N'col2', @operation = N'add', @force_invalidate_snapshot = 1, @force_reinit_subscription = 1 -- Adding the article synchronization object exec sp_articleview @publication = N'ReplicationTestbed_Repl', @article = N'Table1', @view_name = N'SYNC_Table1_1__68', @filter_clause = null, @force_invalidate_snapshot = 1, @force_reinit_subscription = 1 GO --subscriber use [ReplicationTestbed_Publisher] exec sp_addsubscription @publication = N'ReplicationTestbed_Repl', @subscriber = N'SERVER1', @destination_db = N'ReplicationTestbed_Subscriber', @subscription_type = N'Push', @sync_type = N'replication support only', @article = N'all', @update_mode = N'read only', @subscriber_type = 0 exec sp_addpushsubscription_agent @publication = N'ReplicationTestbed_Repl', @subscriber = N'SERVER1', @subscriber_db = N'ReplicationTestbed_Subscriber', @job_login = null, @job_password = null, @subscriber_security_mode = 1, @frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0, @frequency_recurrence_factor = 0, @frequency_subday = 0, @frequency_subday_interval = 0, @active_start_time_of_day = 0, @active_end_time_of_day = 0, @active_start_date = 0, @active_end_date = 19950101, @enabled_for_syncmgr = N'False', @dts_package_location = N'Distributor' GO --Turn off DDL replication exec sp_changepublication @publication = N'ReplicationTestbed_Repl' , @property = 'replicate_ddl' , @value = '0' , @force_invalidate_snapshot = 0 , @force_reinit_subscription = 0
Listing 2. Setup of publisher and subscriber properties for Replication Testbed. Code is taken from Replication Setup Wizard.
Normal Operation
At this point, the MSrepl_commands table in the distribution database will show only the commands used to set up replication (Listing 3).
select * from distribution.[dbo].[MSrepl_commands] where publisher_database_id= ( select id from distribution.dbo.MSpublisher_databases where publisher_db='ReplicationTestbed_Publisher')
Listing 3. Distribution database contains 9 records, all related to setting up replication.
If a record is inserted into Table1 at the publisher, there is a corresponding update in the distribution database MSrepl_commands table. Listing 4 shows the record being inserted and the contents of the MSrepl_commands table.
insert into ReplicationTestbed_Publisher.dbo.Table1 (col2) values (5) go --distributor data select * from distribution_Acclaim.[dbo].[MSrepl_commands] where publisher_database_id= (select id from distribution_Acclaim.dbo.MSpublisher_databases where publisher_db='ReplicationTestbed_Publisher' )
Listing 4. A new row arrives in the MSrepl_commands table corresponding to the insert statement.
After the distribution agent is run, the record arrives at the subscriber, as expected. This is seen in Listing 5.
select * from ReplicationTestbed_Publisher.dbo.Table1 select * from ReplicationTestbed_Subscriber.dbo.Table1
Listing 5. Replication has move the new record to the subscriber, as expected.
If a non-replicated column is added to the publisher (Listing 6), there is no change in the number of records at the distributor and of course there is no update to the subscriber.
alter table ReplicationTestbed_Publisher.dbo.Table1 add col3 int --alter so that it only affects col3 (not replicated) update ReplicationTestbed_Publisher.dbo.Table1 set col3=10
Listing 6. Adding column col3 without adding to replication does not affect the MSrepl_Commands table.
Problem Demonstration
A potential problem is introduced when the published table is large, and the change is not brought to the subscriber via replication. Suppose Table1 is many millions of records in size and you wish to add Col4 to the table as NOT NULL. One method of accomplishing this (there are of course others, but this method demonstrates the replication issue under consideration) is to add Col4 as NULL, and then update it with some value before setting it to NOT NULL. The update to Col4 will take some time as the data pages are updated and the committed. Once the update is committed, the distribution log reader still has to read the transaction to determine what portion of the update needs to be applied to the subscriber. During the time that the log reader is reading the log, if the column is then added to the replication subscription, it is possible that the update, that was committed in the past, is still applied to the new column in the distributor.
In order to demonstrate this, instead of running a large update, we will turn off the log reader for a time as seen in Listing 7. This will simulate the latency while the log reader would read a long running transaction in a large transaction log.
--add a column for replication alter table ReplicationTestbed_Publisher.dbo.Table1 add col4 int --to simulate a long running transaction that is not consumed by the log reader, we can turn off the log reader execute msdb.dbo.sp_stop_job @job_name = 'SERVER1-ReplicationTestbed_Publisher-35' go update ReplicationTestbed_Publisher.dbo.Table1 set col4=15 select * from ReplicationTestbed_Publisher.dbo.Table1 select * from ReplicationTestbed_Subscriber.dbo.Table1
Listing 7. Adding Col4 to the publisher.
At this point, col4 exists and has been populated, but the distribution database is not aware of this transaction because we have turned off the log reader. This mimics the case when the log reader is processing a large update to the column. While this condition persists, we will add the column to replication. We will also add the column to the subscriber and update the procedure that is used for replication. This is shown in Listing 8.
Note that during this time, it is possible that updates could occur to the published table before the new column is replicated. This can be addressed depending on the strategy for creating the column at the subscriber, but that is beyond the scope of this article.
--add column to replication. In the deployment, this is the point where we felt we waited "long enough" and can continue. exec sp_articlecolumn @publication = N'ReplicationTestbed_Repl', @article = N'Table1', @column = N'col4', @operation = N'add', @force_invalidate_snapshot = 1, @force_reinit_subscription = 1 --no change to distribution data select * from distribution.[dbo].[MSrepl_commands] where publisher_database_id= (select id from distribution.dbo.MSpublisher_databases where publisher_db='ReplicationTestbed_Publisher')
--add column to staging alter table ReplicationTestbed_Subscriber.dbo.Table1 add col4 int --change distribution proc to match use ReplicationTestbed_Subscriber go ALTER procedure [dbo].[sp_MSupd_dboTable1] @c1 int = NULL, @c2 int = NULL, @c4 int = NULL, @pkc1 int = NULL, @bitmap binary(1) as begin update [dbo].[Table1] set [col2] = case substring(@bitmap,1,1) & 2 when 2 then @c2 else [col2] end, [col4] = case substring(@bitmap,1,1) & 4 when 4 then @c4 else [col4] end where [col1] = @pkc1 if @@rowcount = 0 if @@microsoftversion>0x07320000 exec sp_MSreplraiserror 20598 end
Listing 8. Adding the column to replication and updating the subscriber to match the new schema
At this point, we can simulate that the log reader has finished reading the large transaction by restarting the log reader. After running the distribution agent, the record that was added before col4 was enabled for replication is present at the subscriber (Listing 9).
--turn on log reader. Simulating that the log reader has now consumed the large update execute msdb.dbo.sp_start_job @job_name = 'SERVER1-ReplicationTestbed_Publisher-37' go --change to distribution data! select * from distribution_Acclaim.[dbo].[MSrepl_commands] where publisher_database_id= (select id from distribution_Acclaim.dbo.MSpublisher_databases where publisher_db='ReplicationTestbed_Publisher')
--run distributor --Data changed before enabled for repliation has moved select * from ReplicationTestbed_Publisher.dbo.Table1 select * from ReplicationTestbed_Subscriber.dbo.Table1
Listing 9. After re-enabling the log reader, the update that was committed before the column was added to replication arrives at the subscriber.
This behavior does seem counter intuitive, as if enabling a column for replication causes SQL Server to reach into the past and apply a transaction that is already committed to the subscriber. If the column was not replicated at the time the transaction occurs, why is the transaction pushed to the subscriber once the column is added? It appears that since the log reader is an asynchronous process, the adding of a column to replication can be implemented before the log reader is finished reading a prior transaction. Then, when that transaction is complete, the log reader, seeing at that point that that the column is enabled for replication, will move that large transaction to the distributor even though it happened at a time when the column was not replicated.
In the case of a large table, this can result in millions of records being pushed to the distribution database that were not intended to be there. The effect can be devastating for replication latency and potentially harmful to the subscribers if, for example, ETL transformations make accepting that many records all at once undesirable.
Mitigation Strategy
One method to work around this behavior is to recognize that the adding of a column to replication needs to wait until the log reader has processed any large transaction concerning that new column if you wish to avoid having that update applied to the subscriber. One way this can be accomplished is to add a tracer token to replication and wait for this token to arrive at the distributor. This way, once the token is at the distributor, you know the log reader has read all prior transactions and then the column can be added to replication safely.
As before, we add the column to the publisher and simulate a long update by stopping the log reader, as seen in Listing 10.
use ReplicationTestbed_Publisher go alter table ReplicationTestbed_Publisher.dbo.Table1 add col5 int --simulate long update execute msdb.dbo.sp_stop_job @job_name = 'SERVER1-ReplicationTestbed_Publisher-35' go update ReplicationTestbed_Publisher.dbo.Table1 set col5=20 select * from ReplicationTestbed_Publisher.dbo.Table1 select * from ReplicationTestbed_Subscriber.dbo.Table1
Listing 10. Add a new column to replication, again simulating a long running update
Next, we insert a token into the publisher’s log in Listing 11
DECLARE @publication AS sysname; DECLARE @tokenID AS int; SET @publication = N'ReplicationTestbed_Repl'; -- Insert a new tracer token in the publication database. EXEC sys.sp_posttracertoken @publication = @publication, @tracer_token_id = @tokenID OUTPUT; SELECT 'The ID of the new tracer token is ''' + CONVERT(varchar,@tokenID) + '''.'
Listing 11. Inserting a tracer token. The ID of the token is saved in the @tokenID variable
Once the token is in place, the system stored procedure sp_helptracertokenhistory can be used to obtain the status of the token (Listing 12)
create table #token ( distributor_latency int , subscriber sysname , subscriber_db sysname , subscriber_latency int , overall_latency int ) insert into #token EXEC sys.sp_helptracertokenhistory @publication = N'ReplicationTestbed_Repl', @tracer_id = -2147472026; select distributor_latency from #token drop table #token
Listing 12. The distributor latency is NULL. This is being simulated by not running the log reader, but the situation would be the same if a long running update was not yet processed by the log reader.
In this case, the token has not yet made it to the distributor. This means we cannot add the new column to replication. We can simulate that the log reader is done with the long-running update by turning on the log reader once again (Listing 13).
--long running transaction is done execute msdb.dbo.sp_start_job @job_name = 'SERVER1-ReplicationTestbed_Publisher-35' go --Verify create table #token ( distributor_latency int , subscriber sysname , subscriber_db sysname , subscriber_latency int , overall_latency int ) insert into #token EXEC sys.sp_helptracertokenhistory @publication = N'ReplicationTestbed_Repl', @tracer_id = -2147472028; select * from #token drop table #token
Listing 13. After re-enabling the log reader, the latency is not NULL, indicating that this token has arrived at the distributor
This time, the latency is not null at the distributor, so we know that the update to the subscriber and the enabling of the column for replication can occur (Listing 14).
exec sp_articlecolumn @publication = N'ReplicationTestbed_Repl' , @article = N'Table1' , @column = N'col5' , @operation = N'add' , @force_invalidate_snapshot = 1 , @force_reinit_subscription = 1 --add column to staging alter table ReplicationTestbed_Subscriber.dbo.Table1 add col5 int --change distribution proc to match use ReplicationTestbed_Subscriber go ALTER procedure [dbo].[sp_MSupd_dboTable1] @c1 int = NULL, @c2 int = NULL, @c4 int = NULL, @c5 int = NULL, @pkc1 int = NULL, @bitmap binary(1) as begin update [dbo].[Table1] set [col2] = case substring(@bitmap,1,1) & 2 when 2 then @c2 else [col2] end , [col4] = case substring(@bitmap,1,1) & 4 when 4 then @c4 else [col4] end , [col5] = case substring(@bitmap,1,1) & 8 when 8 then @c5 else [col5] end where [col1] = @pkc1 if @@rowcount = 0 if @@microsoftversion>0x07320000 exec sp_MSreplraiserror 20598 end
Listing 14. Adding the column to replication after the token indicates the latency is no longer an issue.
After running the distribution agent, the data from the simulated long running transaction has not been published, as expected. This is shown in Listing 15.
select * from ReplicationTestbed_Publisher.dbo.Table1 select * from ReplicationTestbed_Subscriber.dbo.Table1
Listing 15. The new column in the subscriber remains null and was not processed by the distribution agent.
Conclusion
The adding of a column to a vertically filtered replicated table is not synchronous with the reading of the publisher transaction log. As a result, it is possible for long running transactions that were committed before a column was published to be pushed to the subscriber if the log reader completes its scan of that transaction after the column was added to the publication. A potential workaround is to deploy a tracer token since it can be observed when the token, and thus all prior transactions, has been read by the log reader, indicating that it is safe to add the column to replication without moving unintended records to the subscriber.