November 13, 2013 at 2:18 pm
We currently have 140 SQL 2000 publications replicating to a single 2008 server with about 25 articles. 5 of the articles are quite substantial. We are in the process of migrating to new 2008 servers that perform in general significantly better than the old 2000 infrastructure. Previously all replication ran fine with a latency of no more than a minute at any time even under peak load. We have migrated about 60 servers to the new 2008 infrastructure but the latency has now shot up. This makes no sense in terms of performance. We have tried a number of things to resolve this but have so far been unsuccessful. The 2008 and 2000 publications are both going to the same DB and using the same replication procs.
First, the @status in sp_addarticle needed changing from 0 (as it was in the original script) to 24. This gave a significant improvement but the latency is still up to 40 minutes at peak times.
We have tried changing from push to pull. This made the performance worse.
We have changed the PollingInterval on the distribution agent from 5 (2008 default) to 10 (2000 default). This made no noticeable difference.
We have changed the ImmediateSync setting to 0 from 1. This made no noticeable difference.
We have ensured the index etc. is OK on the central MSreplication_subscriptions table. This made no noticeable difference.
We have tried lock hints on some of the replication procs. This made no noticeable difference.
Any ideas would be much appreciated
November 14, 2013 at 2:36 am
Have you established where the latency is?
Log reader to distribution server?
Distribution agent to subscriber?
November 14, 2013 at 2:55 am
Cheers for the response. The publisher is its own distributor, there isn't a separate server for this. I will post some outputs from the Distribution and Log Reader agents shortly
November 14, 2013 at 3:01 am
Also, is the latency across all subscribers?
November 14, 2013 at 3:05 am
Some are worse than others, but there is latency across all the new servers. There was a correlation between the number of records in MSrepl_commands and performance. But even when we set ImmediateSync to 0 and ran the Distribution Cleanup job every 30 mins to keep the table size down, this didn't help, which I guess suggests it is an issue with applying the commands at the subscriber rather than getting them off the distributor?
There is no latency on the 2000 boxes
November 14, 2013 at 4:28 am
chris.roddis-ferrari (11/14/2013)
Some are worse than others, but there is latency across all the new servers. There was a correlation between the number of records in MSrepl_commands and performance. But even when we set ImmediateSync to 0 and ran the Distribution Cleanup job every 30 mins to keep the table size down, this didn't help, which I guess suggests it is an issue with applying the commands at the subscriber rather than getting them off the distributor? There is no latency on the 2000 boxes
It could be.
For clarity, you have 200 publishers (140/60 2000/2008) delivering to a single subscriber using push transactional replication. All of the 60 2008 publishers are experiencing latency at a currently unknown "bottleneck".
Are the subscriptions going to the same database/objects?
November 14, 2013 at 5:04 am
Yes all going to the same database/objects. The split is 80 on 2000 and 60 on 2008
Cheers
November 14, 2013 at 5:15 am
Have there been any drive (disk) level changes? For example, is a comparatively lower-grade disk being used now?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 14, 2013 at 5:40 am
Disk is now significantly better
Was previously 2 Ultra SCSI 420 72GB drives RAID1-0 and is now 4 SAS 300GB drives 2 RAID1-0 pairs.
November 14, 2013 at 6:25 am
Distribution Agent Log
************************ STATISTICS SINCE AGENT STARTED ***********************
11-14-2013 13:20:45
Total Run Time (ms) : 394605 Total Work Time : 389457
Total Num Trans : 5194 Num Trans/Sec : 13.34
Total Num Cmds : 8928 Num Cmds/Sec : 22.92
Total Idle Time : 0
Writer Thread Stats
Total Number of Retries : 0
Time Spent on Exec : 25784
Time Spent on Commits (ms): 1622 Commits/Sec : 0.13
Time to Apply Cmds (ms) : 389457 Cmds/Sec : 22.92
Time Cmd Queue Empty (ms) : 157 Empty Q Waits > 10ms: 10
Total Time Request Blk(ms): 157
P2P Work Time (ms) : 0 P2P Cmds Skipped : 0
Reader Thread Stats
Calls to Retrieve Cmds : 2
Time to Retrieve Cmds (ms): 369629 Cmds/Sec : 24.15
Time Cmd Queue Full (ms) : 19843 Full Q Waits > 10ms : 128
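For anyone wanting to compare the reader and writer figures side by side, the counters in this log can be turned into rough shares of the total run time. The Python sketch below is only an illustration: the numbers are copied by hand from the log above, the dictionary keys are made-up labels (not anything the agent emits), and every figure is assumed to be in milliseconds, which the log states for most counters but not for "Time Spent on Exec".

```python
# Figures copied by hand from the Distribution Agent log above.
# Key names are made-up labels; all values are assumed to be milliseconds.
stats = {
    "total_run_ms": 394605,
    "total_work_ms": 389457,
    "writer_exec_ms": 25784,
    "writer_commit_ms": 1622,
    "reader_retrieve_ms": 369629,
    "reader_queue_full_ms": 19843,
    "total_cmds": 8928,
}


def share(part_ms, whole_ms):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part_ms / whole_ms, 1)


def cmds_per_sec(cmds, elapsed_ms):
    """Commands per second over an elapsed time given in milliseconds."""
    return round(cmds / (elapsed_ms / 1000.0), 2)


report = {
    "reader_retrieve_pct": share(stats["reader_retrieve_ms"], stats["total_run_ms"]),
    "writer_exec_pct": share(stats["writer_exec_ms"], stats["total_run_ms"]),
    "writer_commit_pct": share(stats["writer_commit_ms"], stats["total_run_ms"]),
    "reader_queue_full_pct": share(stats["reader_queue_full_ms"], stats["total_run_ms"]),
    "apply_cmds_per_sec": cmds_per_sec(stats["total_cmds"], stats["total_work_ms"]),
    "retrieve_cmds_per_sec": cmds_per_sec(stats["total_cmds"], stats["reader_retrieve_ms"]),
}

for name, value in report.items():
    print(f"{name}: {value}")
```

The two commands-per-second values reproduce the 22.92 and 24.15 printed in the log, which is a reasonable check that the figures were copied correctly. On the millisecond assumption, the reader's retrieve time comes out at roughly 94% of the run and the writer's exec time at under 7%; what those counters actually measure is a separate question.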
November 14, 2013 at 6:40 am
It looks like the delivery of commands is what's taking the time. Have you compared the distribution agent profile settings between the servers?
November 14, 2013 at 7:49 am
Only differences are BCPBatchSize and QueryTimeout which we haven't changed and PollingInterval which changed to 5 by default and we have changed back to 10
November 14, 2013 at 8:04 am
chris.roddis-ferrari (11/14/2013)
Only differences are BCPBatchSize and QueryTimeout which we haven't changed and PollingInterval which changed to 5 by default and we have changed back to 10
None of those would make any difference to command delivery. Have you checked for blocking on distribution db and subscription db?
Checked latency using a tracer token?
These are the parameters which modify delivery rate.
[-CommitBatchSize commit_batch_size]
[-CommitBatchThreshold commit_batch_threshold]
[-MaxDeliveredTransactions number_of_transactions]
[-PacketSize packet_size]
[-SubscriptionStreams [1|2|...64]]
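If several combinations of these switches are going to be tried, it may help to generate the argument string rather than type it each time. The Python helper below is a hypothetical sketch, not part of any SQL Server tooling: the switch names and the 1 to 64 range for SubscriptionStreams come from the list above, while the function name and the example values are placeholders and not recommendations.

```python
# Hypothetical helper: formats Distribution Agent switches for review.
# The switch names come from the list above; every value used below is an
# illustrative placeholder, not a recommendation.
ALLOWED = (
    "CommitBatchSize",
    "CommitBatchThreshold",
    "MaxDeliveredTransactions",
    "PacketSize",
    "SubscriptionStreams",
)


def build_agent_args(settings):
    """Return a string such as '-CommitBatchSize 200 -SubscriptionStreams 4'."""
    unknown = set(settings) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown switch(es): {sorted(unknown)}")
    parts = []
    for name in ALLOWED:  # iterate ALLOWED to keep a stable output order
        if name not in settings:
            continue
        value = settings[name]
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
        # The 1..64 range is taken from the switch list above.
        if name == "SubscriptionStreams" and not 1 <= value <= 64:
            raise ValueError("SubscriptionStreams must be between 1 and 64")
        parts.append(f"-{name} {value}")
    return " ".join(parts)


# Placeholder values, purely to show the output format.
print(build_agent_args({"CommitBatchSize": 200, "SubscriptionStreams": 4}))
```

The resulting string would still need to be added to the agent job step or profile by hand; the helper only checks the names and the stream count before anything is changed.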
November 14, 2013 at 8:36 am
The tracer token shows Publisher to Distributor (same box) as a couple of secs, and all the latency is Distributor to Subscriber: 12 minutes in the one I just did. There is no blocking on the distribution db, just an ASYNC_NETWORK_IO wait of about 2 secs on the distribution process. There has always been a level of blocking on the subscriber, even before the upgrades started, but this has never caused performance issues. I am not able/don't know how to tell if this has increased.