September 30, 2009 at 12:51 pm
Latency between our Distributor and the Subscriber is usually less than 10 seconds. However, sometimes (once or twice a month) it falls behind by several minutes, or even a couple of hours. We haven't been able to put our finger on any specific user activity or processing that precipitates the latency problem. We never have latency issues between the Publisher and the Distributor, only between the Distributor and the Subscriber. When we have these latency issues, transactions and commands are still replicated (as viewed with Replication Monitor), but they just seem to slow down. Transactions will eventually catch up on their own, but the latency may last for a few hours. This is one-way, push, transactional replication. The Subscriber is used mainly for read-only public access and job/report processing. PerfMon doesn't show any CPU or memory problems.
Our Environment:
SQL Server 2005 SP3/CU1 64 bit Enterprise Edition
Windows Server 2003 Enterprise x64 w/SP2
Intel Xeon CPU 3.40GHz – 8 physical CPUs per node
64GB Memory per node
SAN disks - all RAID 10
Active/Active Cluster
Node1: Publisher, Distributor, Cluster Group, MSDTC
Node2: Subscriber
Publisher: Max Mem set to: 32768
Distributor: Max Mem set to: 10240
Subscriber: Max Mem set to: 20480
(If everything is forced to run on one node, this should leave about 2GB for the OS)
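The 2GB figure can be checked with a little arithmetic. This is a minimal sketch using only the numbers listed above; it assumes the three Max Mem settings are the only significant memory consumers on the node, which is a simplification, and the variable names are made up for illustration.

```python
# Memory budget for the worst case where all three instances
# fail over onto a single 64 GB node (values in MB, from the settings above).
NODE_MEMORY_MB = 64 * 1024

max_server_memory_mb = {
    "Publisher": 32768,
    "Distributor": 10240,
    "Subscriber": 20480,
}

total_sql_mb = sum(max_server_memory_mb.values())
os_headroom_mb = NODE_MEMORY_MB - total_sql_mb

print(f"SQL Server total: {total_sql_mb} MB")
print(f"Left for the OS:  {os_headroom_mb} MB ({os_headroom_mb / 1024:.0f} GB)")
```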
Distribution agent is set for Continuous transaction replication – Push subscription.
The application is vendor supplied.
Anyone have any ideas about what might be causing this, or what to look at to diagnose the problem?
Thanks
September 30, 2009 at 1:17 pm
Replication can slow down while the distribution cleanup job runs. That is just one possibility. If disk I/O is busy, that can slow replication down as well; this can happen if someone is copying a large backup file from the Distributor to somewhere else. Check whether any bulk updates are happening on the Publisher at the time of the slowdown. A large number of commands waiting to be propagated from the distribution database can also cause a slowdown.
-Roy
September 30, 2009 at 1:21 pm
When this latency occurs, have you verified that there is nothing blocking at the Subscriber?
September 30, 2009 at 1:24 pm
The Distribution cleanup job runs every 10 minutes. I believe that this is the default interval, and it's always been set to 10 minutes. It seems to run okay even when latency is high.
I know that there are no bulk copy processes of any kind running.
There is no blocking anywhere.
Thanks for your ideas - I'm stumped!
September 30, 2009 at 1:27 pm
Do you see any high I/O on the Distributor or Subscriber?
And here is a wild question: do you have antivirus installed on the servers, and is it set to run a full scan at a certain time?
-Roy
September 30, 2009 at 1:36 pm
We don't run antivirus software on any of the nodes.
I don't administer the SAN, but I've asked those administrators to look at I/O. They don't agree that it is high. I don't think that they have a good yardstick to measure against. However, they are performing some SAN upgrades this weekend that are supposed to improve throughput.
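One possible yardstick that doesn't depend on the SAN tooling is the cumulative per-file counters SQL Server exposes in sys.dm_io_virtual_file_stats. The sketch below is only an illustration: it assumes two snapshots of those counters have already been fetched for one database file (for example, the distribution database's data file) and computes the average stall per read and per write between them. The function name and the sample numbers are made up, not taken from this environment.

```python
def avg_latency_ms(before, after):
    """Average read/write latency (ms per I/O) between two snapshots.

    Each snapshot is a dict holding the cumulative counters that
    sys.dm_io_virtual_file_stats reports for one database file.
    """
    reads = after["num_of_reads"] - before["num_of_reads"]
    writes = after["num_of_writes"] - before["num_of_writes"]
    read_stall = after["io_stall_read_ms"] - before["io_stall_read_ms"]
    write_stall = after["io_stall_write_ms"] - before["io_stall_write_ms"]
    return {
        # Guard against division by zero when the file saw no I/O.
        "read_ms": read_stall / reads if reads else 0.0,
        "write_ms": write_stall / writes if writes else 0.0,
    }


# Made-up sample numbers, for illustration only.
before = {"num_of_reads": 1000, "num_of_writes": 5000,
          "io_stall_read_ms": 8000, "io_stall_write_ms": 20000}
after = {"num_of_reads": 1500, "num_of_writes": 9000,
         "io_stall_read_ms": 23000, "io_stall_write_ms": 60000}

result = avg_latency_ms(before, after)
print(result)
```

Comparing these averages from a normal period against a period when the Subscriber is falling behind would at least show whether I/O stalls rise along with the replication latency.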