August 15, 2014 at 2:55 am
Hi,
Our replication latency kept increasing until it was unacceptable. No data was getting replicated to the subscriber.
We had to fall back to the publisher for querying.
I removed all the subscriptions. Before I add them again I need to make sure that it'll work.
The log reader agent is constantly blocking the distribution clean up job. Killing the log reader helped, but isn't a nice solution.
SPID | BlkBy | HostName | DBName       | CPUTime+DiskIO | Status    | ProgramName                         | Command          | LoginName   | LastBatch
175  | .     | SQLSvr1  | distribution | 3499430204     | sleeping  | Repl-LogReader-0-MyVLDB-8           | AWAITING COMMAND | Domain\User | 08/15 08:31:54
115  | 175   | SQLSvr1  | distribution | 3016364        | SUSPENDED | Distribution clean up: distribution | OPEN CURSOR      | Domain\User | 08/15 07:05:01
How / why is a sleeping log reader blocking the distribution cleanup job?
How is this related to my VLF issue, which started around the same time?
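For reference, a query along these lines against the standard DMVs shows the same blocking chain, and also whether the sleeping blocker still has an open transaction. Column aliases are mine:

-- Minimal sketch: blocked sessions and their blockers, including "sleeping" blockers.
-- A sleeping session still holds its locks for as long as it has an open transaction.
SELECT
    r.session_id          AS blocked_spid,
    r.blocking_session_id AS blocker_spid,
    r.wait_type,
    r.command,
    blocked.program_name  AS blocked_program,
    blocker.program_name  AS blocker_program,
    blocker.status        AS blocker_status,   -- "sleeping" for the log reader here
    CASE WHEN t.session_id IS NULL THEN 0 ELSE 1 END AS blocker_has_open_tran
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS blocked
    ON blocked.session_id = r.session_id
JOIN sys.dm_exec_sessions AS blocker
    ON blocker.session_id = r.blocking_session_id
LEFT JOIN sys.dm_tran_session_transactions AS t
    ON t.session_id = r.blocking_session_id
WHERE r.blocking_session_id <> 0;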
Cheers
August 15, 2014 at 7:23 am
I think it would help us to know what your replication setup is.
What type of replication?
What is the latency / timing?
Approximately how many subscriptions do you have set up?
How many subscribers to this publisher?
Where is the Distributor? (On the publisher or a separate server?)
I assume it was a pull subscription until you changed it over to a push (given your post). Would that be correct?
August 16, 2014 at 6:19 am
Brandie Tarvin (8/15/2014)
I think it would help us to know what your replication setup is.
What type of replication?
What is the latency / timing?
Approximately how many subscriptions do you have set up?
How many subscribers to this publisher?
Where is the Distributor? (On the publisher or a separate server?)
I assume it was a pull subscription until you changed it over to a push (given your post). Would that be correct?
Yeah, a few more details might be more helpful 🙂
Transactional replication
SQLSvr1 is publisher and distributor.
One push subscriber, over a 10Gb LAN.
9 publications. ALL slow.
Latency crept up to 24+ hours.
Both published DBs had a big VLF increase (97K+ and 14K+ VLFs).
The 14K+ VLFs have been reduced to 200.
Still waiting to resolve the 97K+ one.
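For anyone following along: the usual way to check and reduce the VLF count is DBCC LOGINFO plus a shrink and regrow of the log, roughly as below. The database name, logical file name, and sizes are placeholders, not necessarily what was done here.

USE MyVLDB;  -- placeholder database name

DBCC LOGINFO;  -- one row per VLF, so the row count is the VLF count

-- After a log backup (full recovery model), shrink the log file...
DBCC SHRINKFILE (MyVLDB_log, 1024);  -- target size in MB

-- ...then grow it back to its working size so the log is rebuilt with far fewer VLFs.
ALTER DATABASE MyVLDB
    MODIFY FILE (NAME = MyVLDB_log, SIZE = 64GB, FILEGROWTH = 8GB);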
August 17, 2014 at 7:00 am
I didn't use a tracer token, so I can't say for sure.
Because of the VLF issue, I'm assuming it was the log reader.
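Next time, a tracer token would split the latency into publisher-to-distributor and distributor-to-subscriber. Roughly like this, with a placeholder publication name and a made-up tracer id:

-- Run at the publisher, in the published database.
EXEC sys.sp_posttracertoken @publication = N'MyPublication';

-- Later, once the token has had time to flow through:
EXEC sys.sp_helptracertokens @publication = N'MyPublication';  -- lists token ids
EXEC sys.sp_helptracertokenhistory
    @publication = N'MyPublication',
    @tracer_id   = -2147483641;  -- hypothetical id taken from sp_helptracertokens
-- The history shows publisher-to-distributor and distributor-to-subscriber latency per subscriber.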
August 18, 2014 at 4:46 am
First, let's try approaching this problem as if it has nothing to do with the VLF issue, okay?
What is the schedule (if any) of your snapshot agent?
How big are your subscriptions? Filtered or unfiltered?
Is the Distribution Agent running at the subscriber or publisher? (The agent is different from the actual Distributor database)
Is the DA running under the process account or a different account? (this probably has nothing to do with it, but I'm asking every question I can think of)
Since you misunderstood my latency question, I'm going to rephrase it. How often does your transaction replication update? Meaning, what is the Agent schedule? Continuously? On Demand? Or on a custom schedule?
BTW, you do need to verify if the subscriber and publisher are in the same data center and if they are even on the same domain. That could make a difference.
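If it's easier, most of that can be pulled in one go from the replication metadata on the Distributor. A rough sketch, run in the distribution database (treat the column meanings as approximate):

-- Distribution agents: which publication/subscriber each serves, push vs pull,
-- and whether the corresponding SQL Agent job is enabled.
SELECT
    da.name,
    da.publisher_db,
    da.publication,
    da.subscriber_db,
    da.subscription_type,  -- 0 = push, 1 = pull, 2 = anonymous
    j.enabled
FROM distribution.dbo.MSdistribution_agents AS da
LEFT JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = CAST(da.job_id AS uniqueidentifier);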
August 19, 2014 at 9:42 am
Brandie Tarvin (8/18/2014)
First, let's try approaching this problem as if it has nothing to do with the VLF issue, okay?
Okay.
What is the schedule (if any) of your snapshot agent?
No schedule
How big are your subscriptions? Filtered or unfiltered?
4 GB - 200 GB
Is the Distribution Agent running at the subscriber or publisher? (The agent is different from the actual Distributor database)
Publisher
Is the DA running under the process account or a different account? (this probably has nothing to do with it, but I'm asking every question I can think of)
Different account
Since you misunderstood my latency question, I'm going to rephrase it. How often does your transaction replication update? Meaning, what is the Agent schedule? Continuously? On Demand? Or on a custom schedule?
Continuously
BTW, you do need to verify if the subscriber and publisher are in the same data center and if they are even on the same domain. That could make a difference.
Same datacenter, same rack.
We found that the clustered indexes on some of the subscriber tables had been changed from the publisher's to suit reporting queries. The new clustered keys were not sequential, which led to a lot of fragmentation and slow inserts into the indexed views: each row was taking about 500 ms to insert, with tens of thousands of rows going into each indexed view every day.
The clustered indexes have been changed back to the surrogate keys, with a nonclustered index added for reporting.
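In outline (with made-up table, key, and column names), the subscriber-side change was along these lines:

-- Rough sketch only; the real schema is different.
-- Drop the reporting-shaped clustered index that had replaced the original key...
DROP INDEX CIX_Report_Date_Customer ON dbo.SalesDetail;

-- ...put the clustered index back on the surrogate key, so inserts are sequential again...
CREATE UNIQUE CLUSTERED INDEX CIX_SalesDetail_Id
    ON dbo.SalesDetail (SalesDetailId);

-- ...and give the reporting queries a nonclustered index instead.
CREATE NONCLUSTERED INDEX IX_SalesDetail_Date_Customer
    ON dbo.SalesDetail (SaleDate, CustomerId)
    INCLUDE (Quantity, Amount);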
Now I can set up the subscriptions again and test.
Thanks for your time and patience so far.
August 19, 2014 at 10:50 am
Are you not replicating index changes over to the subscriber?
Let us know if your fix doesn't fix the problem and we'll take it from there.
August 19, 2014 at 3:36 pm
Brandie Tarvin (8/19/2014)
Are you not replicating index changes over to the subscriber?
No. Reporting is offloaded onto the subscriber. It uses different indexes.
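For reference, whether indexes are copied to the subscriber at (re)initialization is a per-article setting (schema_option). Checking it looks roughly like this, with placeholder publication and article names:

-- Run at the publisher, in the published database.
EXEC sp_helparticle
    @publication = N'MyPublication',
    @article     = N'SalesDetail';
-- schema_option is a bitmask; 0x10 copies the clustered index, 0x40 copies nonclustered indexes.
-- To change it (affects future initializations only), use sp_changearticle with
-- @property = N'schema_option', the new hex value, and @force_invalidate_snapshot = 1.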
Might take a while to set up replication to the other subscriber.
I'll update when I can.