Replication Latency

  • Does anyone have good tips or scripts for troubleshooting transactional replication latency between 2000 servers, as well as between 2000 and 2005 servers? Your help would be much appreciated.

    Lynn

  • Hi Lynn,

    There should be no problems doing transactional replication from SS2K to SS2K5 (or vice versa).

    As with most DBA operations, it's simplest to use SEM in SS2K (or SSMS in SS2K5) to create the pub and sub(s) and make sure it works.

    Use the SEM or SSMS tool to create the scripts [the output is pretty ugly-looking T-SQL, so it needs prettifying!]

    - at least it fingers the underlying sprocs [without resorting to Trace/Profiler to capture them]

    It helps to edit the script to put in conditional code (IF @@SERVERNAME = 'SVR1' ...), e.g. for PROD, UAT, DEV and DR, as sketched below. It also allows enhanced parameters - check each sproc in BOL.
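
    As a minimal sketch of that conditional wrapping (the server, publication and subscriber names here are purely illustrative, and sp_addsubscription takes more parameters than shown):

      IF @@SERVERNAME = 'SVR1'        -- PROD publisher
      BEGIN
          EXEC sp_addsubscription
              @publication = N'Pub_Orders',    -- illustrative names
              @subscriber = N'SUB-PROD',
              @destination_db = N'OrdersRepl',
              @subscription_type = N'pull';
      END
      ELSE IF @@SERVERNAME = 'SVR2'   -- UAT publisher
      BEGIN
          EXEC sp_addsubscription
              @publication = N'Pub_Orders',
              @subscriber = N'SUB-UAT',
              @destination_db = N'OrdersRepl',
              @subscription_type = N'pull';
      END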

    The SEM/SSMS tools (well, the external Replication Monitor in SS2K5) provide a good GUI view of what's going on.

    - you can dig into SQL-DMO (SS2K) or SMO/RMO (SS2K5) if you need more programmatic control

    I suggest you get good baseline timings in early usage so you can detect anything out of the ordinary (a problem!), and so you can conduct capacity planning (are your LAN/WAN up to the normal/peak traffic?). One way to capture a baseline is sketched below.
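
    e.g. once your publisher is on SS2K5, tracer tokens will give you latency numbers directly (a sketch - the publication name is illustrative; SS2K has no tracer tokens, so fall back on the PerfMon replication counters there):

      -- run at the publisher, in the publication database
      DECLARE @token_id int;
      EXEC sys.sp_posttracertoken
          @publication = N'Pub_Orders',        -- illustrative name
          @tracer_token_id = @token_id OUTPUT;

      -- a little later, read back the measured latencies
      -- (publisher->distributor and distributor->subscriber)
      EXEC sys.sp_helptracertokenhistory
          @publication = N'Pub_Orders',
          @tracer_id = @token_id;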

    At least there are some battle-scarred veterans out there who've been there, done that!

    HTH

    Dick

  • We've had replication up and running from 2000 to 2005 for a while, but now I also need to set it up the other way around. Every now and then I get latency issues; sometimes they're caused by excess traffic, and other times they're caused by other processes holding locks on resources, etc.

    I need a way to quickly determine what the cause is when this happens. Are there system tables I can check other than the usual sp_who2, etc.? Something along the lines of the sketch below is what I'm after.
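
    For instance, the kind of check I have in mind (a rough sketch - sysprocesses is the old SS2K system table, and MSdistribution_status is a view in the distribution database, so the second query runs at the distributor):

      -- sessions currently blocked by another spid (more detail than sp_who2)
      SELECT spid, blocked, waittime, lastwaittype, waitresource,
             DB_NAME(dbid) AS dbname, cmd, hostname, program_name
      FROM master..sysprocesses
      WHERE blocked <> 0;

      -- commands queued but not yet delivered, per agent and article
      SELECT agent_id, article_id, UndelivCmdsInDistDB, DelivCmdsInDistDB
      FROM distribution.dbo.MSdistribution_status;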

  • Hi all,

    I have a very interesting SS2K vs SS2K5 problem, which I am currently bypassing in the hope of a solution.

    My topology:

    -1 publication database in SS2K (600GB)

    -1 SS2K pull read-only subscriber

    -1 SS2K5 pull read-only subscriber

    -Replication is occurring via a SS2K5 distributor.

    -Replication is read-only one-way transactional (non-updating)

    My settings:

    The publication settings are currently:

    -Subscriptions never expire

    -Articles copy clustered and nonclustered indexes, drop and re-create objects, and copy the insert, update and delete sprocs.

    -Snapshot = Native SQL

    -Subscriptions = Allow pull transactions

    Agent profiles:

    -Currently added 2601:2627:20598 to the -SkipErrors parameter

    And herein lies my problem. I am getting a huge number of errors relating to 'row not found at Subscriber when applying replicated command' (Error: 20598).

    This exact configuration was working perfectly when the environment was entirely SS2K, but now we have a mixed environment and the system is failing. This is the reason the skip-errors flag is set.

    I have noticed that when my architecture was purely SS2K, the @schema_option = 0x00000000000000F3, while in SS2K5 the @schema_option = 0x00000000000080F3. Otherwise, when you script out and compare the two replication setups, everything else is exactly the same.
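
    For what it's worth, this is roughly how I'm reading the option off each article (publication name changed; sysarticles and syspublications live in the publication database):

      SELECT a.name AS article, a.schema_option
      FROM sysarticles AS a
      JOIN syspublications AS p ON a.pubid = p.pubid
      WHERE p.name = N'Pub_Orders';   -- illustrative publication name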

    Do you have any idea as to why these errors are occurring and how to solve them?

    Kind regards

    James.

  • These errors are raised by the stored procedures on the subscribers. If you are using -SkipErrors, the agent is supposed to ignore them, but you are risking that the data on the subscriber ends up out of sync with the publisher. Make sure that ALL subscribers are using the same profile.


    * Noel

  • We use @schema_option = 0x00000000080350DB, so I commend this to you to try.

    You should be able to edit your generation script to use it in all such sp_addarticle calls, or apply it to existing articles as sketched below.
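
    Something along these lines (a sketch only - publication and article names are illustrative, and note that changing the option invalidates the snapshot):

      EXEC sp_changearticle
          @publication = N'Pub_Orders',    -- illustrative names
          @article     = N'Orders',
          @property    = N'schema_option',
          @value       = N'0x00000000080350DB',
          @force_invalidate_snapshot = 1,  -- option change invalidates snapshot
          @force_reinit_subscription = 1;  -- and may force a re-init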

    Let the community know how you get on!

    HTH

    Dick

  • Hi, thanks for the responses!

    Dick - can you tell me what the @schema_option = 0x00000000080350DB settings turn on?

    Noel - The -SkipErrors setting is a workaround, and definitely not something I want to keep! I have tried moving the distribution agents to run from the distributor (push subscription), roughly as sketched below. This puts quite a load on the distributor, but it makes SS2K a bit more usable! So far I have not had any errors - but it is Saturday, and there is very little load on the servers, so I will have to wait for Monday to confirm!
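
    For the record, the pull-to-push switch went roughly like this (object names changed; as far as I can tell the subscription has to be dropped and re-added - there's no in-place toggle):

      -- at the subscriber: drop the existing pull subscription
      EXEC sp_droppullsubscription
          @publisher = N'PUB1', @publisher_db = N'OrdersDB',
          @publication = N'Pub_Orders';

      -- at the publisher: remove the old registration, re-add as push
      EXEC sp_dropsubscription
          @publication = N'Pub_Orders', @article = N'all',
          @subscriber = N'SUB1';
      EXEC sp_addsubscription
          @publication = N'Pub_Orders', @subscriber = N'SUB1',
          @destination_db = N'OrdersRepl',
          @subscription_type = N'push';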

    Regards

    James

  • All explained in BOL, which should be your first port of call

    - either as installed on your PC or via the online MS website

    specific pages at

    http://msdn.microsoft.com/en-us/library/ms173857.aspx

    Dick

  • Ha ha! I love hearing people tell others to read Books Online. If all the answers were there, we wouldn't have forums like this!

    Okay - so I managed to publish all the articles and synchronise to the subscriber (SS2K -> SS2K5 -> SS2K). The way I managed this was to use push distribution and to publish all the tables simultaneously.

    It would seem that if you are using SS2K5 as the distributor, SQL Server prefers that all the distribution agents reside on the SS2K5 server. All errors such as 'row not found' are eliminated.

    Now I have another problem: if I publish an article (yes, following BOL) and run sp_addsubscription, the snapshot generates for that single article (which is the desired effect), but the distribution agent never detects the new article. If I invalidate the entire subscription, it works fine, but for a 300GB database this is not what I want to do.
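
    For reference, the sequence I'm following is roughly this (object names changed; the immediate_sync/allow_anonymous steps are the commonly documented prerequisite for getting a single-article snapshot):

      -- allow_anonymous must be off before immediate_sync can be turned off
      EXEC sp_changepublication @publication = N'Pub_Orders',
           @property = N'allow_anonymous', @value = N'false';
      EXEC sp_changepublication @publication = N'Pub_Orders',
           @property = N'immediate_sync', @value = N'false';

      EXEC sp_addarticle @publication = N'Pub_Orders',
           @article = N'NewTable', @source_object = N'NewTable';

      EXEC sp_addsubscription @publication = N'Pub_Orders',
           @subscriber = N'SUB1', @destination_db = N'OrdersRepl',
           @subscription_type = N'pull';
      -- then run the Snapshot Agent: only the new article is snapshotted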

    I have tried the suggestion of setting @reserved = 'internal' (which is not in BOL), and it worked for a single article, but then never worked again. The other problem is that records seem to be synchronising as if the snapshot had been transferred correctly, but errors from sprocs such as sp_MSupd... and so forth are now appearing!

    Why has MS changed the replication functionality so much? I have run replication for two years on SS2K without hassles on the same databases!!!

    Regards

    James

  • BOL has lots of technical detail but sometimes not much wisdom. The forums would be more focussed if people at least used BOL for what it IS good for.

    You say that you have a 300GB database in one publication, so adding an extra article is impractical [it may require a re-init to achieve]. So you should have several publications and several Distribution Agents to shoulder the work - obviously partition the articles into pubs so as to avoid DRI problems (e.g. a race between Customers and OrderDetails). Then when you do get busy periods there are several streams active.

    "Row not found" errors would occur whether a push (from distributor) or pull (by sub) unless you changed the autogenerated sprocs or wrote your own. A DELETE can be made to succeed (if row already deleted), and perhaps an UPDATE could be made into an INSERT (if all fields supplied or NULLable) but not desirable IMHO.

    Developers do ask me for additional table articles to be added (especially "now!"), so the only practical way is to add a new temporary pub (likely independent_agent=false) that I can implement immediately. Then at some quiet time [a weekend], I consolidate the temporaries into the mainstream pub (yes, that involves a rip & replace at that point).

    It is possible to specify that the data is already present (i.e. skip rather than delete/truncate), so a re-init would not do a rip & replace, but that has other management issues. See the sketch below.
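
    If I read BOL right, that corresponds to the subscription's sync type, roughly (names illustrative; 'replication support only' assumes both the data and the replication sprocs already exist at the subscriber):

      EXEC sp_addsubscription
          @publication = N'Pub_Orders',
          @subscriber = N'SUB1',
          @destination_db = N'OrdersRepl',
          @sync_type = N'replication support only';  -- no snapshot applied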

    "Why has MS changed the replication functionality so much, I have run replication for 2 years on SS2K without hassles on the same databases!!!"

    I guess that progress, like death & taxes, is inevitable, but at least you can opt out of progress if you wish [keep running SQL2K]. There are usually several ways to achieve results, and peer-to-peer or the new Synchronization products are worthy of attention [if only to eliminate them as less good than what you have now].

    Regards

    Dick

  • Now you are spouting wisdom!

    I have done exactly that! I created multiple transactional publications on the database, with the largest tables placed in one publication. Now if I add an article I only have to reinitialise that one publication.

    I suppose this leads to another question: how much additional load do multiple publications place on a production database?

    This resolves the problem to a certain degree! Not the solution I was looking for, but a workaround!

    Thanks!

    James

  • Any pub with independent_agent=true will have its own Distribution Agent (e.g. SysA_Pub1). If you have 5 such pubs, that would mean 5 Windows processes (each running distrib.exe, visible in Task Manager).

    If you had 3 pubs (A, B, C) with independent_agent=true and 2 (D, E) with independent_agent=false, you would have 4 processes (D and E would share one). The flag is set when the pub is created, as sketched below.
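
    A minimal sketch of where that flag lives (pub names illustrative; sp_addpublication takes many more parameters than shown):

      EXEC sp_addpublication
          @publication = N'PubQuick',
          @independent_agent = N'true';   -- gets a dedicated Distribution Agent

      EXEC sp_addpublication
          @publication = N'PubD',
          @independent_agent = N'false';  -- shares one agent with other
                                          -- non-independent pubs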

    Suppose you had 2 pubs (each independent_agent=true): "PubBig" with 100 busy-table articles [begin tran; 1M INSERTs; 500K UPDATEs; 100K DELETEs; commit], and "PubQuick" with 5 tables. The latter would be able to deliver changes quickly (fast lane, low latency) without having to wait for the former (slow trucks).

    But remember my comments about DRI, and that you DON'T want too many tables in a single pub, since that might mean bigger re-inits. At least you can choose when to fold in any temporary pubs.

    For WANs you might expect the network to be the limiting factor (so the DA processes are not CPU-bound and just consume a process slot). But on LANs, DAs can be surprisingly CPU-bound (cumulatively, probably no different from one single DA).

    Remember that the distribution db has connections from the Log Reader writing to it, the distrib.exe DAs reading from it, and the 10-minute cleanup job. Having MANY distrib.exe processes would compete (think blocking) for distribution db access, as well as adding process-slot overhead, so don't overdo it.

    "Your mileage will vary" (partition your tables into pubs intelligently for logical relationships, physical latencies etc), so test, test, test as ever.

    Dick
