Is (transactional) replication production worthy?

  • Dear all,

    I have not used replication much in the past but I thought it might be useful to "replicate" some (small) data warehouse data for enabling reading on a server not directly affected by ETL loads, etc.

    Since replication has been around for such a long time and our requirements are quite simple, I thought this should be trouble free...

    However, I have found it quite "temperamental" and not straight forward to keep going.

    We had various problems like

    - failing some distribution cleanup on our test server (which we were never able to fix, although we think we installed replication exactly the same that some other server not showing the problem)

    - replication suddenly failing with problems inserting duplicate keys? How can it fail with inserting duplicate keys when it's the only process updating the replicated database and there is a unique index on the published database table? I tried dropping the subscriptions and re-establishing it with a brand new snapshot... No Joy!

    It is perfectly possible that some (all?) of the difficulties are due to our lack of experience but my question is: "is it worth it pursuing in this direction or should we look for alternative solutions to replication?"

    Some people told me it was "flaky" anyway...

    Is this true?

    Thanks

    Eric

  • We have been using Tran replication for years for reporting. It has been quite stable. But if you have Network problems and HW problems, it can become tough to maintain.

    -Roy

  • Eric Mamet (10/19/2011)


    Dear all,

    - failing some distribution cleanup on our test server (which we were never able to fix, although we think we installed replication exactly the same that some other server not showing the problem)

    What is the error message in the clean up jobs?

    - replication suddenly failing with problems inserting duplicate keys? How can it fail with inserting duplicate keys when it's the only process updating the replicated database and there is a unique index on the published database table? I tried dropping the subscriptions and re-establishing it with a brand new snapshot... No Joy!

    Could be some update is happening on the primary key of the publisher database.

    Replication is configured in many of our production environments and it is running fine. Of course, there were some issues but they were sporadic.

    M&M

  • Interesting because some of these servers are complete jokes. It takes them regularly around 1mn to perform jobs that take 1 or 2 s on my 3 years old laptop...

  • That means you are having bottlenecks/Blocks somewhere. it could be Network IO or disk IO or any blocks. Our latency is around 3 sec and our OLTP server is Kinda pretty busy.

    -Roy

  • We have have been running transactional replication on our production servers for over 8 yrs with only minor issues that have all been traced back to user error.

    For example: Someone adding the same article to a second publication that is replicating to the same subsciber database. That will cause a duplicate key problem.

    We use transactional replication to give users in two or more of our companies a central entry/update point for shared data, then we replicate the data out to each of the companies own servers for their appications/reporting.

    No issues at until now when I'm trying to perform a hardware upgrade and reduce costs by reducing physical processor numbers, taking advantage of multi core processors. Can't figure out how to clear out old replication configuration after restoring replicated databases onto new server. I posted this yesterday but haven't had any replies. I was hoping ROY the Replication GURU would have the quick answer! 🙂

  • If its flakey then my employers entire production environment is flawed! We use it to scale out for web servers.

    It works very well and usually errors come from errant applications and jobs.

  • OK, from the various answers, it seems clear we might be doing something wrong...

    First, the virtual machines we use for development and test are EXTREMELY slow.

    The latency seems to change widely in Replication Monitor but I have seen it expressed in minutes sometimes...

    I know mohammed moinudheen asked me about the distribution error message and I can't answer at the moment because I have been removed access to that box! 🙁

    On the other hand, the good news is that we have so many problems that it's not difficult to find another one. Yeepee!

    On our dev machine, one of our publications got into trouble so I thought I'd refresh it.

    This is a one way transactional replication with two subscribers.

    So, yesterday I asked to re-initialise and create a new snapshot.

    Then I made a small manual change in a small replicated table.

    24 hours later, one of the replicated databases has the change while the other does not (both subscriptions are on the same server, hence same network, etc)

    Yet, if I look into replication Monitor, it says that both subscriptions are fine and the latency excellent (5 and 3 seconds).

    How come it says everything is fine when I know my updates are missing from one of the two replicated tables???

    If I right click on a subscription and look at details, it says in Action Messages "The initial snapshot for publication 'SecurityData' is not yet available." (SecurityData being my publication)

    What does this all mean?

    - From Replication Monitor main screen, it says everything is fine 🙂

    - From looking at the data, it clearly shows that (only) one of the two subscriptions has been updated :crazy:

    - From the Subscription details view, it says nothing is working :angry:

    This is really confusing me...

  • How was the replication initially set up? Was it set up with Sync from backup? If so everything will look normal but none of the changes will go through until a log back up is done.

    -Roy

  • No I think it was setup to perform a snaphot

  • That could be a problem otherwise since we have started using a third party backup tool

  • Oopss!

    I just realised that Object Explorer showed 2 subscriptions to my publication when looking at the publisher server side while it only shows one subscription to that publication when looking at the subscriber server side...

    So, from the subscriber side, I proceeded with adding the one subscription missing and I asked for one snapshot to be created immediately.

    Then, SSMS crashed so I had to restart 😉

    Now, from the subscriber side, I can see my two subscriptions while on the publisher side I only see one!

    If I look at the details of the subscriptions on Replication Monitor, it still says "The initial snapshot "Security Data" is not yet available"...

    WHY NOT???

    Another detail is that the latency is now 1'18" for that subscription while it is 3" for the trouble-free one.

    Am I doing something really stupid? :crying:

  • Damn!

    Looking at the Snapshot Sql Server Agent, I can see the job fails on authentication...

    The job failed. Unable to determine if the owner (SSPUK\eric.mamet) of job FOXDWH-DEV01-FoxDW_DataWarehouse-SecurityData-15 has server access (reason: Could not obtain information about Windows NT group/user 'MyDomain\MyName', error code 0x6ba. [SQLSTATE 42000] (Error 15404)).

    I guess this could explain some of my troubles...

  • Having sorted that issue, things are looking brighter now...

    I am not too impressed with the latency of 3'45" (!) but this machine is simply doing it's best... 😀

  • Not sure it really matters but I've always added new subscriptions from the publisher side then manually started the snapshot agent to push the data to the new subsciber.

    Do you have your publisher and distributor located on the same server?

    We found out years ago (sql server 2000) for best replication performance it's a good idea to use a standalone distribution server. Our Distribution Server has a single processor server with 4Gb of ram dedicated to nothing but sql server replication distribution. We are at the point now where in the very near future we will need to upgrade this server / memory.

    Also, go into replication monitor, click on the publication, then the agents tab, make sure your lo log reader agent is running and that your snapshot agent completed successfully.

    You can double click on both the snapshot agent and logreader agent to see detailed history.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply