Is there another way out?

  • Guys,

    We have a replication set up with a single publisher and a single subscriber.

    Recently, replication failed due to the following reason:

    Data, erroneously, was deleted from subscriber.  Then, to make the publisher and subscriber synch, the same data was deleted from publisher.

    At that point, the publisher, as it should, attempted to delete the data in the subscriber.  But since the data is not there, the attempt failed, and replication failed.

    I tried to restart the agent, but replication kept failing.  Since nothing worked, I just deleted the subscription and recreated it.  The replication then started, but since it had to go through all the steps again (i.e. Snapshot ...), it took us 2 hours to get the data back in synch.

    Luckily this happened in the testing environment.  If it would be production, we would be in trouble, as availability is our #1 concern.

    Would anyone know of any other (i.e. more elegant) way I could've solved this problem?

    The only other idea I had was the following: It is possible that the data which the publisher plans to delete is temporarily stored in some file and if that file is located and deleted, and then the agent is restarted, it could possibly work - but, I could not locate any such files - and this is a completely hypothetical idea.

    Any suggestions would be appreciated.

    Thanks

  • Was this intentional? If so from BOL

    Merge Replication or Updatable Subscriptions

    When replicated data needs to be updated at the Subscribers, you can use snapshot replication or transactional replication with updatable subscription options or you can use merge replication. The method you choose depends on your replication topology and the needs of your application and its users.

    Use merge replication when . . .

    Use snapshot replication or transactional replication with immediate updating or queued updating when . . .

    • Replicated data is read and updated at the Subscriber.
    • Subscriber and Publisher are only occasionally connected.
    • Conflicts caused by multiple updates to the same data are handled and resolved.
    • You need updates to be propagated on a row-by-row basis, and conflicts to be evaluated and resolved at the row level.

    • Replicated data is mostly read-only at the Subscriber.
    • Subscriber, Distributor, and Publisher are connected most of the time, but this is not necessary for queued updating subscriptions.
    • Conflicts caused by multiple updates to the same data are infrequent.
    • You need updates to be propagated on a transaction basis, and conflicts to be evaluated and resolved on a transaction basis (the entire transaction is either committed or rolled back).

  • I believe you were using one way transaction replication...

    Without snapshot you can handle this situation by add the -SkipErrors swith in your distribution job...and later you run the snapshot if neccessary...

    Use the "-SkipErrors" parameter in Distribution Agent cautiously

    http://support.microsoft.com/default.aspx/kb/327817

     

    MohammedU
    Microsoft SQL Server MVP

  • This is a quite useful suggestion.

     

    Thank you very much

  • Antares,

    I implemented "updatable subscriptions" solution.

    However, when I delete something from the subscriber, I get the following error message:

    Login failed for user 'sa'.

     

    Would you know what could be causing it?

     

    Thank you

  • Guys,

    To solve the problem stated above, I am currently considering to implement the following replication model:

    No user, other than the replication user, will be allowed to perform any insert/delete/update to the tables in the subscriber.

    That way, we can avoid the problem that happened this time, which resulted in a replication failure.

    In order to implement this, I am assuming that I should create 'replication' user and then give execution priviliges to this user for all the replication stored procedures.

    The issues I need help with are the following:

    1. There are too many stored procedures related to replication (and I don't know if I can find all of them)

    2. I am not sure whether it would be sufficient just to do that (i.e. is there anything more related with replication in addition to the sps, that I should be concerned about?)

    Can anyone suggest a more elegant solution for implementing this model or any other suggestions?

    Thank you very much

  • We just went through this issue. I right clicked on the distributer agent, selected the profile and made a new profile setting the skiperrors to the proper code (btw-there should be a builtin profile that skips errors, but it skips three, I made one to just skip the error we were getting).

    That let the replication continue. Then I looked at the order the tables replicate. I compared the number of rows in each (original/copy) until I found a mismatch. We also use RedGate Compare to do this. Then I fix the incorrect table.

    The second time it happened (two days in a row), we shut off all stored procedures that we didn't know what they did (meaning someone didn't go through us to create them). Then we waited for the screaming - which hasn't happened yet (two days later).

    Document whatever you do, including the reasons so you can undo as needed.

    -SQLBill

  • Bill,

    Where can I find information about what error value corresponds to which error (i.e. I want to know what types of errors I will be skipping when I set a certain code value)?

    Thank you

  • Ah ...

    - Replication Monitor - Agents - Distribution Agents  - "double click on publication" opened up the dialog box to show the error number.

    Thanks Bill ... the info is quite useful - it can definitely save us in the future!

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply