Merge Replication - Lost Change

  • I have a server set up as publisher/distributor and another as subscriber to a merge publication that includes tables/views/sprocs. The merge process runs every 10 seconds and is generally working fine.

    Recently, a change was pushed to a table included in the publication that added a column and assigned an FK constraint to another table NOT included in the publication (dbo.PublishedTable1.UnpublishedTable1Id -> FK -> dbo.UnpublishedTable1.Id). The article properties are configured to propagate FKs and so, as expected, the merge attempt started throwing errors on the subscriber because the FK referenced a table that it didn't have. The published table at the subscriber was given the new column, but no FK (AS EXPECTED).

    These errors occurred every 10 seconds for about 7 hours, until a new change was made to the Published database. These new changes did NOT relate to the previous article. Yet from this point on, there are no more errors relating to that FK on the subscriber referencing a table that doesn't exist. The subscriber table does not have that FK (makes sense), but it's like replication just forgot it was supposed to handle that change. Throwing stones in the darkness, I created a new snapshot, reinitialized/validated the subscription, but none cause the process to remember that FK.

    I've been digging around the published database's system tables, and the master databases distribution tables, attempting to understand how replication tracks the changes it needs to apply over time, but it's a lot and I haven't yet found a good map of the data flow or relationships.

    Does anyone know why replication "forgot" about trying to apply this FK to a published article? Or can someone help me understand the replication "queue" when dealing with merge, so that I can look through the tables and try to piece together what happened?

    I'm somewhat partial to my current configuration, so please don't recommend changing the way the replication is set up as an initial "solution". I really want to understand this incident, whether it's something intentional, and ways to detect/mitigate it in the future.

    Thank you!

  • In case it wasn't clear, I'm not asking for help resolving the initial error, or how to avoid those errors. Those are expected outcomes based on the configuration of our replication. I'm asking why the error seemed to disappear WITHOUT being resolved. (and for anyone having heartburn about the error going ignored for so long, this is in a pre-prod environment).

    Thanks again!

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply