May 15, 2011 at 2:55 pm
I have a basic invoice -> invoiceitem relationship in my merge replication setup on SQL2K8 R2.
[Invoice] article is replicated with overlapping partition options and filter
WHERE [Invoice].[SubscriptionHostName] = HOST_NAME() AND [Invoice].[Replicate] = 'True'
[InvoiceItem] article is replicated with overlapping partition options and join condition
INNER JOIN [Invoice] ON [InvoiceItem].[InvoiceId] = [Invoice].[InvoiceId] AND [InvoiceItem].[Replicate] = 'True'
The [Replicate] columns are flags that are always true right now, but will be used in the future to reduce the subset of records included in replication. The have a default constraint of 'True'.
Our merge replication system only has new [Invoice] and new [InvoiceItem] records being created at the subscriber under normal conditions, we insert at the publisher extremely rarely although it does happen. There are approximately 2000 new [Invoice] records per day, and about 10,000 joined [InvoiceItem] records per day across 131 subscriptions. I have no triggers in the entire system other than the merge system triggers.
The following scenario happens to about 2 out of the 10,000 [InvoiceItem] records per day. The [InvoiceItem] gets inserted, the insert statement contains the proper [InvoiceId] and defaults [Replicate] to true. This [InvoiceItem] record belongs at the subscriber, yet the next sync causes the record to be deleted. The delete is only temporary, when the subscriber synchronizes next, 10 minutes later, that same record that was just deleted is inserted again because it belongs there! The example that I investigated today, the [InvoiceItem] record was inserted at 11:02AM. at 11:04AM that record was removed from the subscriber. At 11:14AM that record was inserted back into the subscriber. It means that between 11:04AM and 11:14AM that record didn't exist, and it caused an error in the software at 11:05AM that was running on the subscriber, the record was inserted again, and at 11:14AM the subscriber ended up with two [InvoiceItem] records.
The [InvoiceItem] record was not modified in any way at the publisher, it was only inserted at the subscriber.
Can anyone tell me in what scenario that a record would be out of partition one sync, then back in for the next sync even though it wasn't modified?
If it was out of partition for one sync, how could it then be added back in without being updated in any way?
The articles only contain row-level tracking, but is there any way through the merge system tables that I can investigate why this record would have been removed from the subscriber in the first place?
May 25, 2011 at 10:09 am
I believe I have this issue resolved, so I wanted to post the answer incase anyone else comes across this issue.
The problem was apparently in the partition options, and my description of the problem was actually inaccurate. The [Invoice] article was actually set to partition_options 1, overlapping partitions with disallow out-of-partition changes. The [InvoiceItem] article was set to partition_options 0 by mistake, overlapping partitions.
Logically, since [InvoiceItem] is in a join filter to [Invoice] with a FK/PK relationship, and because [Invoice] disallows out-of-partition changes, ipso facto I believe that means that [InvoiceItem] also technically can't be modified to be out of partition despite the fact that it's set to partition_options 0. But apparently the fact that it wasn't explicitly set to disallow out-of-partition changes itself was confusing sql server replication.
The [InvoiceItem] records were not actually being modified to be out-of-partition, so replication had no reason to remove them regardless of the partition_options.
Although it should have been set to partition_options 1, and doing so appears to have resolved the issue, I don't see why it would have been required and why replication would have been removing and then adding back some records without them being updated. In my mind, this looks like a bug in replication that can be easily avoided.
September 11, 2016 at 4:21 pm
Ran into this again, 5yrs later, and stumbled across my own post. I believe I've got a handle on what's happening here, and thought I could update this in case anyone else struggles with the same issue.
This happens with a join filter. My posts from 2011 weren't really accurate about what the cause/fix was. But changing partition_options to 1 instead of 0 buries the issue slightly.
The problem is caused by a child article being joined to a parent, when that parent depends on the child - this is an issue of processing order in my case. So for example, if you have a "Customer" record that is joined to a "CustomerLocation" record to determine the subscription partition. So replication evaluates "CustomerLocation" records for each partition and will join the "Customer" to those.
But in most scenarios like this, a FK will be placed on CustomerLocation that references Customer.Id. When the merge agent uploads changes it follow the FK dependencies and tries to upload "Customer" before it uploads "CustomerLocation". This is where the problem is caused because that "Customer" record is out-of-partition since the "CustomerLocation" record hasn't been uploaded yet. Even though the "CustomerLocation" is uploaded immediately after, the publisher determines "Customer" to be out of partition, and triggers a delete at the subscriber. The very next sync when all of the data is there to evaluate, the "Customer" record is added right back. This causes the child record to disappear temporarily.
My original post suggested changing partition_options to 1 to fix the problem, but all that does is raise conflicts in the publisher that are resolved at the next sync. So for one sync you end up with "Subscriber has modified a record that is out of partition when partition_options is set to disallow" or something like that, and the next sync that conflict is resolved. This prevents the temporary delete pending conflict resolution, but it still sucks for a solution.
I'd read in BOL that using NOT FOR REPLICATION on a constraint excludes it from the process that determines partition_order based on constraints, and it should fall through to using JOIN filters to determine processing order. But that doesn't appear to be true in my tests. Even though all the constraints are NOT FOR REPLICATION, the system would still continue to upload changes to "Customer" before "CustomerLocation" even though "CustomerLocation" was the parent in the join filter.
To fix this I've had to manually set processing_order on the "Customer" article to merge after "CustomerLocation", allowing the parent to be uploaded first. Since the FK is NOT FOR REPLICATION, it's allowed to merge in that order.
December 14, 2020 at 4:47 pm
Has changing processing_order solved your problem?
I have the same issue with merge replication. If new records inserted into a join filtered table during a replication cycle, It will cause a delete on subscriber, and an insert on publisher. The next sync will download the missing rows to the subscriber again and everything's fine. It's annoying.
I tried to change processing_order, but It didn't helped, however I don't have identity columns, so I can't use the not for replication flag.
Is It possibe that setting partiton_options to 1 is also needed to resolve this?
Many thx in advance for your reply.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply