September 23, 2020 at 8:17 pm
Hello,
I have a transactional replication configured which subscription fails to sync to often due to 20598 errors. I've enabled verbose logging on the Distribution agent, nothing more detailed captured than the message that the UPDATE fails due to the row missing at the Subscriber.
I've run profiler on the Subscriber side and verified that very often INSERT commands are not being replicated for the articles. But UPDATE commands do, and the 20598 error occurs to often. I can't just insert the missing rows all the time for to fix the process. Currently, I have an automated PowerShell solution that inserts these missing rows as 20598 errors appear. But in need to find the source of the problem.
The articles are configured properly, inserts are to be propagated by calling the procedure sp_MSIns_{article_name}.
I tried to extract all the commands that are being inserted in the distribution database for to be replicated by the Log Reader Agent, from the MSRepl_commands and MSRepl_transactions, but I had difficulties decoding the command from the MSRepl_commands without using sp_browsereplcmds (I am trying not to call this procedure all the time but run a job to capture what everything is being captured for replication and be able to search that result set just to confirm that the missing INSERT commands are really missing and are not even brought to the distributor server).
I am using a Remote Distributor, one Publisher server and one Subscriber server. The tables I am replicating are around 100 Gb in size and grow fast.
How can I debug this further?
Thanks in advance
September 24, 2020 at 9:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
September 24, 2020 at 11:30 pm
You'll probably have to dig in a bit more before anyone comes up with some ideas on the issue. I'd check the MSdistribution_history and MSrepl_errors tables in the distribution database. I'd also check the logs on the server with the distributor. Those should hopefully have more information to track things down.
Sue
September 27, 2020 at 3:33 am
This is a bit of a hunch but you might want to increasing the "LanMan" timeout in Windows. We had a similar problem with remote backups that would work just fine until the very end when the "file resize" bit kicked in. Increasing the "LanMan" timeout in the registry "changed it from 30 seconds to 600 seconds" fixed it all.
Like I said, just a hunch.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 13, 2020 at 4:46 pm
Hello Sue and Jeff,
And thanks a lot for replying. The MSdistribution_history and MSrepl_errors table will give me the same 20598 error message that I am receiving within the alert I have configured - that the row was not found at the Subscriber when applying the replicated UPDATE command for a particular table with a specific primary key. The error messages are always as follow:
And I can't find a pattern for the reoccurrence, it happens during the whole day. I did run a profiler on the source database and compared the flow of commands with the one run on the subscriber for a particular batch within which the error is registered - the insert is just missing at the subscriber, every other command is properly captured.
'LanMan' is not an option for me Jeff as all the SQL Servers are Linux-hosted ones.
Thanks again for replying.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply