April 9, 2013 at 8:26 am
I've had multi-terabyte transactional replication running for a couple of years now. For the most part things flow quite smoothly, with the occasional slowdown when massive updates are performed. However, since March 22nd, 2013, I've had the log reader agent blow up and stop responding.
There's not much in the way of a descriptive error, nor much to read in the log/dump files, other than that they appear to point to an issue with the EXE itself... perhaps a memory leak or a bug somewhere in the distribution agent executable?
In the dump file, this is basically the only useful information I can see. I'm hoping the experts around here have experienced something similar and can offer suggestions:
Process Name: DISTRIB.exe : C:\Program Files\Microsoft SQL Server\100\COM\DISTRIB.exe
Process Architecture: x64
Exception Code: 0xC0000005
Exception Information: The thread tried to read from or write to a virtual address for which it does not have the appropriate access.
Here's the output from the Replication Agent job:
Message
The replication agent encountered a failure. See the previous job step history message or Replication Monitor for more information. The step failed.
Date: 3/23/2013 1:34:26 AM
Log: Job History (MYSERVER-CA_TABLES-SUBSCRIBER-22)
Step ID: 2
Server: DISTRIBUTOR
Job Name: MYSERVER-CA_TABLES-SUBSCRIBER-22
Step Name: Run agent.
Duration: 17.06:24:48
Sql Severity: 0
Sql Message ID: 0
Operator Emailed:
Operator Net sent:
Operator Paged:
Retries Attempted: 0
Message
2013-04-09 12:26:55.798 88 transaction(s) with 232 command(s) were delivered.
2013-04-09 12:26:55.798 Delivering replicated transactions
2013-04-09 12:27:11.251 Delivering replicated transactions
2013-04-09 12:27:21.751 114 transaction(s) with 321 command(s) were delivered.
2013-04-09 12:27:32.673 101 transaction(s) with 206 command(s) were delivered.
2013-04-09 12:27:37.938 100 transaction(s) with 168 command(s) were delivered.
2013-04-09 12:27:50.814 Delivering replicated transactions
2013-04-09 12:27:58.907 Delivering replicated transactions
2013-04-09 12:28:00.142 107 transaction(s) with 288 command(s) were delivered.
2013-04-09 12:28:19.376 Delivering replicated transactions
2013-04-09 12:28:24.564 101 transaction(s) with 256 command(s) were delivered.
2013-04-09 12:28:35.783 100 transaction(s) with 216 command(s) were delivered.
2013-04-09 12:28:54.658 Delivering replicated transactions
2013-04-09 12:29:01.751 Delivering replicated transactions
2013-04-09 12:29:01.783 101 transaction(s) with 253 command(s) were delivered.
2013-04-09 12:29:14.377 100 transaction(s) with 162 command(s) were delivered.
2013-04-09 12:59:14.404
HYT00 Query timeout expired 0
2013-04-09 12:59:14.404 <stats state="2" fetch="1543" wait="71597" cmds="1252" callstogetreplcmds="519922"><sincelaststats elapsedtime="1838" fetch="0" wait="1838" cmds="1252" cmdspersec="0.000000"/></stats>
************************ STATISTICS SINCE AGENT STARTED ***********************
04-09-2013 07:59:14
Total Run Time (ms) : 1491874890 Total Work Time : 70358184
Total Num Trans : 1282424 Num Trans/Sec : 18.23
Total Num Cmds : 3599854 Num Cmds/Sec : 51.16
Total Idle Time : 1375300094
Writer Thread Stats
Total Number of Retries : 4
Time Spent on Exec : 4520346
Time Spent on Commits (ms): 149885 Commits/Sec : 1.26
Time to Apply Cmds (ms) : 8228658 Cmds/Sec : 437.48
Time Cmd Queue Empty (ms) : -1297083607 Empty Q Waits > 10ms: 24054
Total Time Request Blk(ms): 78216487
P2P Work Time (ms) : 0 P2P Cmds Skipped : 0
Reader Thread Stats
Calls to Retrieve Cmds : 519922
Time to Retrieve Cmds (ms): 70358184 Cmds/Sec : 51.16
Time Cmd Queue Full (ms) : 2784738 Full Q Waits > 10ms : 5585
#1Num Cmds : 774239 Exec (ms) : 4520346 Commit (ms) : 145004
Process (ms): 7831768 Last xact : 0x00068c0e0001f56e000c
#2Num Cmds : 802248 Exec (ms) : 4539826 Commit (ms) : 149885
Process (ms): 7910496 Last xact : 0x00068c0e0001f5fb000b
#3Num Cmds : 779073 Exec (ms) : 4451619 Commit (ms) : 131556
Process (ms): 7929972 Last xact : 0x00068c0e0001f62a000b
#4Num Cmds : 1244294 Exec (ms) : 6389841 Commit (ms) : 146076
Process (ms): 8228658 Last xact : 0x00068c0e0001f59d000d
Last global update to sub xact : 0x00068c0e0001f62a000b
*******************************************************************************
2013-04-09 12:59:14.404 Delivering replicated transactions
2013-04-09 12:59:14.404 Delivering replicated transactions
2013-04-09 12:59:14.404 Delivering replicated transactions
2013-04-09 12:59:14.404 <stats state="1" work="70358" idle="1375300"><reader fetch="1543" wait="71597"/><writer write="8228" wait="2997883"/><sincelaststats elapsedtime="1947" work="109" cmds="2146" cmdspersec="19.000000"><reader fetch="0" wait="1947"/><writer write="169" wait="1863"/></sincelaststats></stats>
********************************************************************************
Microsoft (R) SQL Server Replication Agent
A replication agent encountered a fatal error and was shut down. A mini-dump has been generated at the following location:
C:\Program Files\Microsoft SQL Server\100\Shared\ErrorDumps\ReplAgent20130409075914_0.mdmp
______________________________________________________________________________ Never argue with an idiot; they'll drag you down to their level and beat you with experience.
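A side note on the statistics block above: the reported rates can be reproduced from the totals, and the negative "Time Cmd Queue Empty (ms)" value looks consistent with a signed 32-bit counter wrapping around. That reading is an assumption based only on the numbers in this log, not on anything documented for the agent. A minimal check in Python, with all figures copied from the log:

```python
# Figures copied from the agent statistics block in the log.
total_work_ms = 70_358_184          # "Total Work Time"
total_trans = 1_282_424             # "Total Num Trans"
total_cmds = 3_599_854              # "Total Num Cmds"

# Rates per second of work time, as the agent appears to compute them.
trans_per_sec = total_trans / (total_work_ms / 1000)
cmds_per_sec = total_cmds / (total_work_ms / 1000)
print(f"trans/sec: {trans_per_sec:.2f}")   # log reports 18.23
print(f"cmds/sec:  {cmds_per_sec:.2f}")    # log reports 51.16

# Assumption: the negative value is a signed 32-bit wraparound.
reported_queue_empty_ms = -1_297_083_607   # "Time Cmd Queue Empty (ms)"
unwrapped_ms = reported_queue_empty_ms + 2**32
writer_wait_s = 2_997_883                  # writer wait="2997883" in the final <stats> line
print(f"unwrapped: {unwrapped_ms} ms, writer wait: {writer_wait_s} s")
```

If the wraparound reading is right, the unwrapped value in milliseconds lines up with the writer wait reported in seconds in the final stats line, so the negative number would be a display problem rather than a symptom of the crash.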
April 9, 2013 at 9:10 am
Is this the log reader agent or the distributor agent? You mention both.
What happens when the agent job restarts?
Do you have the job set to auto restart on failure (common recommendation)?
April 9, 2013 at 9:16 am
arnipetursson (4/9/2013)
Is this the log reader agent or the distributor agent? You mention both.
What happens when the agent job restarts?
Do you have the job set to auto restart on failure (common recommendation)?
My mistake, this is the Distribution Agent (DISTRIB.exe), not the log reader. When I restart the job it runs just fine. I do not have it set up to auto restart on failure, but I do send a page/email to the DBA operator to alert on the error.
I s'pose there's no harm in setting it to retry at least once...
April 9, 2013 at 9:17 am
Actually, by default this agent is set up to retry at 1-minute intervals...
April 10, 2013 at 9:30 am
You say this occurs when massive updates are performed. Notice the "Query timeout expired" error in your log. Try increasing the Distribution Agent -QueryTimeout parameter. The default is 1800 seconds (30 minutes), which matches the gap between the last successful delivery and the timeout in your log:
2013-04-09 12:29:14.377 100 transaction(s) with 162 command(s) were delivered.
2013-04-09 12:59:14.404
HYT00 Query timeout expired 0
Try increasing the -QueryTimeout to 5400.
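To check the 30-minute reading, the gap between the two timestamps quoted above can be computed directly. This is just a small Python sketch using the values from the log:

```python
from datetime import datetime

# Timestamp format used in the agent log output.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Last successful delivery and the timeout message, copied from the log.
last_delivery = datetime.strptime("2013-04-09 12:29:14.377", FMT)
timeout_logged = datetime.strptime("2013-04-09 12:59:14.404", FMT)

# Elapsed time between the two log entries, in seconds.
gap_seconds = (timeout_logged - last_delivery).total_seconds()
print(f"gap: {gap_seconds:.3f} s")
```

The gap comes out a few hundredths of a second over 1800, which is what you would expect if the agent sat on a single query until the default -QueryTimeout fired.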
April 11, 2013 at 7:24 am
Also consider increasing the ReadBatchSize parameter of the Log Reader Agent and the CommitBatchSize parameter of the Distribution Agent from their default values.
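Agent parameters like these are typically appended to the command text of the agent job's "Run agent." step (or set through an agent profile). As a rough illustration only, here is a Python sketch that sets or replaces a parameter in such a command string. The helper name `set_agent_param`, the sample command, and the CommitBatchSize value of 500 are all hypothetical; only the parameter names and the 5400 figure come from this thread.

```python
import re

def set_agent_param(command: str, name: str, value: int) -> str:
    """Return the command with -name set to value.

    Replaces an existing "-name <value>" pair if present,
    otherwise appends the parameter at the end.
    """
    # Match the flag only when it starts a token, followed by its value.
    pattern = re.compile(rf"(?<!\S)-{re.escape(name)}\s+\S+", re.IGNORECASE)
    setting = f"-{name} {value}"
    if pattern.search(command):
        return pattern.sub(setting, command, count=1)
    return f"{command.rstrip()} {setting}"

# Hypothetical job step command; the bracketed names are placeholders.
step_command = "-Subscriber [SUBSCRIBER] -Publisher [MYSERVER] -QueryTimeout 1800"

step_command = set_agent_param(step_command, "QueryTimeout", 5400)
step_command = set_agent_param(step_command, "CommitBatchSize", 500)  # placeholder value
print(step_command)
```

Whatever values you settle on, change one parameter at a time and watch the agent history, so you can tell which change actually made a difference.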
April 11, 2013 at 9:14 am
Thanks for the advice. I will try these and monitor over the coming weeks.