December 27, 2016 at 12:00 am
I have a transactional replication configured as per below:
ServerA.DBA (pub) --> ServerB.DBB (Sub)
ServerB.DBB (pub1) --> ServerC.DBC (Sub)
ServerB.DBB (pub2) --> ServerD.DBD (Sub)
ServerB.DBB (pub3) --> ServerE.DBE (Sub)
ServerB.DBB (pub4) --> ServerF.DBF (Sub)
ServerB.DBB (pub5) --> ServerG.DBG (Sub)
Today, I got an alert saying that my replication ServerA.DBA (pub) --> ServerB.DBB (Sub) is failing with below error:
Replication-Replication Distribution Subsystem: agent failed. The transaction log for database is full due to 'REPLICATION'.
Upon investigation, I found that my log file was capped at 100GB on ServerB.DBB which was causing the issue. To fix this, I added a new log file as I was unable to extend the existing log file due to same error. This fixed my issue and replication caught up in some time. However, now I am unable to shrink the log file and it always shows my log file utilization as 99% and increasing. This raised some doubts in my mind whether my replication from Server B to C,D,E,F and G is actually running or not and why I am unable to shrink the log file.
Log_reuse_wait_desc on my ServerB.DBB always shows 'REPLICATION'. I ran DBCC OPENTRAN on my ServerB.DBB and it shows below:
Replicated Transaction Information:
Oldest distributed LSN : (0:0:0)
Oldest non-distributed LSN : (10417370:9406:1)
This value is not changing since many hours now. Does it mean my replication from Server B to C,D,E,F and G is not working? However when I see in replication monitor it shows everything as fine. i.e. logreader, all distributor agents are running and moving transactions for all publications. Why I am unable to shrink the log file and what should I do to shrink the log file?
Also, want to add that CDC is not enabled on any of my DBs involved in replication. DBCC LOGINFO shows all my VLFs in use.
December 27, 2016 at 10:45 am
Log_reuse_wait_desc on my ServerB.DBB always shows 'REPLICATION'. I ran DBCC OPENTRAN on my ServerB.DBB and it shows below:
Replicated Transaction Information:
Oldest distributed LSN : (0:0:0)
Oldest non-distributed LSN : (10417370:9406:1)
This value is not changing since many hours now. Does it mean my replication from Server B to C,D,E,F and G is not working? However when I see in replication monitor it shows everything as fine. i.e. logreader, all distributor agents are running and moving transactions for all publications. Why I am unable to shrink the log file and what should I do to shrink the log file?
Also, want to add that CDC is not enabled on any of my DBs involved in replication. DBCC LOGINFO shows all my VLFs in use.
That's normal output from DBCC OPENTRAN when you have replication set up. It returns the oldest distributed and non-distributed LSN numbers.
Have you tried executing some of the replication checks such as sp_repltrans, sp_replshowcmds. There are probably others but you want to get a look at any possible pending transactions.
Sue
December 28, 2016 at 6:08 am
Thanks Sue for your reply! Regarding distributed and non distributed LSN, my understanding is it should change continuously depending on your transaction flow. But in this case it is constant which lead me to think if my replication is actually working. I stopped my log reader agent job and ran sp_repltrans, sp_replshowcmds. It didn't show anything for some time. After a min it showed few records and stayed there for some time. When I restarted my log reader agent job, it cleared all those records. This makes me believe that my log reader agent is working. So not sure what is preventing it from getting shrunk and why it thinks replication is the reason behind it.
December 28, 2016 at 10:12 am
Are the databases in full recovery? Are you doing frequent log backups? Are the log backups succeeding? Did you try doing two log backups in succession and then checking the VLFs?
Sue
December 28, 2016 at 10:26 am
Also please look for any uncommitted transactions sysprocesses/ dm_exec_sessions
December 28, 2016 at 11:49 pm
DB is in Simple recovery mode.
December 28, 2016 at 11:58 pm
open_tran > 0 gives me 2 rows both showing distribution agent as program name. This is because there are 2 other subscriptions on this server. This doesn't look to be an issue as I have checked other servers where replication is configured and that also shows the same.
December 30, 2016 at 5:45 am
Good day
I had a similar problem a while ago. The only way to solve it was to flush the log.
http://www.cb-net.co.uk/microsoft-articles/sql/sql-database-replication-repairing/
December 30, 2016 at 9:46 am
I already did sp_repldone once and was able to shrink the log file but it is happening again. Also it caused data inconsistency and fixing it is a pain.
December 30, 2016 at 9:58 am
stepping back...since you've been troubleshooting for a week on three different sites, might I suggest either opening a case with MS, or re-initializing. The re-publishing thing will make this more cumbersome, but it might be worth it to not have to add yet another log file...
------------------------------------------------------------------------------------------------Standing in the gap between Consultant and ContractorKevin3NFDallasDBAs.com/BlogWhy is my SQL Log File HUGE?!?![/url]The future of the DBA role...[/url]SQL Security Model in Plain English[/url]
December 30, 2016 at 10:24 am
Thanks Kevin! That's the only option left now. :crying:
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply