August 20, 2017 at 7:52 pm
We have done the following checks.
1) Regular Transaction log backups - every 30 minutes using CommVault
2) Always On is showing as synchronised
3) There's no Mirroring or CDC
4) log_reuse_wait_desc indicates "Log Backup", even after a log backup has occurred. (Done by CommVault)
5) Removing one of the databases from the AG seems to have fixed the problem for that DB. Further testing is still needed. This is not a long term solution.
Some odd things we have found, but that don't seem likely to cause an issue:
1) CommVault is being used - I've got no control of this.
2) All The DBs that don't have issues only getting one daily full and 30 minute Tran Log backups.
3) One problem DB is getting hourly full and 1/2 hourly log backups
4) The other problem DB appears to have backups to 2 virtual files, as we would do to improve backup times.
Does anyone have any other things we can look into? Are there any know issues like this? I've found one current, but not very helpful article an the web: VLF Status = 2
Cheers
Leo
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.
August 21, 2017 at 12:59 am
Are the log backups being done with no_truncate or copy_only options?
If you run a checkpoint after the log backup, what does the log_reuse_wait change to (it shows the thing that last prevented VLFs from being marked reusable)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 22, 2017 at 5:41 am
Leo.Miller - Sunday, August 20, 2017 7:52 PM1) Regular Transaction log backups - every 30 minutes using CommVault
30 mins may not be enough for a busy database
Leo.Miller - Sunday, August 20, 2017 7:52 PM
3) There's no Mirroring
And at the core, that's exactly what Availability Groups are 😉
Leo.Miller - Sunday, August 20, 2017 7:52 PM
4) log_reuse_wait_desc indicates "Log Backup", even after a log backup has occurred. (Done by CommVault)
is it taking a log backup and truncating the log?
Leo.Miller - Sunday, August 20, 2017 7:52 PM
5) Removing one of the databases from the AG seems to have fixed the problem for that DB.
Indicates that log usage due to a mirror session is a possible issue
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
August 22, 2017 at 6:15 pm
We were looking to see if the AoHA (mirroring) was causing the problem, but the log_reuse_wait_desc doesn't show this. The AoHA status shows as being Synchronised and testing show no outstanding synchronised updates. Updates on the primary are visible on the secondary within milliseconds.
No, no_truncate and/or copy_only are not being used.
This is a live system so the testing we can do is limited. We can't leave the DBs in the AG without some risk, but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently). Hence we are still looking for a chance to check what doing a manual checkpoint shows up.
All the databases are using the same routines, same backup config via CommVault, same AoHA group, etc. but only two of the DBs are showing this odd behaviour, MailMarshal and LANSweeper.
Thanks for the ideas so far.
Leo
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.
August 23, 2017 at 3:52 am
Leo.Miller - Tuesday, August 22, 2017 6:15 PMbut every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).
And he'd be correct, you don't need to take a new full backup, just pull the existing full and any logs to reinitialise the db.
Use the following to monitor send rate and size, redo rate and size
SELECT ag.name AS [AG Name], ar.replica_server_name, ar.availability_mode_desc, adc.[database_name],
drs.synchronization_state_desc, drs.is_commit_participant,
drs.synchronization_health_desc, drs.last_sent_time, drs.last_received_time,
drs.last_hardened_time, drs.last_redone_time, drs.log_send_queue_size,
drs.log_send_rate, drs.redo_queue_size, drs.redo_rate, drs.last_commit_time, drs.database_state_desc
FROM sys.dm_hadr_database_replica_states AS drs
INNER JOIN sys.availability_databases_cluster AS adc
ON drs.group_id = adc.group_id
AND drs.group_database_id = adc.group_database_id
INNER JOIN sys.availability_groups AS ag
ON ag.group_id = drs.group_id
INNER JOIN sys.availability_replicas AS ar
ON drs.group_id = ar.group_id
AND drs.replica_id = ar.replica_id
WHERE drs.is_local <> 1
ORDER BY ag.name, ar.replica_server_name, adc.[database_name]
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
August 24, 2017 at 4:23 pm
Perry Whittle - Wednesday, August 23, 2017 3:52 AMLeo.Miller - Tuesday, August 22, 2017 6:15 PMbut every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).And he'd be correct, you don't need to take a new full backup, just pull the existing full and any logs to reinitialise the db.
Use the following to monitor send rate and size, redo rate and size
Easier said than done. We don't have access to the CommVault backups and getting the SysAdmin to do this for us is a nightmare. One of the joys of being a consultant.
I've attached the file. Names have been changed, but the Primary is 02 and the secondary is 03, the problem DB is App02. The other one that was causing a problem isn't in the AG at the moment.
The only thing I can see as a potential problem is that the Redo_Rate looks a bit low, but this is quite a bit lower than the send rate for all databases, so this may be normal. I've also compared other AoAG servers for this client and these show similar numbers, so they don't look like an issue.
I'm interested in hearing what you think.
Leo
Nothing in life is ever so complicated that with a little effort it cant be made more complicated.
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.
August 25, 2017 at 2:41 am
Leo.Miller - Tuesday, August 22, 2017 6:15 PMevery time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).
I don't think this is a great attitude on his part, and I wouldn't be deferring to it if it were me. You are charged with keeping your Production environment working, and he should respect that. (Anyway, you could use copy-only backups (Full and Log) to re-synch the AGs.)
Also, have you double and triple-checked that his Commvault log backups are definitely not copy-only, by checking in msdb.dbo.backupset (there's a column called is_copy_only)? Sorry if you've done that already, but I wasn't sure if you're having to go on what your sysadmin tells you.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply