December 21, 2017 at 12:23 am
Hi Experts,
The active server in a 2-node Active/Passive cluster, which also has an Always On AG node, stopped responding until we restarted the server.
SQL Server didn't fail over to the other node. The DBAs noticed the issue when they received the SCOM alerts below.
1. An error occurred during recovery, preventing the database 'ABCCorp' (14:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.
2. description: fcb::close-flush: Operating system error (null) encountered
3. The operating system returned error 170(The requested resource is in use.) to SQL Server during a read at offset 0000000000000000 in file 'I:\Log\ABCCorp_NEW_01162017_log.ldf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
I got the errors below from Event Viewer.
Cluster service failed to update the cluster configuration data on the witness resource. Please ensure that the witness resource is online and accessible.
The cluster service detected a problem with the witness resource. The witness resource will be failed over to another node within the cluster in an attempt to reestablish access to cluster configuration data.
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: Q:, DeviceName: \Device\HarddiskVolume130.
({Device Busy}
The device is currently busy.)
Cluster resource 'Quorum' of type 'Physical Disk' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: J:, DeviceName: \Device\HarddiskVolume137.
({Device Busy}
The device is currently busy.)
I tried to use Handle to find out whether the file was held by another process, but unfortunately Handle also threw a "resource in use" error.
Please help.
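As a rough triage aid for the Event Viewer messages quoted above, the sketch below pulls the drive letter and device name out of each "failed to flush data to the transaction log" message so the affected volumes can be listed in one place. It assumes the events have already been exported as plain text. The function name `affected_volumes` and the regular expression are illustrative only and modelled on the wording in this thread; they are not part of any Microsoft tool.

```python
import re

# Pattern modelled on the flush-failure text quoted in this thread.
# Assumption: other Windows versions may word the event differently.
FLUSH_FAILURE = re.compile(
    r"failed to flush data to the transaction log\.\s*"
    r"Corruption may occur in VolumeId:\s*(?P<volume>[A-Za-z]:),\s*"
    r"DeviceName:\s*(?P<device>\\Device\\\w+)"
)


def affected_volumes(event_text):
    """Return sorted, de-duplicated (volume, device) pairs found in exported event text."""
    pairs = {
        (m.group("volume"), m.group("device"))
        for m in FLUSH_FAILURE.finditer(event_text)
    }
    return sorted(pairs)


if __name__ == "__main__":
    sample = (
        r"The system failed to flush data to the transaction log. "
        r"Corruption may occur in VolumeId: Q:, DeviceName: \Device\HarddiskVolume130."
        "\n"
        r"The system failed to flush data to the transaction log. "
        r"Corruption may occur in VolumeId: J:, DeviceName: \Device\HarddiskVolume137."
    )
    for volume, device in affected_volumes(sample):
        print(volume, device)
```

Seeing several different volumes reported within the same time window would point toward a shared storage path rather than one bad file, though that is only a hint and not a diagnosis.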
December 26, 2017 at 12:30 am
Experts, any idea what went wrong?
December 26, 2017 at 8:15 am
Have you run DBCC CHECKDB? I'd be concerned about corruption here.
December 26, 2017 at 11:04 am
Before that, have you tried accessing any of the drives that have been reported as a problem? I'm not an expert here but it sounds like a disk failure.
--Jeff Moden
Change is inevitable... Change for the better is not.
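One quick way to act on the suggestion above is to try listing the root of each reported drive and record any OS error. This is a minimal sketch: `probe_paths` is a hypothetical helper, and the drive letters in the usage section are simply the ones that appear in the messages in this thread. A successful listing only shows the volume is reachable at that moment; it does not prove the disk is healthy.

```python
import os


def probe_paths(paths):
    """Try to list each path; map path -> None on success, or an error string on failure."""
    results = {}
    for path in paths:
        try:
            os.listdir(path)
            results[path] = None
        except OSError as exc:
            # Keep the exception type so "in use" and "not found" are easy to tell apart.
            results[path] = f"{type(exc).__name__}: {exc.strerror or exc}"
    return results


if __name__ == "__main__":
    # Drive letters taken from the errors quoted in this thread; adjust as needed.
    for path, error in probe_paths(["I:\\", "J:\\", "Q:\\"]).items():
        print(path, "OK" if error is None else error)
```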
December 27, 2017 at 1:49 am
Steve Jones - SSC Editor - Tuesday, December 26, 2017 8:15 AM
Have you run a DBCC CheckDB? I'd be concerned about corruption here
Thanks Steve for the reply.
The databases were inaccessible, and we later failed over to the Availability Group. I ran CHECKDB on all databases and everything looks healthy.
December 27, 2017 at 1:50 am
Jeff Moden - Tuesday, December 26, 2017 11:04 AM
Before that, have you tried accessing any of the drives that have been reported as a problem? I'm not an expert here but it sounds like a disk failure.
Thanks Jeff.
That was the first thing I did. I checked all the disks expecting to find one offline, but all of them were online. I suspect the antivirus, but the AV team said there was nothing in their logs.
December 28, 2017 at 3:16 pm
Do you know if something was thrashing one of the disks? That is, some process doing some intense read or write? An anti-virus would make sense. If you approach the server and it has a disk drive you may hear the strong sound of a disk spinning at 100%.