February 18, 2016 at 10:38 pm
We have a SQL Server 2014 cluster on VMware. Suddenly a user database went into suspect mode.
From the Event Viewer, the sequence of events was:
1) Backup database successfully processed (around 11 pm). The scheduled full backup job ran at that time.
2) The Desktop Window Manager has exited (around 7 am).
3) SQLServerLogMgr::LogWriter: Operating system error 170 (The requested resource is in use) (around 2 pm).
4) The log for database "testdb" is not available. Check the event log for related error messages. Resolve any errors and restart the database. (around 2 pm)
5) Database "testdb" was shutdown due to error 9001 in routine 'XdesRMFull::CommitInternal'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted. (around 2 pm)
6) fcb::close-flush: Operating system error (null) encountered. (around 2 pm)
7) fcb::close-flush: Operating system error (null) encountered. (around 2 pm)
8) Starting up database "testdb". (around 2 pm)
Steps 3 through 8 all have the same timestamp.
9) 55 transactions rolled forward in database "testdb". This is an informational message only. No user action is required.
10) FCB::ZeroFile(), GetOverLappedResult(): Operating system error 170 (The requested resource is in use). (timestamp 4 seconds after step 8)
11) An error occurred during recovery, preventing database "testdb" from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact technical support.
12) Login failed for user "TestUser". Reason: Failed to open the explicitly specified database.
13) Continuous login failures for about an hour.
14) SQLServerLogMgr::LogWriter: Operating system error 170 (The requested resource is in use) (around 3 pm).
15) The log for database "tempdb" is not available. Check the event log for related error messages. Resolve any errors and restart the database. (around 3 pm)
These messages kept repeating continuously.
So we restarted the active node, which forced SQL Server to fail over. After that we were able to see the databases normally.
We have no clue what happened. Any ideas?
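For anyone hitting the same thing, this is roughly what we ran to confirm the database state and dig through the error log (a rough sketch; xp_readerrorlog is undocumented but widely used, and the search strings are only examples):

-- List any databases that are not ONLINE (SUSPECT, RECOVERY_PENDING, etc.)
SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> N'ONLINE';

-- Search the current SQL Server error log (0 = current log, 1 = SQL error log)
-- for entries mentioning the 9001 shutdown and the affected database.
EXEC master.dbo.xp_readerrorlog 0, 1, N'9001', N'testdb';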
February 19, 2016 at 12:26 am
Check the Windows logs on the server where the errors were thrown, because it looks like you lost the transaction log file, or access to it. My gut says you lost the disk/LUN that held the transaction log, or it was locked by the host/SAN/whatever. Failing the instance over disconnects the disks from the first node and connects them to the new one. That can often clear issues with 'stuck' disks.
Are the data files on the same disk/LUN as the transaction log file? If so, I encourage you to get a DBCC CHECKDB going as soon as you can. Flapping disks usually resolve without incident, but they can also introduce corruption in your data files that can lie dormant for a very long time.
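Something along these lines will answer the layout question and get the check going (just a sketch; swap in your own database name):

-- Map each database file to its physical path: if testdb's data and log
-- files share a drive, one flapping LUN takes out both.
SELECT DB_NAME(database_id) AS database_name,
       type_desc,
       physical_name
FROM sys.master_files
ORDER BY physical_name;

-- Full consistency check; these options limit the output to real errors.
DBCC CHECKDB (N'testdb') WITH ALL_ERRORMSGS, NO_INFOMSGS;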
-Eddie
Eddie Wuerch
MCM: SQL
February 19, 2016 at 2:33 am
Have you run a cluster validation report?
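Test-Cluster has to run from PowerShell on one of the nodes, but from inside SQL Server you can at least confirm which node owns the instance and which shared drives it can see (a sketch; these DMVs only return rows on a failover cluster instance):

-- Node currently hosting the instance.
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS active_node;

-- All nodes in the cluster, with their status.
SELECT NodeName, status_description, is_current_owner
FROM sys.dm_os_cluster_nodes;

-- Shared drives the instance depends on.
SELECT DriveName
FROM sys.dm_io_cluster_shared_drives;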
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
February 19, 2016 at 10:56 am
After the failover, I can see that SQL Server automatically ran the DBCC checks.
We have three disks: one for quorum, one for logs, and one for data.
I see an error: "The run-time environment was unable to initialize for transactions required to support transactional components. Make sure that MS-DTC is running." (I can see this in the Application log, around 9 am.)
Also: "The open procedure for service BITS in DLL C:\windows\system32\bitsperf.dll failed. Performance data for this service will not be available. The first four bytes (DWORD) of the data section contains the error code."
I believe there are default Extended Events sessions that run from SQL 2012 onwards. Will that help us find the root cause?
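There is a default system_health Extended Events session (on by default since SQL 2008, writing to .xel files from 2012 on), and it captures high-severity errors, so the 9001 shutdown may well be in there. Something like this reads it back (a sketch; the wildcard assumes the files are still in the default LOG directory):

-- Read the system_health file target; each row is one captured event as XML.
SELECT CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL);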
February 20, 2016 at 5:57 am
I reckon Eddie is on the right track. We had a situation where a LUN was accidentally removed as a cluster resource, and that threw a spanner in our works.
I would be tempted to restore from the latest backup. Because your tempdb seems to be on the same volumes as your user and system databases, I am guessing the system isn't enormously transactional and you could withstand the impact of the few minutes the job would take.
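If you do restore, the shape of it is roughly this (the backup path is hypothetical; with log backups in play you would restore the full backup WITH NORECOVERY first and then apply the log chain):

-- Hypothetical backup path; adjust to your own backup location and chain.
RESTORE DATABASE testdb
FROM DISK = N'X:\Backups\testdb_full.bak'
WITH REPLACE, RECOVERY, STATS = 10;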
February 21, 2016 at 11:08 pm
I am not sure what happened exactly. I think the databases sat on that node for 2-3 hours after the issue started. Then we failed over to node B, and everything started working fine.
I ran DBCC CHECKDB the next day and it came back clean. The day after that we failed over back to the same node A; it looks fine, with no errors found so far.
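Given Eddie's point about corruption lying dormant, it might be worth also checking msdb's suspect-page history and the last clean CHECKDB date (a sketch; DBCC DBINFO is undocumented, so treat its output accordingly):

-- Pages SQL Server has flagged with 823/824 or checksum errors.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;

-- dbi_dbccLastKnownGood in the output shows the last clean CHECKDB.
DBCC DBINFO (N'testdb') WITH TABLERESULTS, NO_INFOMSGS;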
February 27, 2016 at 11:59 pm
Which version of VMware are you running? We are seeing the exact same sequence of events on ESXi 6.0 Update 1b. We have been able to recreate the situation by vMotioning the VM where SQL Server is running.