February 18, 2016 at 10:38 pm
We have a SQL Server 2014 cluster on VMware. Suddenly a user database went into suspect mode.
From the Event Viewer, the sequence of events was:
1) Backup database successfully processed (around 11 pm). The scheduled full backup job ran at that time.
2) The Desktop Window Manager has exited (around 7 am).
3) SQLServerLogMgr::LogWriter: Operating system error 170 (The requested resource is in use) (around 2 pm).
4) The log for database "testdb" is not available. Check the event log for related error messages. Resolve any errors and restart the database. (around 2 pm)
5) Database "testdb" was shutdown due to error 9001 in routine 'XdesRMFull::CommitInternal'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted. (around 2 pm)
6) fcb::close-flush: Operating system error (null) encountered. (around 2 pm)
7) fcb::close-flush: Operating system error (null) encountered. (around 2 pm)
8) Starting up database "testdb". (around 2 pm)
Steps 3 through 8 all have the same timestamp.
9) 55 transactions rolled forward in database "testdb". This is an informational message only. No user action is required.
10) FCB::ZeroFile(), GetOverLappedResult(): Operating system error 170 (The requested resource is in use). (timestamp 4 seconds after step 8)
11) An error occurred during recovery, preventing database "testdb" from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact technical support.
12) Login failed for user "TestUser". Reason: Failed to open the explicitly specified database.
13) Continuous login failures for about an hour.
14) SQLServerLogMgr::LogWriter: Operating system error 170 (The requested resource is in use) (around 3 pm).
15) The log for database "tempdb" is not available. Check the event log for related error messages. Resolve any errors and restart the database. (around 3 pm)
These messages kept repeating continuously.
So we restarted the active node, which forced SQL Server to fail over. After that we were able to see the databases normally.
We have no clue what happened. Any ideas?
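For anyone hitting the same thing, this is roughly what we ran to confirm the database state and dig through the error log (a rough sketch; xp_readerrorlog is undocumented but widely used, and the search strings are only examples):

-- List any databases that are not ONLINE (SUSPECT, RECOVERY_PENDING, etc.)
SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> N'ONLINE';

-- Search the current SQL Server error log (0 = current log, 1 = SQL error log)
-- for entries mentioning the 9001 shutdown and the affected database.
EXEC master.dbo.xp_readerrorlog 0, 1, N'9001', N'testdb';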
February 19, 2016 at 12:26 am
Check the Windows logs on the server where the errors were thrown, because it looks like you lost the transaction log file, or access to it. My gut says you lost the disk/LUN that held the transaction log, or it was locked by the host/SAN/whatever. Failing the instance over disconnects the disks from the first node and connects them to the new one. That can often clear issues with 'stuck' disks.
Are the data files on the same disk/LUN as the transaction log file? If so, I encourage you to get a DBCC CHECKDB going as soon as you can. Flapping disks usually resolve without incident, but they can also introduce corruption in your data files that can lie dormant for a very long time.
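Something along these lines will answer the layout question and get the check going (just a sketch; swap in your own database name):

-- Map each database file to its physical path: if testdb's data and log
-- files share a drive, one flapping LUN takes out both.
SELECT DB_NAME(database_id) AS database_name,
       type_desc,
       physical_name
FROM sys.master_files
ORDER BY physical_name;

-- Full consistency check; these options limit the output to real errors.
DBCC CHECKDB (N'testdb') WITH ALL_ERRORMSGS, NO_INFOMSGS;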
-Eddie
Eddie Wuerch
MCM: SQL
February 19, 2016 at 2:33 am
Have you run a cluster validation report?
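Test-Cluster has to run from PowerShell on one of the nodes, but from inside SQL Server you can at least confirm which node owns the instance and which shared drives it can see (a sketch; these DMVs only return rows on a failover cluster instance):

-- Node currently hosting the instance.
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS active_node;

-- All nodes in the cluster, with their status.
SELECT NodeName, status_description, is_current_owner
FROM sys.dm_os_cluster_nodes;

-- Shared drives the instance depends on.
SELECT DriveName
FROM sys.dm_io_cluster_shared_drives;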
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
February 19, 2016 at 10:56 am
After the failover, I can see that SQL Server automatically ran the DBCC checks.
We have three disks: one for quorum, one for logs, and one for data.
I see an error: "The run-time environment was unable to initialize for transactions required to support transactional components. Make sure that MS-DTC is running." (I can see this in the Application log, around 9 am.)
Also: "The open procedure for service BITS in DLL C:\windows\system32\bitsperf.dll failed. Performance data for this service will not be available. The first four bytes (DWORD) of the data section contains the error code."
I believe there are default Extended Events sessions that run from SQL 2012 onwards. Will that help us find the root cause?
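There is a default system_health Extended Events session (on by default since SQL 2008, writing to .xel files from 2012 on), and it captures high-severity errors, so the 9001 shutdown may well be in there. Something like this reads it back (a sketch; the wildcard assumes the files are still in the default LOG directory):

-- Read the system_health file target; each row is one captured event as XML.
SELECT CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL);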
February 20, 2016 at 5:57 am
I reckon Eddie is on the right track. We had a situation where a LUN was accidentally removed as a cluster resource, and that threw a spanner in our works.
I would be tempted to restore from the latest backup. Because your tempdb seems to be on the same volumes as your user and system databases, I am guessing the system isn't enormously transactional and you could withstand the impact of the few minutes the job would take.
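If you do restore, the shape of it is roughly this (the backup path is hypothetical; with log backups in play you would restore the full backup WITH NORECOVERY first and then apply the log chain):

-- Hypothetical backup path; adjust to your own backup location and chain.
RESTORE DATABASE testdb
FROM DISK = N'X:\Backups\testdb_full.bak'
WITH REPLACE, RECOVERY, STATS = 10;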
February 21, 2016 at 11:08 pm
I am not sure what happened exactly. I think the databases sat on that node for 2-3 hours after the issue started. Then we failed over to node B, and everything started working fine.
I ran DBCC CHECKDB the next day and it came back clean. The day after that we failed over back to the same node A; it looks fine, with no errors found so far.
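Given Eddie's point about corruption lying dormant, it might be worth also checking msdb's suspect-page history and the last clean CHECKDB date (a sketch; DBCC DBINFO is undocumented, so treat its output accordingly):

-- Pages SQL Server has flagged with 823/824 or checksum errors.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;

-- dbi_dbccLastKnownGood in the output shows the last clean CHECKDB.
DBCC DBINFO (N'testdb') WITH TABLERESULTS, NO_INFOMSGS;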
February 27, 2016 at 11:59 pm
Which version of VMware are you running? We are seeing the exact same sequence of events on ESXi 6.0 Update 1b. We have been able to recreate the situation by vMotioning the VM where SQL Server is running.