January 23, 2012 at 10:47 pm
We have a two-node setup, Node and Disk Majority, Windows Server 2008 with SQL Server 2008 Enterprise. The cluster is configured to restart a failed resource on the current node (15 min) and to fail over all resources if the restart is unsuccessful. One of the instances became unavailable: we could not connect to it remotely and could not fail it over from the second node. Cluster Manager froze up. Here is part of the cluster log from the victim server, from right around the time this happened:
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 11'
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 10'
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 8'
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 7
What could this mean?
More errors after that:
[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Server Native Client 10.0]TCP Provider: An existing connection was forcibly closed by the remote host.
printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Server Native Client 10.0]Communication link failure
OnlineThread: QP is not online.
What is puzzling is why the resources did not fail over for about 45 minutes, until we hard-rebooted the unresponsive server.
Thanks.
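For anyone triaging a similar log, here is a minimal Python sketch that tallies ERR entries by component. It is not a complete cluster log parser. The line layout (process.thread IDs, timestamp, level, bracketed component) is inferred only from the excerpt in this post, and the names `LINE_RE`, `parse_line` and `count_errors` are hypothetical helpers, not part of any Microsoft tool. Lines that do not match that layout, such as the [sqsrvres] lines above, are simply skipped.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the excerpt in this thread:
#   <pid>.<tid>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> [<component>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a line in the assumed layout, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp text into a datetime for later filtering.
    entry["ts"] = datetime.strptime(entry["ts"], "%Y/%m/%d-%H:%M:%S.%f")
    return entry


def count_errors(lines):
    """Count ERR-level entries per component; unparsed lines are ignored."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] == "ERR":
            counts[entry["component"]] += 1
    return counts


# Two lines taken from this thread: one that matches, one that does not.
sample = [
    "00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: "
    "ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 11'",
    "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed",
]
print(count_errors(sample))
```

Feeding it the lines of a full cluster log, read from a text file, would give a quick view of which components were logging errors around the incident. The parsed `ts` field can be used to narrow that to a time window.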
January 24, 2012 at 6:25 am
Any troubleshooting tips?
January 24, 2012 at 11:55 am
If you can, I would bring both nodes down. Then power on the one with the problem first and see if the clusters go online, then bring the second node up.
This has worked for me on several occasions.
January 24, 2012 at 12:10 pm
Are you by any chance using NOD32 Antivirus and firewall?
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.
January 24, 2012 at 2:17 pm
_Beetlejuice (1/24/2012)
If you can, I would bring both nodes down. Then power on the one with the problem first and see if the clusters go online, then bring the second node up. This has worked for me on several occasions.
We had to do that eventually. Once the victim server was restarted, the instance failed over properly to the other server, but both servers stayed online for only several minutes before going offline again, so we had to reboot the second server as well. :angry:
January 24, 2012 at 2:17 pm
Leo.Miller (1/24/2012)
Are you by any chance using NOD32 Antivirus and firewall? Leo
No antivirus.
January 25, 2012 at 11:11 am
Have you checked for any errors in the Windows event logs?
January 25, 2012 at 11:53 am
Check the cluster error events within Failover Cluster Manager for more info.
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 27, 2012 at 8:52 am
Did check the logs, but nothing stands out...