Node failed to come up


Lexa

Hall of Fame

Points: 3314
January 23, 2012 at 10:47 pm

#255342

We have a two-node cluster (Node and Disk Majority quorum) running Windows Server 2008 with SQL Server 2008 Enterprise. The cluster is configured to restart a failed resource on the current node (15 min) and to fail over all resources if the restart is unsuccessful. One of the instances became unavailable: we could not connect to it remotely and could not fail it over from the second node, and Cluster Manager froze. Here is part of the cluster log from the victim server, right around the time this happened:
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 11'
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 10'
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 8'
00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 7'
What could this mean?
More errors after that:
[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Server Native Client 10.0]TCP Provider: An existing connection was forcibly closed by the remote host.
printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Server Native Client 10.0]Communication link failure
OnlineThread: QP is not online.
What puzzles me is why the resources did not fail over for about 45 minutes, until we hard-rebooted the unresponsive server.
Thanks.
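
As an aside on the sqsrvres lines quoted above: the native error value appears to be logged in hexadecimal. That is an assumption, not something the log states, but 0x2746 is 10054, the Winsock code WSAECONNRESET, whose text matches the "forcibly closed by the remote host" message. A minimal Python sketch that pulls the fields out of such a line and converts the code (the function name `parse_odbc_error` is made up for illustration):

```python
import re

# Layout of the printODBCError lines quoted in the post.
ODBC_LINE = re.compile(
    r"sqlstate = (?P<sqlstate>\w+); "
    r"native error = (?P<native>[0-9a-fA-F]+); "
    r"message = (?P<message>.*)"
)

def parse_odbc_error(line):
    """Return (sqlstate, native_error, message), or None if the line does not match."""
    match = ODBC_LINE.search(line)
    if match is None:
        return None
    # Assumption: the resource DLL logs the native error in hexadecimal.
    native_error = int(match.group("native"), 16)
    return match.group("sqlstate"), native_error, match.group("message")

sample = (
    "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; "
    "message = [Microsoft][SQL Server Native Client 10.0]Communication link failure"
)

print(parse_odbc_error(sample))
```

Read this way, the entries say the health-check connection to SQL Server was reset at the network level; they do not by themselves explain why failover was delayed.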


Lexa Hall of Fame Points: 3314 · Answer 1

January 24, 2012 at 6:25 am

#1437633

Any troubleshooting tips?

_Beetlejuice Ten Centuries Points: 1064 · Answer 2

If you can, I would bring both nodes down. Then power on the one with the problem first and see if the clusters go online, then bring the second node up.

This has worked for me on several occasions.

Leo.Miller SSChampion Points: 13070 · Answer 3

Are you by any chance using NOD32 Antivirus and firewall?

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Lexa Hall of Fame Points: 3314 · Answer 4

_Beetlejuice (1/24/2012)
If you can, I would bring both nodes down. Then power on the one with the problem first and see if the clusters go online, then bring the second node up.
This has worked for me on several occasions.

Had to do that eventually. Once we restarted the victim server, it failed over properly to the other server, but both servers stayed online for only several minutes before going offline again. So we had to reboot the second server as well. :angry:

Lexa Hall of Fame Points: 3314 · Answer 5

Leo.Miller (1/24/2012)
Are you by any chance using NOD32 Antivirus and firewall?
Leo

No antivirus.

_Beetlejuice Ten Centuries Points: 1064 · Answer 6

Have you checked for any errors in the Windows event logs?

Perry Whittle SSC Guru Points: 234065 · Answer 7

Check the cluster error events within Failover Cluster Manager for more info.

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Lexa Hall of Fame Points: 3314 · Answer 8

January 27, 2012 at 8:52 am

#1439440

Did check the logs, but nothing stands out...
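
When nothing stands out by eye, one option is to filter the cluster log mechanically for error and warning entries in a window around the incident. The sketch below is only that, a sketch: it assumes every entry follows the layout of the excerpt in the original post (process.thread IDs, `::`, timestamp, level, message), and the helper names `parse_line` and `errors_near` are made up for illustration. Cluster log timestamps are, as far as I know, normally written in UTC, so the centre time may need adjusting from local time.

```python
import re
from datetime import datetime, timedelta

# Layout assumed from the excerpt in the original post:
# <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> <message>
LOG_LINE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

def parse_line(line):
    """Return (timestamp, level, message), or None for lines that do not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
    return timestamp, match.group("level"), match.group("msg")

def errors_near(lines, center, minutes=5, levels=("ERR", "WARN")):
    """Collect entries at the given levels within +/- `minutes` of `center`."""
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines or unexpected formats are skipped
        timestamp, level, _ = parsed
        if level in levels and abs(timestamp - center) <= window:
            hits.append(parsed)
    return hits

# Two entries copied from the original post, plus the stray quote line.
sample_lines = [
    "00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: "
    "ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 11",
    "'",
    "00000988.000027f0::2012/01/19-23:32:23.350 ERR [API] s_ApiCloseNetwork: "
    "ERROR_INVALID_HANDLE(6)' because of 'Cannot unregister handle 10",
]

for timestamp, level, message in errors_near(sample_lines, datetime(2012, 1, 19, 23, 30)):
    print(timestamp, level, message)
```

On a real log, `sample_lines` would be replaced by the opened log file, and the window widened to cover the roughly 45 minutes before the hard reboot.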