SQL Server 2008 cluster node going down unexpectedly

Len-153367

Say Hey Kid

Points: 672
May 1, 2013 at 11:17 am

#277332

Last night our primary SQL Server node went down and failed over to the secondary node.
I was on the server at that moment, having just launched a trace to troubleshoot a particular query, when I suddenly lost all connectivity to SQL Server.
Our setup is:
Microsoft SQL Server 2008 R2 (SP1) - 10.50.2796.0 (X64) 2 Node Active/Passive Cluster.
Here is what I found in the Administrative log:
[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
[sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]Query timeout expired
[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]The connection is no longer usable because the server failed to respond to a command cancellation for a previously executed statement in a timely manner. Possible causes include application deadlocks or the server being overloaded. Open a new connection and re-try the operation.
SQL Server and SQL Server Agent are running under designated network accounts.
SQL Server Browser is running under a local account.
We have never had this issue in the 2 years we've been using the server.
The SQL Server error log did not reveal much. The very last event in the error log before the node went down is:
2013-04-30 20:06:48.970 spid133 SQL Trace ID 2 was started by login "sa".
Thank you for your help

arnipetursson SSCertifiable Points: 6557 · Answer 1

May 1, 2013 at 3:25 pm

#1611735

What is in the Windows error log?

Len-153367 Say Hey Kid Points: 672 · Answer 2

Administrative log was the most informative:

[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

[sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]Query timeout expired

[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]The connection is no longer usable because the server failed to respond to a command cancellation for a previously executed statement in a timely manner. Possible causes include application deadlocks or the server being overloaded. Open a new connection and re-try the operation

System Log:

Cluster resource 'SQL Server' in clustered service or application 'SQL Server (MSSQLSERVER)' failed.

Application log:

[sqagtres] SvcStop: service did not stop; giving up.

[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

[sqsrvres] printODBCError: sqlstate = 08S01; native error = 40; message = [Microsoft][SQL Server Native Client 10.0]TCP Provider: The specified network name is no longer available.

Error | 4/30/2013 8:16:55 PM | MSSQLSERVER | 19019 | Failover
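
The `printODBCError` lines quoted in this thread follow a fixed pattern, so they can be pulled apart and tallied by SQLSTATE when comparing the logs from both nodes. The following is a minimal Python sketch, assuming the log text has already been exported to plain lines; the function name `parse_odbc_errors` and the sample list are hypothetical and only reuse lines posted above.

```python
import re
from collections import Counter

# Pattern for the "[sqsrvres] printODBCError: ..." lines quoted in this thread.
ODBC_ERROR = re.compile(
    r"\[sqsrvres\] printODBCError: sqlstate = (?P<sqlstate>\w+); "
    r"native error = (?P<native>-?\w+); message = (?P<message>.*)"
)

def parse_odbc_errors(lines):
    """Return (sqlstate, native_error, message) tuples for matching lines.

    Hypothetical helper for illustration; lines that do not match are skipped.
    """
    results = []
    for line in lines:
        match = ODBC_ERROR.search(line)
        if match:
            results.append(
                (
                    match.group("sqlstate"),
                    match.group("native"),
                    match.group("message").strip(),
                )
            )
    return results

# Sample lines copied from the posts above.
sample = [
    "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed",
    "[sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; "
    "message = [Microsoft][SQL Server Native Client 10.0]Query timeout expired",
    "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 40; "
    "message = [Microsoft][SQL Server Native Client 10.0]TCP Provider: "
    "The specified network name is no longer available.",
]

errors = parse_odbc_errors(sample)
counts = Counter(state for state, _, _ in errors)

for state, native, message in errors:
    print(state, native, message)
```

Grouping by SQLSTATE makes it easier to see whether the timeouts (HYT00) came before the connection failures (08S01), which is the order the logs above suggest.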

arnipetursson SSCertifiable Points: 6557 · Answer 3

What is in the system log before the SQL Server cluster resource became unavailable?

vandana.1103 SSC Veteran Points: 204 · Answer 4

Hello All,

I am also facing a similar issue.

Please let me know if the issue was resolved and share the resolution.

Regards,

Vandy

Vinod Pal Old Hand Points: 345 · Answer 5

Just to give you some background: clustering works on a heartbeat, which is configured in Failover Cluster Manager for each clustered resource or cluster group.

What were you collecting in your trace, and was it run through the Profiler GUI? If it captured a large volume of events, the server or instance could have been too busy to respond to the heartbeat (health check), and the failover occurred as a result.
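
To illustrate the idea, here is a small Python simulation of a timeout-based health check. It is only a sketch of the general technique, not the cluster service's actual logic: the function name, the 60-second timeout, the three-failure threshold, and the latency figures are all hypothetical.

```python
def first_failover_check(latencies_s, timeout_s, max_consecutive_failures):
    """Return the index of the check that triggers failover, or None.

    latencies_s: simulated response time of each health check, in seconds.
    A check fails when its latency exceeds timeout_s. Failover is declared
    after max_consecutive_failures failed checks in a row (an assumed rule,
    chosen for illustration).
    """
    consecutive = 0
    for index, latency in enumerate(latencies_s):
        if latency > timeout_s:
            consecutive += 1
            if consecutive >= max_consecutive_failures:
                return index
        else:
            # A healthy response resets the failure streak.
            consecutive = 0
    return None

# Hypothetical latencies: an idle instance answers quickly, while an instance
# saturated by a heavy trace answers too slowly several times in a row.
idle = [0.1, 0.2, 0.1, 0.3]
busy = [0.1, 70.0, 75.0, 90.0]

idle_result = first_failover_check(idle, timeout_s=60.0, max_consecutive_failures=3)
busy_result = first_failover_check(busy, timeout_s=60.0, max_consecutive_failures=3)

print("idle:", idle_result)
print("busy:", busy_result)
```

In this simulation the idle instance never triggers a failover, while the busy one does on its third slow response. The instance may still be running at that point; it simply did not answer in time, which would be consistent with the "Query timeout expired" messages above.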