Frequent restart of SQL services in cluster

haware_amol-811900

January 5, 2017 at 4:27 am

One of our cluster boxes runs Windows Server 2008 R2 Enterprise, with Microsoft SQL Server 2012 (SP2) services on one node and SQL Server 2012 Analysis Services on the second node. It's an active-active setup. Recently we have been facing frequent restarts of the SQL Server services: over the last 4 months they have restarted 5-6 times.
We have not been able to find any error in the Windows event log that explains why the SQL Server services restart. In the cluster log, however, we found the entries below, where “SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'” appears shortly before the SQL Server services restarted.
If anyone has faced a similar problem and found a resolution, any help would be really appreciated.
2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027
2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043
2016/12/22-13:14:02.033 INFO [NM] Received request from client address SERVERNAME.
2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE
2016/12/22-13:16:58.242 WARN [RHS] Resource SQL Server IsAlive has indicated failure.
2016/12/22-13:16:58.242 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(10) result 1.
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) Online-->ProcessingFailure.
2016/12/22-13:16:58.242 ERR [RCM] rcm::RcmResource::HandleFailure: (SQL Server)
2016/12/22-13:16:58.242 INFO [RCM] resource SQL Server: failure count: 2, restartAction: 2.
2016/12/22-13:16:58.242 INFO [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.
2016/12/22-13:16:58.242 INFO [RCM] Will restart resource in 500 milliseconds.
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
2016/12/22-13:16:58.242 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL Server (MSSQLSERVER), Online --> Pending)
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) Online-->[WaitingToTerminate to OnlineCallIssued].
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].
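
To line up the sqsrvres events in an excerpt like the one above, a short script can pull out the health-state changes and the heartbeat failure and report how far apart they are. This is a minimal sketch, assuming a plain-text cluster log (for example one generated with Get-ClusterLog) whose lines follow the "timestamp LEVEL [COMPONENT] message" shape seen here; the function names, the event labels and the line pattern are hypothetical and only match the message texts that appear in this excerpt.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt: "YYYY/MM/DD-HH:MM:SS.mmm LEVEL [COMPONENT] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+\[(?P<comp>\w+)\]\s+(?P<msg>.*)$"
)
# sqsrvres health-state change message, worded as it appears in the excerpt
HEALTH_RE = re.compile(
    r"component '(?P<component>\w+)' health state has been changed "
    r"from '(?P<old>\w+)' to '(?P<new>\w+)'"
)


def parse_events(lines):
    """Return (timestamp, kind, detail) tuples for the sqsrvres events of interest."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a cluster-log line
        ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f")
        msg = m["msg"]
        health = HEALTH_RE.search(msg)
        if health:
            detail = f"{health['component']}: {health['old']} -> {health['new']}"
            events.append((ts, "health", detail))
        elif "diagnostics heartbeat is lost" in msg:
            events.append((ts, "heartbeat_lost", msg))
        elif "IsAlive returns FALSE" in msg:
            events.append((ts, "isalive_false", msg))
    return events


def seconds_from_last_health_change_to_failure(events):
    """Seconds between the most recent health-state change and the next heartbeat loss."""
    last_health = None
    for ts, kind, _ in events:
        if kind == "health":
            last_health = ts
        elif kind == "heartbeat_lost" and last_health is not None:
            return (ts - last_health).total_seconds()
    return None  # no heartbeat loss followed a health-state change


# A few lines copied from the excerpt above
SAMPLE = """\
2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027
2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043
2016/12/22-13:14:02.033 INFO [NM] Received request from client address SERVERNAME.
2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE
"""

if __name__ == "__main__":
    events = parse_events(SAMPLE.splitlines())
    for ts, kind, detail in events:
        print(ts.isoformat(timespec="milliseconds"), kind, detail)
    print("gap (s):", seconds_from_last_health_change_to_failure(events))
```

On these lines the 'query_processing' warning had already cleared at 13:12:35, roughly 263 seconds before the heartbeat loss at 13:16:58, which is worth keeping in mind before reading the warning as the direct cause of the restart.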

Steve Jones - SSC Editor

Moved this to the SQL 2012 forum.

I see in the errors that the heartbeat is lost. Are your two (or more) nodes on a separate network? Or are you sure there are no issues with a witness?

Perry Whittle

answered here