Frequent restart of SQL services in cluster

  • One of our cluster boxes runs Windows Server 2008 R2 Enterprise, with Microsoft SQL Server 2012 (SP2) services on one node and SQL Server 2012 Analysis Services on the second node. It's an active-active setup. Recently we have been facing frequent restarts of the SQL services. During the last 4 months they have restarted 5-6 times.

    We are not able to find any error in the Windows event log that might explain why the SQL services restarted. We checked the cluster log and found the messages below, including “SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'” just before the SQL services restarted.

    If anyone has faced a similar problem and has a resolution, it would be really helpful.

    2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027

    2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043

    2016/12/22-13:14:02.033 INFO [NM] Received request from client address SERVERNAME.

    2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost

    2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE

    2016/12/22-13:16:58.242 WARN [RHS] Resource SQL Server IsAlive has indicated failure.

    2016/12/22-13:16:58.242 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(10) result 1.

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) Online-->ProcessingFailure.

    2016/12/22-13:16:58.242 ERR [RCM] rcm::RcmResource::HandleFailure: (SQL Server)

    2016/12/22-13:16:58.242 INFO [RCM] resource SQL Server: failure count: 2, restartAction: 2.

    2016/12/22-13:16:58.242 INFO [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.

    2016/12/22-13:16:58.242 INFO [RCM] Will restart resource in 500 milliseconds.

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].

    2016/12/22-13:16:58.242 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL Server (MSSQLSERVER), Online --> Pending)

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) Online-->[WaitingToTerminate to OnlineCallIssued].

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].
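
For anyone triaging a similar log, here is a minimal Python sketch that parses cluster log lines shaped like the excerpt above and reports how long after the last 'query_processing' health state change the "Failure detected" entry appeared. The line pattern is inferred only from this excerpt, and the function names (`parse_cluster_log`, `summarize_failure`) are hypothetical helpers, not part of any Microsoft tool. Real cluster logs may carry extra fields, so treat the regex as an assumption to adjust.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt above:
# "2016/12/22-13:16:58.242 ERR [RES] message text"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[A-Z]+)\]\s+"
    r"(?P<message>.*)$"
)


def parse_cluster_log(text):
    """Return a list of dicts for every line that matches the assumed shape."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank lines and anything in another format
        entries.append({
            "time": datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
            "level": match.group("level"),
            "component": match.group("component"),
            "message": match.group("message").strip(),
        })
    return entries


def summarize_failure(entries):
    """Find the first 'Failure detected' entry and the last health change before it."""
    failure = next((e for e in entries if "Failure detected" in e["message"]), None)
    if failure is None:
        return None
    changes = [
        e for e in entries
        if "health state has been changed" in e["message"]
        and e["time"] <= failure["time"]
    ]
    last_change = changes[-1] if changes else None
    gap = None
    if last_change is not None:
        gap = (failure["time"] - last_change["time"]).total_seconds()
    return {"failure": failure, "last_health_change": last_change, "gap_seconds": gap}


# Three lines copied from the excerpt above.
SAMPLE = """
2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027
2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043
2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
"""

if __name__ == "__main__":
    summary = summarize_failure(parse_cluster_log(SAMPLE))
    print(summary["failure"]["message"])
    print("seconds since last health change:", summary["gap_seconds"])
```

On the three sample lines, the gap works out to roughly 263 seconds: the component had already returned to 'clean' about four and a half minutes before the heartbeat loss was logged. Running the same summary over the full cluster log from each restart may show whether the warning reliably precedes the failure or is only a coincidence.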

  • Moved this to the SQL 2012 forum.

    I see in the errors that the heartbeat is lost. Are your two (or more) nodes on a separate network? And are you sure there are no issues with a witness?

  • answered here

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
