SQL server service restarted frequently in cluster

  • One of our clusters runs on Windows Server 2008 R2 Enterprise, with Microsoft SQL Server 2012 (SP2) services on one node and SQL Server 2012 Analysis Services on the second node. It's an active-active setup. Recently the SQL services have been restarting frequently: over the last 4 months they have restarted 5-6 times.

    We cannot find any error in the Windows event log that would explain the restarts. We checked the cluster log and found the messages below, where “SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'” appears shortly before the SQL services restarted.

    If anyone has faced a similar problem and has a resolution, it would be really helpful.

    2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027

    2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043

    2016/12/22-13:14:02.033 INFO [NM] Received request from client address SERVERNAME.

    2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost

    2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE

    2016/12/22-13:16:58.242 WARN [RHS] Resource SQL Server IsAlive has indicated failure.

    2016/12/22-13:16:58.242 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(10) result 1.

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) Online-->ProcessingFailure.

    2016/12/22-13:16:58.242 ERR [RCM] rcm::RcmResource::HandleFailure: (SQL Server)

    2016/12/22-13:16:58.242 INFO [RCM] resource SQL Server: failure count: 2, restartAction: 2.

    2016/12/22-13:16:58.242 INFO [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.

    2016/12/22-13:16:58.242 INFO [RCM] Will restart resource in 500 milliseconds.

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].

    2016/12/22-13:16:58.242 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL Server (MSSQLSERVER), Online --> Pending)

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) Online-->[WaitingToTerminate to OnlineCallIssued].

    2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].
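For a restart that recurs only every few weeks, it may help to scan the whole cluster log rather than one excerpt. The following is a minimal Python sketch, assuming the log lines look like the ones posted above (timestamp, level, bracketed component, message). The names `parse_cluster_log` and `health_changes_before_failures` and the 10-minute window are illustrative assumptions, not part of any Microsoft tooling. It lists the health-state changes logged shortly before each "diagnostics heartbeat is lost" error.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple

# Matches lines shaped like the posted excerpt:
#   2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] ...
# search() is used so a leading prefix before the timestamp is tolerated.
LINE_RE = re.compile(
    r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(INFO|WARN|ERR)\s+\[(\w+)\]\s+(.*)$"
)


class Entry(NamedTuple):
    when: datetime
    level: str
    component: str
    message: str


def parse_cluster_log(text):
    """Return an Entry for every line that matches the assumed format."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skip blank lines and anything in another format
        when = datetime.strptime(m.group(1), "%Y/%m/%d-%H:%M:%S.%f")
        entries.append(Entry(when, m.group(2), m.group(3), m.group(4).strip()))
    return entries


def health_changes_before_failures(entries, window=timedelta(minutes=10)):
    """Pair each heartbeat-lost error with health-state changes inside `window`.

    The 10-minute default is an arbitrary assumption; adjust as needed.
    """
    results = []
    for failure in entries:
        if "diagnostics heartbeat is lost" not in failure.message:
            continue
        changes = [
            e for e in entries
            if "health state has been changed" in e.message
            and timedelta(0) <= failure.when - e.when <= window
        ]
        results.append((failure, changes))
    return results


# Four lines copied from the excerpt above.
SAMPLE = """
2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027
2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043
2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE
"""

for failure, changes in health_changes_before_failures(parse_cluster_log(SAMPLE)):
    print("Heartbeat lost at", failure.when)
    for change in changes:
        gap = (failure.when - change.when).total_seconds()
        print(f"  {gap:.3f} s earlier: {change.message}")
```

On the excerpt above, the last health-state change ('warning' back to 'clean') comes a little over four minutes before the heartbeat is lost, so the 'query_processing' warning may not be the direct trigger. Running the same scan across all 5-6 restarts would show whether the pattern repeats.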

  • The log indicates that the IsAlive check failed, which is what initiates the restart.

    Can you provide more detail on the network setup on the nodes?

    Are these virtual machines or physical machines?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • They are physical boxes.

    I checked with the respective teams, but no disturbance was found at the network or infrastructure level.

    The error in the event log is as below:

    Cluster resource 'SQL Server' in clustered service or application 'SQL Server [MSSQLSERVER]' failed.

  • You clearly have some sort of network or configuration issue.

    Can you provide more detail on the network setup on the machines?


  • What exactly are you looking for on the network setup?

  • Number of NICs?

    Teamed?

    Number of networks?

    etc, etc

    Throw a dog a bone here!!


  • Have you checked for an updated version of your antivirus software? A certain version caused unexpected cluster shutdowns at our firm.

  • Which antivirus software and version caused the clustered SQL restarts at your firm?

  • It was Norton antivirus, some months ago. I can't tell you the version.

    They released a new version in November 2016, which didn't have the problem.

  • Okay. What evidence did you find in the logs that confirmed the SQL service was restarted because of the antivirus?

  • I don't have the details, as the infrastructure team was assigned to it. After excluding other sources, the plausible cause was software.

    The antivirus vendor supplied a diagnostic utility to detect whether the random cluster failures were triggered by their software.
