January 5, 2017 at 4:48 am
One of our clusters runs on Windows Server 2008 R2 Enterprise, with Microsoft SQL Server 2012 (SP2) services on one node and SQL Server 2012 Analysis Services on the second node. It is an active-active setup. Recently we have been facing frequent restarts of the SQL services. During the last 4 months they have restarted 5-6 times.
We are not able to find any error in the Windows event log that might explain why the SQL services restarted. We checked the cluster log and found the messages below, where “SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'” appears just before the SQL services restarted.
If anyone has faced a similar problem and has a resolution for it, that would be really helpful.
2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027
2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043
2016/12/22-13:14:02.033 INFO [NM] Received request from client address SERVERNAME.
2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE
2016/12/22-13:16:58.242 WARN [RHS] Resource SQL Server IsAlive has indicated failure.
2016/12/22-13:16:58.242 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(10) result 1.
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) Online-->ProcessingFailure.
2016/12/22-13:16:58.242 ERR [RCM] rcm::RcmResource::HandleFailure: (SQL Server)
2016/12/22-13:16:58.242 INFO [RCM] resource SQL Server: failure count: 2, restartAction: 2.
2016/12/22-13:16:58.242 INFO [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.
2016/12/22-13:16:58.242 INFO [RCM] Will restart resource in 500 milliseconds.
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
2016/12/22-13:16:58.242 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL Server (MSSQLSERVER), Online --> Pending)
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) Online-->[WaitingToTerminate to OnlineCallIssued].
2016/12/22-13:16:58.242 INFO [RCM] TransitionToState(SQL Server Agent) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].
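When comparing several restarts, it can help to pull the health-state changes and the first WARN/ERR entry out of each cluster log excerpt and see how far apart they are. The following is only a minimal sketch, assuming the log lines keep the `timestamp LEVEL [COMPONENT] message` layout seen in the excerpt above; the regular expressions and the function names (`parse_line`, `summarize`, `gaps_before_first_problem`) are hypothetical and not part of any Microsoft tool. A short gap does not by itself prove that the health change caused the restart.

```python
import re
from datetime import datetime

# Assumed layout, taken only from the excerpt in this thread:
# 2016/12/22-13:16:58.242 ERR [RES] message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+\[(?P<comp>[A-Z]+)\]\s+(?P<msg>.*)$"
)
HEALTH_RE = re.compile(
    r"component '(?P<component>\w+)' health state has been changed "
    r"from '(?P<old>\w+)' to '(?P<new>\w+)'"
)

def parse_line(line):
    """Return a dict for a recognised log line, or None otherwise."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m["level"],
        "comp": m["comp"],
        "msg": m["msg"],
    }

def summarize(lines):
    """Split parsed lines into health-state changes and WARN/ERR entries."""
    events = [e for e in map(parse_line, lines) if e is not None]
    health = []
    for e in events:
        h = HEALTH_RE.search(e["msg"])
        if h:
            health.append((e["time"], h["component"], h["old"], h["new"]))
    problems = [e for e in events if e["level"] in ("WARN", "ERR")]
    return health, problems

def gaps_before_first_problem(health, problems):
    """Seconds between each health-state change and the first WARN/ERR."""
    if not problems:
        return []
    first = problems[0]["time"]
    return [(comp, old, new, (first - t).total_seconds())
            for t, comp, old, new in health]

# A few lines copied from the excerpt above.
SAMPLE = [
    "2016/12/22-13:12:15.029 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027",
    "2016/12/22-13:12:35.044 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043",
    "2016/12/22-13:16:58.242 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost",
    "2016/12/22-13:16:58.242 INFO [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE",
    "2016/12/22-13:16:58.242 WARN [RHS] Resource SQL Server IsAlive has indicated failure.",
]

if __name__ == "__main__":
    health, problems = summarize(SAMPLE)
    for comp, old, new, secs in gaps_before_first_problem(health, problems):
        print(f"{comp}: {old} -> {new}, {secs:.3f} s before first WARN/ERR")
```

Running the same functions over the log from each of the 5-6 restarts would show whether the 'query_processing' warning precedes the lost heartbeat every time or only occasionally.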
January 5, 2017 at 5:09 am
There are indications that the IsAlive check has failed; this will initiate a restart.
Can you provide more detail on the network setup on the nodes?
Are these virtual machines or physical machines?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 5, 2017 at 5:46 am
They are physical boxes.
I checked with the respective teams, but no disturbance was found at the network or infrastructure level.
The event log error is as below:
Cluster resource 'SQL Server' in clustered service or application 'SQL Server [MSSQLSERVER]' failed.
January 6, 2017 at 4:27 am
You clearly have some sort of network or config issue.
Can you provide more detail on the network setup on the machines?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 6, 2017 at 4:36 am
What exactly are you looking for in the network setup?
January 6, 2017 at 6:35 am
Number of NICs?
Teamed?
Number of networks?
etc., etc.
Throw a dog a bone here!!
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 6, 2017 at 8:50 am
Have you checked for an updated version of your antivirus software? A certain version caused unexpected cluster shutdowns at our firm.
January 10, 2017 at 4:27 am
Which antivirus software and version caused the clustered SQL restarts at your firm?
January 11, 2017 at 2:09 am
It was Norton antivirus, some months ago. I can't tell you the version.
They released a new version in November 2016, which didn't have the problem.
January 11, 2017 at 2:50 am
Okay. What type of evidence did you find in the logs that confirms the SQL service was restarted because of the antivirus?
January 11, 2017 at 7:34 am
I don't have the details, as the infrastructure team was assigned to it. After other causes were excluded, the plausible cause was the software.
The antivirus vendor supplied a diagnostic utility (to detect whether the random cluster failure was triggered by their software).