April 11, 2020 at 2:17 am
Taking the Windows role default settings: (n-1) failovers are allowed within 6 hours, where n is the number of nodes. With 2 nodes, that means 1 failover is allowed within 6 hours.
Taking the resource default settings: within a 15-minute period, 1 restart is allowed on the same node; otherwise the resource fails over. RetryPeriodOnFailure is 1 hour.
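To make the two sets of defaults concrete, here is a small Python sketch that only restates them as data. The class and field names are my own shorthand (only RetryPeriodOnFailure is a name used above), and nothing here talks to a real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleDefaults:
    """Role-level failover defaults as described above (hypothetical model)."""
    node_count: int
    failover_period_hours: int = 6

    @property
    def max_failovers(self) -> int:
        # (n-1) failovers are allowed within one failover period
        return self.node_count - 1


@dataclass(frozen=True)
class ResourceDefaults:
    """Resource-level restart defaults as described above (hypothetical model)."""
    restart_period_minutes: int = 15
    max_restarts_in_period: int = 1
    retry_period_on_failure_minutes: int = 60


role = RoleDefaults(node_count=2)
resource = ResourceDefaults()
print(f"{role.max_failovers} failover(s) allowed per {role.failover_period_hours} hrs")
print(f"{resource.max_restarts_in_period} restart(s) allowed per "
      f"{resource.restart_period_minutes} mins on the same node")
```

With node_count=2 this gives 1 allowed failover per 6 hrs, which is the case I am asking about.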
Take the example of a simple 2-node cluster (Node1 and Node2) with a single SQL Server FCI (sql). Say sql is on Node1 at 9AM.
9AM - sql fails - attempts to restart on Node1 - fails - fails over to Node2 - attempts to restart on Node2 - fails - stays failed.
Now, where I am confused is this statement from https://support.microsoft.com/en-us/help/947712/failover-cluster-resource-recovery-behavior-in-windows-server-2008 : "If there is no intervention and the resource remains in the failed state for 60 minutes, Windows Server tries to bring the resource online again."
So, according to the above, after 60 mins, say sometime after 10AM, does sql try the cycle again, like this:
10AM - Attempts sql restart on Node2 - fails - (but cannot fail over as 6 hrs have not passed)
11AM - Attempts sql restart on Node2 - fails - (but cannot fail over as 6 hrs have not passed)
12PM - Attempts sql restart on Node2 - fails - (but cannot fail over as 6 hrs have not passed)
1PM - Attempts sql restart on Node2 - fails - (but cannot fail over as 6 hrs have not passed)
2PM - Attempts sql restart on Node2 - fails - (but cannot fail over as 6 hrs have not passed)
3PM - Attempts sql restart on Node2 - fails - fails over to Node1 as 6 hrs have passed, tries coming online there, and so on.
Is this how it happens? Thanks.
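To be precise about what I am asking, here is a rough Python sketch of the timeline as I imagine it. It is only a model of my assumption, not confirmed cluster behavior: the function name, the hourly retry loop, and the rule that a failover is allowed again once a full 6 hrs have elapsed since the last one are all my own guesses, and every restart is assumed to fail.

```python
FAILOVER_PERIOD_HOURS = 6   # role default described above
MAX_FAILOVERS = 1           # (n-1) with n = 2 nodes
RETRY_PERIOD_HOURS = 1      # RetryPeriodOnFailure described above


def simulate(start_hour=9, end_hour=15):
    """Return (hour, node, event) tuples for the assumed retry cycle.

    Hours are on a 24-hour clock, so 15 means 3PM.
    """
    events = []
    node = "Node1"
    failover_hours = []  # hours at which a failover happened

    for hour in range(start_hour, end_hour + 1, RETRY_PERIOD_HOURS):
        # Every restart attempt is assumed to fail in this scenario.
        events.append((hour, node, "restart fails"))

        # Count failovers that are still inside the 6-hour window.
        recent = [h for h in failover_hours if hour - h < FAILOVER_PERIOD_HOURS]
        if len(recent) < MAX_FAILOVERS:
            node = "Node2" if node == "Node1" else "Node1"
            failover_hours.append(hour)
            events.append((hour, node, "failed over, restart fails"))
        else:
            events.append((hour, node, "stays failed, failover blocked"))
    return events


for hour, node, event in simulate():
    print(f"{hour:02d}:00  {node}  {event}")
```

In this model the only failovers are at 09:00 (to Node2) and 15:00 (back to Node1), with hourly failed restarts on Node2 in between. That is exactly the behavior I would like someone to confirm or correct.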