Conditions Required for an Automatic Failover on AlwaysOn?

  • Hi Experts.

    An automatic failover set exists. This set consists of a primary replica and a secondary replica (the automatic failover target) that are both configured for synchronous-commit mode and set to AUTOMATIC failover.

    Configured the both AG Group database automatic failover and synchronous-commit mode.

    But automatic Failover failed also Cluster service not started automatically at Node2.

    It got connected through AO Listerner after starting Node1.

    As below SQL Error log during shutdown Node1

    Date,Source,Severity,Message

    10/27/2015 10:44:20,spid37s,Unknown,AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.

    10/27/2015 10:44:20,spid37s,Unknown,AlwaysOn Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.

    10/27/2015 10:44:20,spid37s,Unknown,AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.

    10/27/2015 10:44:20,spid37s,Unknown,AlwaysOn Availability Groups: Local Windows Server Failover Clustering service started. This is an informational message only. No user action is required.

    10/27/2015 10:37:16,spid54,Unknown,Skipping the default startup of database 'APS_DEV' because the database belongs to an availability group (Group ID: 65536). The database will be started by the availability group. This is an informational message only. No user action is required.

    10/27/2015 10:37:16,spid54,Unknown,Error: 35262<c/> Severity: 17<c/> State: 1.

    10/27/2015 10:37:16,spid54,Unknown,Starting up database 'APS_DEV'.

    10/27/2015 10:34:44,spid37s,Unknown,AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering service to start. This is an informational message only. No user action is required.

    10/27/2015 10:34:44,spid37s,Unknown,AlwaysOn: The availability replica manager is starting. This is an informational message only. No user action is required.

    10/27/2015 10:34:44,spid42s,Unknown,The availability group '5db90977-6ef2-45c7-9973-3872638280a1' and/or its local availability replica does not exist. Verify that the specified availability group name is correct<c/> and that the local availability replica has joined the availability group<c/> then retry the operation.

    10/27/2015 10:34:44,spid42s,Unknown,Error: 41100<c/> Severity: 16<c/> State: 40.

    10/27/2015 10:34:44,spid37s,Unknown,The state of the local availability replica in availability group 'APS_Orion' has changed from 'SECONDARY_NORMAL' to 'RESOLVING_NORMAL'. The replica state changed because of either a startup<c/> a failover<c/> a communication issue<c/> or a cluster error. For more information<c/> see the availability group dashboard<c/> SQL Server error log<c/> Windows Server Failover Cluster management console or Windows Server Failover Cluster log.

    10/27/2015 10:34:43,spid42s,Unknown,Nonqualified transactions are being rolled back in database APS_DEV for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.

    10/27/2015 10:34:43,spid37s,Unknown,AlwaysOn: The local replica of availability group 'APS_Orion' is stopping. This is an informational message only. No user action is required.

    10/27/2015 10:34:43,spid37s,Unknown,AlwaysOn: The availability replica manager is going offline because the local Windows Server Failover Clustering (WSFC) node has lost quorum. This is an informational message only. No user action is required.

    10/27/2015 10:34:43,Server,Unknown,AlwaysOn Availability Groups: Local Windows Server Failover Clustering node is no longer online. This is an informational message only. No user action is required.

    10/27/2015 10:34:43,spid15s,Unknown,The Windows Server Failover Clustering (WSFC) resource control API returned error code 1726. The WSFC service may not be running or may not be accessible in its current state<c/> or the specified arguments are invalid. For information about this error code<c/> see "System Error Codes" in the Windows Development documentation.

    10/27/2015 10:34:43,spid15s,Unknown,Error: 41009<c/> Severity: 16<c/> State: 4.

    SQL Error log after starting Node1

    Date,Source,Severity,Message

    10/27/2015 10:56:25,spid37s,Unknown,The recovery LSN (5191:326:1) was identified for the database with ID 5. This is an informational message only. No user action is required.

    10/27/2015 10:56:25,spid37s,Unknown,AlwaysOn Availability Groups connection with primary database established for secondary database 'APS_DEV' on the availability replica with Replica ID: {f404bf92-515d-4eb1-b65a-6d2bd253151b}. This is an informational message only. No user action is required.

    10/27/2015 10:56:25,spid42s,Unknown,Recovery completed for database APS_DEV (database ID 5) in 4 second(s) (analysis 0 ms<c/> redo 0 ms<c/> undo 0 ms.) This is an informational message only. No user action is required.

    10/27/2015 10:56:25,spid42s,Unknown,1 transactions rolled forward in database 'APS_DEV' (5:0). This is an informational message only. No user action is required.

    10/27/2015 10:56:25,spid42s,Unknown,The recovery LSN (5191:326:1) was identified for the database with ID 5. This is an informational message only. No user action is required.

    10/27/2015 10:56:25,spid42s,Unknown,AlwaysOn Availability Groups connection with primary database established for secondary database 'APS_DEV' on the availability replica with Replica ID: {f404bf92-515d-4eb1-b65a-6d2bd253151b}. This is an informational message only. No user action is required.

    10/27/2015 10:56:20,spid42s,Unknown,Starting up database 'APS_DEV'.

    10/27/2015 10:56:20,spid37s,Unknown,AlwaysOn Availability Groups connection with primary database terminated for secondary database 'APS_DEV' on the availability replica with Replica ID: {f404bf92-515d-4eb1-b65a-6d2bd253151b}. This is an informational message only. No user action is required.

    10/27/2015 10:56:20,spid42s,Unknown,Nonqualified transactions are being rolled back in database APS_DEV for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.

    10/27/2015 10:56:20,spid42s,Unknown,A connection for availability group 'APS_Orion' from availability replica 'Node2' with id [11C698ED-99C8-45E1-9A3A-81E9EC2FC3BA] to 'Node1' with id [F404BF92-515D-4EB1-B65A-6D2BD253151B] has been successfully established. This is an informational message only. No user action is required.

    10/27/2015 10:56:20,spid35s,Unknown,The state of the local availability replica in availability group 'APS_Orion' has changed from 'RESOLVING_NORMAL' to 'SECONDARY_NORMAL'. The replica state changed because of either a startup<c/> a failover<c/> a communication issue<c/> or a cluster error. For more information<c/> see the availability group dashboard<c/> SQL Server error log<c/> Windows Server Failover Cluster management console or Windows Server Failover Cluster log.

    10/27/2015 10:56:12,spid37s,Unknown,The state of the local availability replica in availability group 'APS_Orion' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The replica state changed because of either a startup<c/> a failover<c/> a communication issue<c/> or a cluster error. For more information<c/> see the availability group dashboard<c/> SQL Server error log<c/> Windows Server Failover Cluster management console or Windows Server Failover Cluster log.

    10/27/2015 10:56:12,spid37s,Unknown,AlwaysOn: The local replica of availability group 'APS_Orion' is starting. This is an informational message only. No user action is required.

    10/27/2015 10:56:12,spid37s,Unknown,AlwaysOn: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.

    10/27/2015 10:56:12,spid37s,Unknown,AlwaysOn Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.

    Please Help.. what need to be change existing AO configuration?

    Thanks

  • Can you brief your configuration? Is it 2 node cluster and another server for AO?

  • Yes. It is two nodes cluster configured in AO. No shared disk.

    OS: Windows 2008 R2

    DB: SQL Server 2012

    Get-ClusterQuorum details

    PS C:\Windows\system32> Get-ClusterQuorum -Cluster ClusterName

    Cluster QuorumResource QuorumType

    ------- -------------- ----------

    CLusterName NodeMajority

    PS C:\Windows\system32> Get-ClusterQuorum -Cluster Node1

    Cluster QuorumResource QuorumType

    ------- -------------- ----------

    CLusterName NodeMajority

    PS C:\Windows\system32> Get-ClusterQuorum -Cluster Node2

    Cluster QuorumResource QuorumType

    ------- -------------- ----------

    CLusterName NodeMajority

  • You mean to say you have a 2 node SQL Server failover cluster and AO is in one of the nodes?

  • Yes.

    Total 2 Nodes and AO configured these two nodes. There is no shared disk only local VHD. also not configured File share witness option.

    Simulated AO setup.

    1. Just stop the SQL Services in Node1 and connect AO Listener at client. It got connect and succeed - OK.

    2. started Node1 SQL services and stop the SQL Services in Node2 and connect AO Listener at client. It got connect and succeed - OK.

    3. checked Manual Failover method on both the nodes by using AG group name then right click to start failover option and AO Listener got connected at client side.

    Test type 2:

    Just shutdown Node1 server then try to connect AO Listener at client side but not successes also Cluster name service went offline and not started at Node2 side.

    Database keep on Recovery pending status due to cluster resource node1 not available.

    Thanks

  • In your step up automatic failover of AO wont happen but cluster failover should happen. Did you check cluster logs?

  • Your loosing quorum so the cluster is going offline.

    You need an odd number of objects to get a true quorum, add in a 3rd node or add in a file share witness or add in a shared disk quorum drive

  • Yes.. I am loosing quorum resource between two nodes...at first time I did not configure File share features during AO setup.

    Is it mandatory to add 3rd node or just enable File share witness configuration on each nodes?

    I want to know how to add File share witness drive each these two nodes?

    For configure File share witness configuration.

    1. Enable File server feature in Windows OS level?

    2. create folder at D drive and given sharing, to provide full access SQL service account. ( this steps to perform each node)

    3. Go to failover Cluster manager -> More action and choose configure cluster quorum settings then next..( to perform each nodes)

    Could you Pls.confirm all above steps can able to create File share witness successfully?

    Thanks

  • Perry's stairway has all the steps necessary on setting up the cluster from the ground up for Always-On

    http://www.sqlservercentral.com/stairway/112556/

    The share has to be aware from the cluster, eg on your corporate fileserver somewhere where the computer account has access too

    https://technet.microsoft.com/en-GB/library/jj612870.aspx

    PoSh script

    Set-ClusterQuorum -NodeAndFileShareMajority "\\fileserver\fsw"

  • Thanks a lot Mr. Anthony.green..

    Created file share to another file server and configured NodeFileMajority in cluster.

    Performed Tested as below

    - It is working smoothly..just shutdown Node1 and got connected at client side through AO Listener name. Now Secondary Replica become a primary mode also database synchronized.

    The same way done shutdown Node2 also - it is working and connected automatically through AO Listener name.

  • Great news that its working as expected.

    Need to remember with AlwaysOn Availability Groups that it still uses the underlying WSFC services so a cluster still needs to be setup as per a normal cluster with a quorum/majority of some kind to always ensure the cluster always stays online

  • Thank you anthony.green..

    AM I correct as following condition.

    1. Two nodes cluster with AO setup - it should be require to configure file share to another server for reading cluster resource and cluster services become a online.

    2. if Three nodes cluster with AO setup and all three synchronize commit, automatic failover mode in AG Group. In this case, does not require to configure file share witness on another server.

    I have One doubts.

    Third Node have different subnet IP address and located another data center (DR). but all the three nodes are same domain name also cluster name, AO Listener all same domain group.

    For above setup, Does it require to configure any special kind of setting in cluster side? Does it require to create file share witness?

  • You need a majority vote in the cluster, be that a node majority (3 or more nodes), node and disk majority (2 nodes, 1 disk) or node and fileshare majority (2 nodes, 1 fileshare) to establish a quorum.

    You should generally always have an odd number to generate the majority, so if I have 4 nodes again it would have to be node&disk or node&fileshare

    5 nodes you could do a node majority

  • SQL Galaxy (10/27/2015)


    Yes.. I am loosing quorum resource between two nodes...at first time I did not configure File share features during AO setup.

    Is it mandatory to add 3rd node or just enable File share witness configuration on each nodes?

    I want to know how to add File share witness drive each these two nodes?

    For configure File share witness configuration.

    1. Enable File server feature in Windows OS level?

    2. create folder at D drive and given sharing, to provide full access SQL service account. ( this steps to perform each node)

    3. Go to failover Cluster manager -> More action and choose configure cluster quorum settings then next..( to perform each nodes)

    Could you Pls.confirm all above steps can able to create File share witness successfully?

    Thanks

    You're losing quorum as you are using NodeMajority (majority node set), confirmed by the result below

    SQL Galaxy (10/27/2015)


    Get-ClusterQuorum details

    PS C:\Windows\system32> Get-ClusterQuorum -Cluster Node1

    Cluster QuorumResource QuorumType

    ------- -------------- ----------

    CLusterName NodeMajority

    With 2 servers in the cluster you will need to introduce a witness and with no shared storage your only option is a fileshare witness. Please read my stairway for further info

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • SQL Galaxy (10/27/2015)


    1. Two nodes cluster with AO setup - it should be require to configure file share to another server for reading cluster resource and cluster services become a online.

    With just 2 nodes in the cluster you need to either

    • add a third node
    • add a fileshare witness
    • add a shared disk witness

    SQL Galaxy (10/27/2015)


    2. if Three nodes cluster with AO setup and all three synchronize commit, automatic failover mode in AG Group. In this case, does not require to configure file share witness on another server.

    Synchr commit and auto failover have nothing to do with the Quorum level set in the Windows Server Failover Cluster.

    For 3 nodes you would not require a witness

    SQL Galaxy (10/27/2015)


    I have One doubts.

    Third Node have different subnet IP address and located another data center (DR). but all the three nodes are same domain name also cluster name, AO Listener all same domain group.

    For above setup, Does it require to configure any special kind of setting in cluster side? Does it require to create file share witness?

    For the love of god and all that's holy and for the last time

    !!!Please read the stairway!!!

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply