Bringing availability groups online during a quorum failure

  • Hey all,

    Experimenting in a local lab with an always on setup and have some questions. The setup is pretty easy. 3 node cluster with a file share witness.

    I've forcefully shut off two of the nodes (witness still online). This killed the cluster, the availability group and the listener.

    So a few things:

    1) For a file share witness, should it be set up for access-based enumeration or continuous availability?

    2) Shouldn't I have been able to withstand the loss of 50% of the voting?

    3) I forced the quorum online (net start clussvc /fq). I verified in sys.dm_hadr_cluster_members that both node 1 and the witness are online with a vote; however, the AG is still down. Trying to bring it back online results in: "The Cluster resource cannot be brought online. The owner node cannot run this resource." I can, however, manually bring the listener back online, and that works fine.

    4) The AG role is shown as owned by node 1 in Cluster Manager; however, it only came back online after I issued a FORCE_FAILOVER_ALLOW_DATA_LOSS on node 1. Is it by design that you have to force the failover to the node even though it already owned the resource?

    Thanks!

  • Adam Bean (2/4/2015)


    Hey all,

    Experimenting in a local lab with an always on setup and have some questions. The setup is pretty easy. 3 node cluster with a file share witness.

    If you have a 3 node cluster then drop the fileshare witness!

    You should be using Majority Node Set only.

    Adam Bean (2/4/2015)


    I've forcefully shut off two of the nodes (witness still online). This killed the cluster, the availability group and the listener.

    Sounds like the expected behaviour: you lost 2 of 4 votes.
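
    To make the vote arithmetic concrete, here is a minimal Python sketch. It assumes a plain static majority rule (the vote total is not recalculated after a failure), and the function name `has_quorum` is made up for illustration; it is not part of any cluster API.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Return True when the surviving voters hold a strict majority.

    Assumption: static quorum, i.e. the vote total is not adjusted
    after a failure (dynamic quorum is ignored here).
    """
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return votes_online * 2 > total_votes


# 3 nodes + 1 file share witness = 4 votes; two nodes shut off leaves 2.
print(has_quorum(4, 2))  # False
# 3 node votes only: losing one node still leaves a majority.
print(has_quorum(3, 2))  # True
```

    With an even vote total, exactly half the votes is not a majority, which is why two of four surviving voters cannot keep the cluster up under this rule.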

    Adam Bean (2/4/2015)


    2) Shouldn't I have been able to withstand the loss of 50% of the voting?

    Now here's the "it depends" part. If the node shutdowns are sudden and unexpected, the quorum voting will likely not have time to reconfigure. This is why you should still design and set the quorum vote count appropriately.

    Adam Bean (2/4/2015)


    3) I forced the quorum online (net start clussvc /fq). I verified in sys.dm_hadr_cluster_members that both node 1 and the witness are online with a vote; however, the AG is still down. Trying to bring it back online results in: "The Cluster resource cannot be brought online. The owner node cannot run this resource." I can, however, manually bring the listener back online, and that works fine.

    Forcing service is not a practice I would recommend; it is more of a last resort.

    What is it you are trying to prove? That you can kill 2 nodes and the cluster will still operate? If so, adjust the vote count appropriately.

    Adam Bean (2/4/2015)


    4) The AG role is shown as owned by node 1 in Cluster Manager; however, it only came back online after I issued a FORCE_FAILOVER_ALLOW_DATA_LOSS on node 1. Is it by design that you have to force the failover to the node even though it already owned the resource?

    Thanks!

    Once the cluster goes, it is not known who owns the resource. Again, what is it you are trying to prove by mangling your cluster?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Hey Perry,

    - The real-world setup is that I'll have two nodes in one data center with a file share witness, and another node in a separate data center (async) that has no vote. So in essence, three voters. Why do you say to drop the file share witness?

    - So losing 50% of the votes should result in a quorum failure?

    - Trying to prove that if one data center / geographical location goes offline, that I can manually bring up the secondary location.

    Thanks

  • Adam Bean (2/5/2015)


    Hey Perry,

    - The real-world setup is that I'll have two nodes in one data center with a file share witness, and another node in a separate data center (async) that has no vote. So in essence, three voters. Why do you say to drop the file share witness?

    Taking a vote away from a node and creating a file share witness is pointless. With the node vote you had 3 votes in the cluster, an odd number, which is exactly what you require for MNS.

    Adam Bean (2/5/2015)


    - So losing 50% of the votes should result in a quorum failure?

    Last time I did the maths, you can't lose 50% of that config; 33% or 66%, yes.

    Adam Bean (2/5/2015)


    - Trying to prove that if one data center / geographical location goes offline, that I can manually bring up the secondary location.

    Thanks

    What makes you think you can't bring up the secondary location with the MNS configuration?

    If 2 nodes in DC1 have a vote and 1 node in DC2 has a vote and you lose DC1, the node in DC2 still has a vote. If the shutdown was unexpected/ungraceful, you'll be in the same situation you're in now, manually forcing service on the cluster.
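
    As a rough sketch of that scenario, the Python below checks which single-site outage leaves a majority of votes standing. It assumes static majority voting, and the voter names (`DC1-NODE1` and so on) and the helper `survives_site_loss` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical voter-to-site layout: three node votes, no witness.
VOTERS = {"DC1-NODE1": "DC1", "DC1-NODE2": "DC1", "DC2-NODE1": "DC2"}


def survives_site_loss(voters: dict[str, str], lost_site: str) -> bool:
    """True if the voters outside lost_site still form a strict majority."""
    votes_per_site = Counter(voters.values())
    remaining = len(voters) - votes_per_site[lost_site]
    return remaining * 2 > len(voters)


for site in sorted(set(VOTERS.values())):
    print(site, survives_site_loss(VOTERS, site))
```

    Under these assumptions, losing DC2 leaves two of three votes and quorum holds, while losing DC1 leaves the DC2 node with one of three votes, so service has to be forced there.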

    Note that this is a last resort. If you expect this to happen regularly, you probably want to put more effort into making the site connectivity as redundant as possible, which takes deep pockets.


  • Thanks Perry.

    Don't have the article handy (on the road) but I had read that an async DR node shouldn't have a vote. Is that not accurate?

  • Adam Bean (2/6/2015)


    Thanks Perry.

    Don't have the article handy (on the road) but I had read that an async DR node shouldn't have a vote. Is that not accurate?

    The node vote is at the cluster level, so whether the group is synch or asynch is outside of this. The cluster votes determine what happens to the WSFC at the Windows OS level.

    You may have another group on that replica which is set to synchronous commit. What would happen then?
