AlwaysOn BAG Witness Share fails

  • Hi

    The server hosting the witness share went down, which resulted in the WSFC failing over but not the database availability groups. Is there a setting that I'm missing?

    Any help will be appreciated.

    Clive

  • Do you have an FCI combined with AGs?

  • I have 2 nodes configured in Failover Cluster Manager (Windows Server 2012 R2) and a file share witness on another server, with a number of BAGs on SQL Server 2016 SP2.

    Clive

  • Update to this,

    I'm not using shared storage disks.

  • I can't think how to phrase this, but how are you seeing the cluster fail over without the AGs?

  • Hi,

    I can see the primary server fail over to the secondary, but the SQL Server AG on the secondary has not failed over to become the primary.

  • Ah, ok. I'm looking at one of my AG clusters right now and I can't see where you'd see that, though? In the Failover Cluster Manager, under Nodes, both of mine just say Up, and each has an Assigned Vote of 1. Any chance of a screenshot of what you're looking at?

  • Hi, I have attached a screenshot. Ignore the offline witness; I moved it away from the DC.

  • Thanks for humouring me and sorry for all of the questions, but where are you seeing a primary and secondary cluster node there? (It's so much easier when you can see the server in question!)

  • Both nodes have 1 vote

  • On the screenshot the Current Host Server is the primary, in this case NSKLHR-GWDB02.

    Within the nodes you have:

    nsklhr-gwdb01  1 vote

    nsklhr-gwdb02  1 vote

    When we lost the file share witness, the Current Host Server moved to NSKLHR-GWDB01. In SQL Server on NSKLHR-GWDB01, however, the availability group still showed that replica as Secondary, and on NSKLHR-GWDB02 it still showed as Primary.

  • Ok, sorry, I see what you're saying there (Friday afternoon syndrome). The Current Host Server is not the primary for the AGs, it's just the node that hosts the cluster resources at the moment. It has nothing to do with AGs really.

  • That's correct. Once the file share became available again, I rebooted NSKLHR-GWDB01 to bring everything in sync and make the databases accessible. I'm trying to understand why, when the file share failed, the node failed over but none of the AGs did.

  • Also, why did Windows need to fail over at all, given that only the witness went down and I still had 2 servers?

  • This is Windows Server 2016, is it? I think that was dynamic quorum at play. Because you lost the FSW you had a cluster with an even number of votes, which obviously isn't ideal because of the risk of split brain, so it would have removed the vote of one of the nodes until the FSW came back. What I'm a little hazy on is why it chose the server it did; I'm not sure about that bit.

    I don't think the cluster did actually fail over, by the way; it probably just adjusted the quorum as above, hence no AG failover.
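
    To make the vote adjustment described above concrete, here is a minimal sketch in Python. It is only a toy model of the "keep the total number of votes odd" idea, not how the cluster service is actually implemented. The function names are made up for illustration, and the rule for which node loses its vote (the last name in sorted order) is an arbitrary assumption, since the real tie-break is exactly the part that is unclear above.

```python
def dynamic_votes(nodes, witness_online):
    """Toy model: assign votes so that the total vote count stays odd.

    nodes          -- names of the cluster nodes that are currently up
    witness_online -- True if the file share witness is reachable
    Returns (node_votes, witness_vote).
    """
    node_votes = {name: 1 for name in nodes}
    witness_vote = 0
    if nodes and len(nodes) % 2 == 0:
        if witness_online:
            # Even number of node votes: the witness vote makes the total odd.
            witness_vote = 1
        else:
            # No witness available: drop one node's vote instead.
            # ASSUMPTION: picking the last name in sorted order is arbitrary
            # and only for illustration; the real cluster decides this itself.
            loser = sorted(nodes)[-1]
            node_votes[loser] = 0
    return node_votes, witness_vote


def votes_needed(node_votes, witness_vote):
    """Votes required for a majority of the current total."""
    total = sum(node_votes.values()) + witness_vote
    return total // 2 + 1


nodes = ["nsklhr-gwdb01", "nsklhr-gwdb02"]

# Witness up: both nodes vote, the witness breaks ties (3 votes in total).
with_fsw = dynamic_votes(nodes, witness_online=True)

# Witness down: one node loses its vote, but both nodes are still up,
# so nothing in this model requires an AG failover.
without_fsw = dynamic_votes(nodes, witness_online=False)
```

    In this model, losing the witness only changes the vote assignment. Both nodes remain members of the cluster, which is consistent with the AG primary staying where it was.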

     
