AlwaysOn BAG Witness Share fails


clive.wightman 77203

SSC Journeyman

Points: 77
September 20, 2019 at 8:29 am

#3681718

Hi
The server hosting the witness share went down. The WSFC failed over, but the database availability groups did not. Is there a setting that I'm missing?

Any help will be appreciated
Clive


Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 1

September 20, 2019 at 8:45 am

#3681725

Do you have an FCI combined with AGs?

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 2

I have 2 nodes configured in Failover Cluster Manager (Windows Server 2012 R2) and a file share witness on another server, with a number of BAGs on SQL Server 2016 SP2.

Clive

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 3

September 20, 2019 at 9:42 am

#3681749

Update to this,

I'm not using shared storage disks.

Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 4

I can't think how to phrase this, but how are you seeing the cluster fail over without the AGs?

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 5

Hi,

I can see the primary server fail over to the secondary, but the SQL Server AG on the secondary has not failed over to become the primary.

Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 6

Ah, ok. I'm looking at one of my AG clusters right now and I can't see where you'd see that, though? In the Failover Cluster Manager, under Nodes, both of mine just say Up, and each has an Assigned Vote of 1. Any chance of a screenshot of what you're looking at?

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 7

Hi, I have attached a screenshot. Ignore the offline witness; I have moved it away from the DC.


Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 8

Thanks for humouring me and sorry for all of the questions, but where are you seeing a primary and secondary cluster node there? (It's so much easier when you can see the server in question!)

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 9

September 20, 2019 at 12:52 pm

#3681848

Both nodes have 1 vote

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 10

On the screenshot, the current host server is the primary, in this case NSKLHR-GWDB02.

Within the nodes you have

nsklhr-gwdb01 1 vote

nsklhr-gwdb02 1 vote

When we lost the file share witness, the current host server changed to NSKLHR-GWDB01, but in SQL Server on NSKLHR-GWDB01 the availability group still showed its availability replica as Secondary. On NSKLHR-GWDB02, the replica was still Primary in SQL Server.

Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 11

Ok, sorry, I see what you're saying there (Friday afternoon syndrome). The Current Host Server is not the primary for the AGs, it's just the node that hosts the cluster resources at the moment. It has nothing to do with AGs really.

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 12

That's correct. Once the file share became available again, I rebooted NSKLHR-GWDB01 to bring everything back in sync and make the databases accessible. I'm trying to understand why, when the file share failed, the node failed over but none of the AGs did.

clive.wightman 77203 SSC Journeyman Points: 77 · Answer 13

Also, why did Windows need to fail over when it was only the witness that went down and I still had 2 servers?

Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 14

This is Windows Server 2016, is it? I think that was dynamic quorum at play. Because you lost the FSW, you had a cluster with an even number of votes, which isn't ideal because of the risk of split brain, so it would have removed the vote of one of the nodes until the FSW came back. What I'm a little hazy on is why it chose the server it did; I'm not sure about that bit.

I don't think the cluster did actually fail over, by the way; it probably just adjusted the quorum as above, hence no AG failover.
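
To make the dynamic quorum idea above concrete, here is a minimal Python sketch of the vote arithmetic. It is a simplified model under stated assumptions, not the actual cluster algorithm: the `Voter` class, the `adjust_votes` function and the witness name `fsw` are hypothetical, and which node loses its vote is arbitrary in this model.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    """A quorum participant: a cluster node or the file share witness (hypothetical model)."""
    name: str
    is_witness: bool = False
    online: bool = True
    dynamic_weight: int = 1  # 1 = currently counts toward quorum, 0 = vote removed


def adjust_votes(voters):
    """Simplified dynamic quorum: offline voters lose their vote,
    then one more vote is dropped if the remaining total is even."""
    for v in voters:
        v.dynamic_weight = 1 if v.online else 0
    active = [v for v in voters if v.dynamic_weight == 1]
    if active and len(active) % 2 == 0:
        # Assumption: drop the witness vote first if one is online,
        # otherwise drop an arbitrary node's vote (here: the last one listed).
        witnesses = [v for v in active if v.is_witness]
        loser = witnesses[0] if witnesses else active[-1]
        loser.dynamic_weight = 0
    return sum(v.dynamic_weight for v in voters)


cluster = [
    Voter("nsklhr-gwdb01"),
    Voter("nsklhr-gwdb02"),
    Voter("fsw", is_witness=True),
]

print(adjust_votes(cluster))  # 3: two nodes plus the witness

cluster[2].online = False     # the file share witness goes down
print(adjust_votes(cluster))  # 1: one node's vote is removed to keep the total odd
```

In this model nothing is moved between nodes when the witness disappears; only the vote weights change, which is consistent with the AGs staying where they were.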