AlwaysON goes down from Secondary Replica

Question


Farik013

SSC-Addicted

Points: 465
May 22, 2019 at 1:35 pm

#3646102

Hi All.
I have a very strange problem and don't know what to search for to find a solution, so I will try to describe it.
We have an AlwaysOn availability group with two replicas, a primary and a secondary, and one listener IP.
A few days ago we had problems at the disaster recovery site, and the connection to the secondary replica went down. When the connection went down, the AlwaysOn IP went down too. After 2-3 minutes, when the connection came back, I saw that the secondary replica had become the primary replica and the old primary replica's node had gone into quarantine in the Windows Failover Cluster. We use synchronous commit and automatic failover. But I don't understand why the AlwaysOn IP goes down when the secondary replica goes down, and why the secondary replica then becomes primary while the old primary goes down.


e4d4 SSCertifiable Points: 5777 · Answer 1

How is quorum configured for this cluster? It looks like the primary replica lost quorum and went down; this is intended behavior.

https://docs.microsoft.com/en-us/windows-server/failover-clustering/manage-cluster-quorum

What is the result of

select * from sys.dm_hadr_cluster_members
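A narrower version of that check, as a minimal sketch: it reads the quorum model from sys.dm_hadr_cluster and the per-member vote counts from sys.dm_hadr_cluster_members. It assumes you run it on one of the replicas with VIEW SERVER STATE permission; nothing here is specific to this cluster.

```sql
-- Quorum model and current quorum state as seen by this SQL Server instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Every cluster member (nodes plus any disk or file share witness),
-- whether it is up, and how many quorum votes it holds.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members
ORDER BY member_name;
```

If the second result shows only the two nodes and no witness, a broken link between the sites can leave the primary's side without a majority of votes, depending on the dynamic quorum settings of the Windows version in use.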

Farik013 SSC-Addicted Points: 465 · Answer 2

Thanks for the reply. How can I check whether quorum was lost? It's OK now, because there is no problem with the secondary node. But I'm afraid that if the problem happens again, it will go down again.

Here is the result of the query:

[Screenshot: Capture]

Sreekanth B SSCertifiable Points: 6145 · Answer 3

You should get that information in detail in your cluster logs.

BTW, the SQL Tiger team released an outstanding utility that analyzes the cluster logs, SQL error logs, and availability group extended events logs to troubleshoot availability group failover issues.

https://techcommunity.microsoft.com/t5/SQL-Server/Failover-Detection-Utility-Availability-Group-Failover-Analysis/ba-p/386021
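The extended events part can also be read directly with T-SQL. The following is a rough sketch, assuming the built-in AlwaysOn_health session is running and its .xel files are still in the instance's default log directory (older files roll over, so the incident may no longer be covered). The list of event names is an illustrative choice, not a complete one.

```sql
-- Read the AlwaysOn_health event files and keep the events that usually
-- matter for a failover: role changes, lease expiry, and reported errors.
SELECT CAST(x.event_data AS xml).value('(event/@timestamp)[1]', 'datetime2') AS event_time_utc,
       x.object_name AS event_name,
       CAST(x.event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL) AS x
WHERE x.object_name IN ('availability_replica_state_change',
                        'availability_group_lease_expired',
                        'availability_replica_manager_state_change',
                        'error_reported')
ORDER BY event_time_utc;
```

Run it on both replicas and compare the timestamps (they are in UTC) with the time the connection to the disaster recovery site dropped.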

Farik013 SSC-Addicted Points: 465 · Answer 4

It happens rarely, but the last time it caused us problems, so now I want to know why it happens.

There are errors like:

Cluster resource 'dbalwson' of type 'SQL Server Availability Group' in clustered role 'dbalwson' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

and

The Cluster service failed to bring clustered role 'dbalwson' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

I don't understand why the primary replica and the cluster IP go down when the secondary replica goes down. Where can I check this?
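For reference, a sketch of one place to look on the SQL Server side: the query below shows each replica's configured commit and failover mode next to its current role, connection state, and last connection error. It reports current state only, not history, so it is most useful while the problem is happening; on a secondary it returns only the local replica's state.

```sql
-- Configured modes and current runtime state for every replica
-- of every availability group visible to this instance.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc,
       ars.last_connect_error_description
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;
```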