AlwaysOn goes down when the secondary replica goes down

  • Hi All.

    I have a very strange problem and don't know what to search for to find a solution, so I will try to describe it.

    We have an AlwaysOn availability group with two replicas, a primary and a secondary, and one listener IP.

    A few days ago we had problems at the disaster recovery site, and the connection to the secondary replica went down. When the connection went down, the AlwaysOn IP went down too. After 2-3 minutes, when the connection came back, I saw that the secondary replica had become the primary replica and the old primary node had been quarantined in Windows Failover Cluster. We use synchronous commit and automatic failover. But I don't understand why the AlwaysOn IP goes down when the secondary replica goes down, and why afterwards the secondary becomes primary and the old primary goes down.

  • How is quorum configured for this cluster? It looks like the primary replica lost quorum and went down; this is intended behavior.

    https://docs.microsoft.com/en-us/windows-server/failover-clustering/manage-cluster-quorum
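
    To make the quorum point concrete, here is a minimal sketch of the majority rule, not a model of this particular cluster. The member names, the presence of a witness, and which side can reach the witness are all assumptions made up for illustration, and the sketch ignores dynamic quorum and dynamic witness adjustments.

```python
# Hypothetical vote layout: the names and the witness are illustrative
# assumptions, not taken from the cluster discussed in this thread.
VOTES = {"node-primary": 1, "node-secondary": 1, "witness": 1}


def has_quorum(reachable, votes=VOTES):
    """Return True if the reachable members hold a strict majority of all votes."""
    total = sum(votes.values())
    seen = sum(votes[member] for member in reachable)
    return 2 * seen > total


# Assumed scenario: the link between the sites breaks and the witness
# is reachable only from the secondary (DR) side.
primary_side = has_quorum({"node-primary"})
dr_side = has_quorum({"node-secondary", "witness"})
print("primary side keeps quorum:", primary_side)
print("DR side keeps quorum:", dr_side)
```

    Under these assumptions the isolated primary holds 1 of 3 votes and loses quorum, while the other side holds 2 of 3 and stays up. With only two votes and no witness, neither side would have a majority on its own. The actual vote assignment is what the query below should show.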

    What is the result of

    select * from sys.dm_hadr_cluster_members

  • Thanks for the reply. How can I check whether quorum was lost? Everything is fine now because there is no problem on the secondary node, but I'm afraid that if a problem occurs again, it will go down again.

    Here is the result of the query:

    [Screenshot: Capture]

  • You should get that information in detail in your cluster logs.

    BTW, the SQL Server Tiger team released an outstanding utility that can analyze logs (including the cluster logs, SQL error logs, and the availability group extended events logs) to troubleshoot availability group failover issues.

    https://techcommunity.microsoft.com/t5/SQL-Server/Failover-Detection-Utility-Availability-Group-Failover-Analysis/ba-p/386021

  • It happens rarely, but the last occurrence caused us problems, and now I want to know why it happens.

    The logs contain errors like:

    Cluster resource 'dbalwson' of type 'SQL Server Availability Group' in clustered role 'dbalwson' failed.

    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

    and

    The Cluster service failed to bring clustered role 'dbalwson' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

    I don't understand why the primary replica and the cluster IP go down when the secondary replica goes down. Where can I check this?
