AG has disappeared in SSMS

Question

snomadj

March 13, 2023 at 5:13 pm

#4160232
I had a WSFC issue, which I think is now resolved. I wasn't here at the time, and various attempts were made to fix it, leaving things in a messy state.
Now my AG only exists on one server, not on the other.
Node 1 = AG in resolving state. DBs = not synchronising. Not available.
Node 2 = AG doesn't exist. Not in SSMS or metadata. DBs in restoring.
In the WSFC I cannot bring the AG resource online. It fails. I have rebooted / restarted, but my AG is still missing. How should I resolve this? I'm nervous about messing it up.
FYI, there is another AG on this server pair that is happily up and running. VMs are fine, connections are good, etc. This is a legacy issue that has left my AG in a mess, and I'm not sure how to resolve it. There are about 100 DBs involved.
Thanks in advance.

Many thanks.
- This topic was modified 1 year, 9 months ago by snomadj.

snomadj SSCarpal Tunnel Points: 4491 More actions · Answer 1

I'm thinking I need to either recreate the AG on Node 2 or somehow bring Node 1 up as the primary. Yikes, this isn't a nice issue.

durai nagarajan SSCoach Points: 15778 More actions · Answer 2

A working failover cluster is a mandatory requirement, so fix that first. If the cluster has been rebuilt, connect to the same cluster name and hopefully everything will reconnect.

Depending on the logs, the databases will either sync automatically or have to be synced manually.

Regards
Durai Nagarajan

as_1234 SSCrazy Points: 2865 More actions · Answer 3

Does the WSFC tell you why it fails to bring the AG resource online? I haven't done clustering for a while but I seem to remember there is a log/event viewer that gives more details.
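
The detailed failure reason normally lives on the Windows side (the cluster events in Failover Cluster Manager, or the cluster log, which can be generated with the Get-ClusterLog PowerShell cmdlet). As a complementary check from inside SQL Server, the queries below are a minimal sketch of how each node sees the cluster and the AG. They only read DMVs and catalog views, so they should be safe to run on both nodes. On the node where the AG is missing from metadata, the last query is expected to return no rows for it.

```sql
-- How this instance sees the cluster and its quorum.
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Cluster members and whether each one is up.
SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;

-- Availability groups known to this instance, with replica role and health.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;
```

Comparing the output from Node 1 and Node 2 side by side should show whether the two instances agree on cluster membership and which one still holds the AG definition.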

onetsql Valued Member Points: 57 More actions · Answer 4

We had a very similar issue where the Primary replica was dropped leaving our systems offline. The following article explains the situations that can cause the replica to be removed from the AG.

https://techcommunity.microsoft.com/t5/sql-server-support-blog/issue-replica-unexpectedly-dropped-in-availability-group/ba-p/318175

To fix it, you need to find out which node has the most up-to-date data to avoid data loss. Use the following script and identify the node where is_failover_ready = 1. This will be your new primary:

select a.*, ar.replica_server_name
from sys.dm_hadr_database_replica_cluster_states a
left join sys.availability_replicas ar on ar.replica_id = a.replica_id

If Node 1 has is_failover_ready = 1, you can fail over onto that node using the allow-data-loss option (there won't actually be any data loss, but you have to force it back online):

ALTER AVAILABILITY GROUP group_name FORCE_FAILOVER_ALLOW_DATA_LOSS
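
After a forced failover, data movement on any remaining secondary databases is typically left suspended and has to be resumed by hand. The following is a rough sketch of how that could be checked and fixed. `[YourDatabase]` is a placeholder, not a name from this thread, and the ALTER DATABASE line is commented out so nothing changes until you have reviewed the output.

```sql
-- Run on the new primary: per-database synchronization state for every replica.
SELECT ar.replica_server_name,
       adc.database_name,
       drs.is_local,
       drs.synchronization_state_desc,
       drs.is_suspended,
       drs.suspend_reason_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
JOIN sys.availability_databases_cluster AS adc
    ON adc.group_database_id = drs.group_database_id
ORDER BY ar.replica_server_name, adc.database_name;

-- Run on the secondary, once per suspended database (placeholder name):
-- ALTER DATABASE [YourDatabase] SET HADR RESUME;
```

With around 100 databases it is probably easier to generate the RESUME statements from the query output than to type them one by one.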

If Node 2 has is_failover_ready = 1, you need to connect to Node 1, delete the AG, then recreate the AG on Node 2, specifying Node 2 as the primary replica. This is what we had to do to get the AG back online.
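
For the Node 2 route, here is an outline of what the steps might look like in T-SQL. It is only a sketch under assumptions: the AG name, database name, server names and endpoint URLs are all placeholders invented for illustration, and the availability and failover modes should be whatever the original AG used. The destructive statements are commented out on purpose. Only the SELECT runs as written, and it just generates recovery statements for review, on the assumption that databases left in RESTORING on Node 2 have to be recovered before they can be added to a new AG as primary databases.

```sql
-- Step 1 (on Node 1): remove the broken AG definition. Placeholder AG name.
-- DROP AVAILABILITY GROUP [YourAgName];

-- Step 2 (on Node 2): generate recovery statements for databases still in RESTORING.
-- Review the list first; it will include any database in that state, not only AG members.
SELECT N'RESTORE DATABASE ' + QUOTENAME(name) + N' WITH RECOVERY;' AS recovery_statement
FROM sys.databases
WHERE state_desc = N'RESTORING'
ORDER BY name;

-- Step 3 (on Node 2): recreate the AG with Node 2 as primary. All names are placeholders.
-- CREATE AVAILABILITY GROUP [YourAgName]
-- FOR DATABASE [YourDatabase]
-- REPLICA ON
--     N'NODE2' WITH (ENDPOINT_URL = N'TCP://node2.example.local:5022',
--                    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
--                    FAILOVER_MODE = MANUAL),
--     N'NODE1' WITH (ENDPOINT_URL = N'TCP://node1.example.local:5022',
--                    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
--                    FAILOVER_MODE = MANUAL);

-- Step 4 (on Node 1): join the instance and each database back as a secondary.
-- ALTER AVAILABILITY GROUP [YourAgName] JOIN;
-- ALTER DATABASE [YourDatabase] SET HADR AVAILABILITY GROUP = [YourAgName];
```

Recovering a database ends its ability to accept further log restores, so it is worth being certain Node 2 really holds the most current copy before running any of the generated statements.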

If in doubt, and data loss is a concern, get help from Microsoft support.

What version of Windows and SQL Server are you on?

Also did you get an error like the following one?

2014-01-21 11:53:16.53 spid30s AlwaysOn: The local replica of availability group 'groupname' is being removed. The instance of SQL Server failed to validate the integrity of the availability group configuration in the Windows Server Failover Clustering (WSFC) store. This is expected if the availability group has been removed from another instance of SQL Server. This is an informational message only. No user action is required.
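
One way to look for that message is to search the SQL Server error log on each node. The sketch below uses sp_readerrorlog, which is not formally documented but is widely used. The parameters are, as far as I know, the log file number (0 = current), the log type (1 = SQL Server error log) and a search string. Since the servers have been rebooted, the message may well be in an archived log, so the second call looks at the previous file.

```sql
-- Search the current SQL Server error log for the replica-removal message.
EXEC sys.sp_readerrorlog 0, 1, N'is being removed';

-- Same search in the previous (archived) error log; raise the number to go further back.
EXEC sys.sp_readerrorlog 1, 1, N'is being removed';
```

If the message shows up on Node 2 with a timestamp around the WSFC problem, that would fit the scenario described in the Microsoft article linked above.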