March 13, 2023 at 5:13 pm
I had a WSFC issue, resolved I think. I wasn't here and various tries have been made to resolve leaving things in a messy state.
Now my AG only exists on 1 server. Not on the the other.
Node 1 = AG in resolving state. dbs = not synchronising. Not available.
Node 2 = AG doesn't exist. Not in SMSS or metadata. DBs in restoring.
In the WSFC I cannot bring up the AG resource. It fails. I have rebooted / started but still my AG is missing. How should I resolve? I'm nervous to mess it up.
FI there is another AG on this server pair that is happily up & running. VMS are fine, connections good etc. This is a legacy issue that has left my AG in a mess and I'm not sure how to resolve. There are about 100 dbs involved.
Thanks in advance.
Many thanks.
March 13, 2023 at 5:43 pm
I'm thinking I either need to recreate the AG on Node 2. Or bring Node 1 up as the primary somehow. Yikes, this isn't a nice issue.
March 14, 2023 at 6:25 am
Failover cluster is mandatory requirement, so fix that and if it is re-builded , connect to the same cluster name and hope all will connect.
Based on the logs it will auto sync or have to sync manually.
Regards
Durai Nagarajan
March 16, 2023 at 8:15 pm
Does the WSFC tell you why it fails to bring the AG resource online? I haven't done clustering for a while but I seem to remember there is a log/event viewer that gives more details.
March 30, 2023 at 10:03 am
We had a very similar issue where the Primary replica was dropped leaving our systems offline. The following article explains the situations that can cause the replica to be removed from the AG.
To fix it you need to find out which node has the most up to date data to avoid data loss, use the following script - you need to identify the node where is_failover_ready = 1. This will be your new primary:
select a.*, ar.replica_Server_name 2from sys.dm_hadr_database_replica_cluster_states a 3left join sys.availability_replicas ar on ar.replica_id = a.replica_id
If Node1 = is_failover_ready = 1. Then you can failover onto that node with dataloss (there won't actually be any dataloss but you have to force it back online)
ALTER AVAILABILITY GROUP group_name FORCE_FAILOVER_ALLOW_DATA_LOSS
If Node2 = is_failover_ready = 1. You need to connect to Node1, delete the AG, then recreate the AG on Node2, specifying Node 2 as the Primary Replica. This is what we had to do to get the AG back online.
If in doubt, get help from Microsoft support if dataloss is a concern.
What version of windows and SQL are you on?
Also did you get an error like the following one?
2014-01-21 11:53:16.53 spid30s AlwaysOn: The local replica of availability group 'groupname' is being removed. The instance of SQL Server failed to validate the integrity of the availability group configuration in the Windows Server Failover Clustering (WSFC) store. This is expected if the availability group has been removed from another instance of SQL Server. This is an informational message only. No user action is required.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply