AG has disappeared in SSMS

  • I had a WSFC issue, resolved I think.  I wasn't here and various tries have been made to resolve leaving things in a messy state.

    Now my AG only exists on 1 server.  Not on the the other.

    Node 1 = AG in resolving state. dbs = not synchronising.  Not available.

    Node 2 = AG doesn't exist.  Not in SMSS or metadata. DBs in restoring.

    In the WSFC I cannot bring up the AG resource.  It fails.  I have rebooted / started but still my AG is missing.  How should I resolve?  I'm nervous to mess it up.

    FI there is another AG on this server pair that is happily up & running.  VMS are fine, connections good etc.  This is a legacy issue that has left my AG in a mess and I'm not sure how to resolve.  There are about 100 dbs involved.

    Thanks in advance.

     

     

     

    Many thanks.

     

     

     

    • This topic was modified 1 year, 7 months ago by  snomadj.
  • I'm thinking I either need to recreate the AG on Node 2.  Or bring Node 1 up as the primary somehow.  Yikes, this isn't a nice issue.

  • Failover cluster is mandatory requirement, so fix that and if it is re-builded , connect to the same cluster name and hope all will connect.

    Based on the logs it will auto sync or have to sync manually.

    Regards
    Durai Nagarajan

  • Does the WSFC tell you why it fails to bring the AG resource online? I haven't done clustering for a while but I seem to remember there is a log/event viewer that gives more details.

  • We had a very similar issue where the Primary replica was dropped leaving our systems offline.  The following article explains the situations that can cause the replica to be removed from the AG.

    https://techcommunity.microsoft.com/t5/sql-server-support-blog/issue-replica-unexpectedly-dropped-in-availability-group/ba-p/318175

    To fix it you need to find out which node has the most up to date data to avoid data loss, use the following script - you need to identify the node where is_failover_ready = 1.  This will be your new primary:

    select a.*, ar.replica_Server_name 2from sys.dm_hadr_database_replica_cluster_states a 3left join sys.availability_replicas ar on ar.replica_id = a.replica_id

    If Node1 = is_failover_ready = 1.  Then you can failover onto that node with dataloss (there won't actually be any dataloss but you have to force it back online)

    ALTER AVAILABILITY GROUP group_name FORCE_FAILOVER_ALLOW_DATA_LOSS

    If Node2 = is_failover_ready = 1.  You need to connect to Node1, delete the AG, then recreate the AG on Node2, specifying Node 2 as the Primary Replica.  This is what we had to do to get the AG back online.

    If in doubt, get help from Microsoft support if dataloss is a concern.

    What version of windows and SQL are you on?

    Also did you get an error like the following one?

    2014-01-21 11:53:16.53 spid30s     AlwaysOn: The local replica of availability group 'groupname' is being removed. The instance of SQL Server failed to validate the integrity of the availability group configuration in the Windows Server Failover Clustering (WSFC) store.  This is expected if the availability group has been removed from another instance of SQL Server. This is an informational message only. No user action is required.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply