AOAG resources partially disappearing

  • Dear all,

    I am posting here because I have been struggling with an issue for 3 days now and am looking for guidance to solve it:

    I deployed a Failover Cluster composed of 3 VMs on the same VLAN:

    • 2 servers that will run MSSQL instances
    • Each instance is on an independent VM
    • A common AD service account is used to install both MSSQL engines
    • 1 server hosting a file share witness, with full access to the share granted to the cluster computer object in AD

    On top of this there are 3 availability groups, all showing the same behaviour:

    • Availability group setup was super smooth. All databases were created without issue by automatic seeding
    • Adding databases to and removing them from the availability groups works without any error or warning

    Still, I encounter issues on AGs both with and without listeners.

    Under normal operation, the primary instance indicates everything is fine and I see no replication issue (all changes made on instance #1 are correctly reflected on instance #2):

    [Screenshot 1]

    However, when I look at instance #2, something already looks different. Is that question mark expected because the second instance is "closed" and only receiving replication streams?

    [Screenshot 2]

    Now, the fun stuff happens when I attempt a manual failover of any of the AGs to the second node. The failover operation itself reports all green:

    [Screenshot: failover]

    Still, the situation just gets worse until I reboot the secondary node or fail back (in all cases, things only run well once the AGs are back on instance #1).

    One of the nodes seems to be working as expected:

    [Screenshot 3]

    However, the other is completely lost and seems to see only the secondary node:

    [Screenshot 4]

    I have dumped all instance logs and cluster logs, but I don't see any of the obvious issues I came across on these forums (no permission errors raised on the cluster, DNS or even instance access). The cluster seems healthy and I re-ran all validation tests without issue (the only warnings concern network resilience, which isn't an issue on ESX servers).

    Would anyone have a hint about where I can start looking to understand what could be causing this asymmetric behaviour between my nodes?

    Thanks for your help & support

    Marko

  • Hi,

    I think these small icons are just SSMS icons; I wouldn't worry about them.

    Do you open the AOAG dashboard from a single node or from the listener? If you open the dashboard on the secondary node, only the secondary node is visible in it.

    If you open the dashboard on the active (primary) node, both SQL Servers are visible.

    Kind regards,

    Andreas


  • Hi Andreas

    Thanks for the prompt feedback. Indeed, when connecting through the listener, it works exactly as it should. I think it was just those question marks, and the fact that the primary and secondary nodes can't independently show the health of the AG, that triggered my worries.

    Thanks again for the info, much appreciated!
