Unable to fail over AlwaysOn Role via cluster admin, can via SSMS

  • So, I have a 2 node cluster.  Nodes are both up.  Cluster validation is happy.  Win Server 2016 DC & SQL Server 2016 SP2.

    I create an Always On AG & listener, add a random db for testing.  Again, success.

    I can fail my new AG over (back & forth) happily using SQL via SSMS.

    When I go to the Failover Cluster Manager, I try to move the Always On Role & it says 'Move Failed' & stays on the same node.

    This happens regardless of which node the role is hosted on (remembering that I can failover via SSMS).

    So ... any ideas why this is happening?

    I can see nothing illuminating in the cluster.log, FCM, errorLog or the 2 x eventViewers.  I can only guess that the FCM has an extra level of comms to do before it hits SQL.  So perhaps I need to open up a different port, or something ... help, I'm so stuck.

    Thank you!

  • I should probably mention that this client has the most ridiculous lockdown policy so it could easily be GPO, permissions, firewall ... but my server admins are completely unhelpful until I prove it's on them so any pointers most gratefully received.  I would like to be able to sock it to 'em & fix this bloody problem.  Thanks.

  • Failover Cluster Manager *cannot* be used to fail over an availability group.  The only method that can be used is SQL - whether from SSMS or through SQL statements.

    The reason for this is simple - in order to be able to fail over to the secondary, SQL Server needs to switch from asynchronous to synchronous (if not already set) and validate that all transactions have been hardened on the secondary before it can transition to the other node.  If this is not done, then your fail over could (probably will) cause a loss of data - because there could be transactions that are still in the REDO queue waiting to be committed and hardened on the secondary.

    Failover Cluster Manager does not have any visibility to the internals within SQL Server and has no awareness of the state of synchronization on all of the databases participating in the availability group.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Hey Jeffrey, thanks for the reply.

    So I've just checked on my estate & I am able to failover the AlwaysOnAG via the FCM for the other clusters.  Right click, move the AlwaysOnAG role to another server.  Check within both FCM & SSMS & the AlwaysOnAG & associated dbs have successfully moved to node 2.

    Still confused and appreciative of advice …

  • Hi again,

    Just a quick note to say that it seems you can failover an AlwaysOnAG role via the FCM if it is set to synchronous with auto failover.  You can't if it isn't.  I guess this makes sense in the context of Jeffrey's response above.  I'm guessing the FCM knows enough to check if failover is automatic & therefore synchronous & can proceed.  If it isn't it wants nothing to do with a failover.

    Not disagreeing with whether or not it should be done.

    Thanks.

     

  • Here are 2 references:

    https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/failover-clustering-and-always-on-availability-groups-sql-server?view=sql-server-ver15#:~:text=Using%20the%20Failover%20Cluster%20Manager,online%20on%20the%20target%20node.

    https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/prereqs-restrictions-recommendations-always-on-availability?view=sql-server-ver15

    In both documents - they expressly state that you should not use FCM to manipulate availability groups - including such things as:

    • Do not add or remove resources in the clustered services (resource group) for the availability group
    • Do not change any availability group properties...
    • Do not use the Failover Cluster Manager to move availability groups to different nodes or to fail over availability groups. The Failover Cluster Manager is not aware of the synchronization status of the availability replicas, and doing so can lead to extended downtime. You must use Transact-SQL or SQL Server Management Studio.

    Regardless of the setup and configuration of the availability group - the warnings are very clear from Microsoft.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I agree to Jeff. Do not use FCM but SSMS/SQL for failovers.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply