Using vMotion to migrate an AlwaysOn node

  • Hi All

    Our network guys have to carry out an IBM Flex Chassis move at our data centre, which will affect the primary replica of one of our SQL 2012 AlwaysOn Availability Group nodes (the secondary replica won't be affected).

    They have suggested using vMotion to migrate the primary replica to another virtual host, which will result in a very brief period of network outage for the node.

    I've done some reading and have seen a few potential issues regarding Stun During Page Send (SDPS) and increasing thresholds within WSFC. Unfortunately, we're not able to test this prior to the migration, so I have a few questions...

    1. Does anyone have any experience of the impact of doing this?

    2. Would it be necessary to fail over to the secondary replica node before performing the vMotion (and back again afterwards)?

    Thanks very much!

    Innerise

  • vMotion is a fun topic in the SQL Server world. Moving a SQL Server VM with vMotion when an AG is involved is a reasonably safe operation. Our AGs are periodically moved to new or different datacenters with vMotion and, aside from a few incidents where VMDKs were directed to the wrong targets (with the associated performance impact), the moves have been uneventful. That said, we are extra vigilant with our processes to ensure that the impact and duration of any outage experienced by users are minimal or non-existent. For reference, the high-level process we follow when migrating SQL Server VMs to a new VMware cluster/host is as follows:

    Notes:

    Server A = Original Primary

    Server B = Original Secondary

    1. Fail over the AG(s) on "Server A" to "Server B". This is done to minimize the impact of the move on end users. Should issues arise with Server B during the move, the users will only experience this controlled fail-over instead of potentially longer-term interruptions due to instability or an outage.

    2. Suspend data movement on the Availability Databases on "Server A". We do this to ensure that activity on the "Server A" files is minimized. This does not cover any read-intent workloads that run after the fail-over, if read access is granted to the secondaries within your AG. Note that suspending data movement has a caveat: if you need a longer outage on a high-volume server, the log file on Server B must be large enough to retain the log records until synchronization resumes.

    3. vMotion "Server A" to the new VMware target.

    4. Enable data movement on the Availability Databases on "Server A".

    It is optional to fail-over the AG primary from Server B back to Server A unless your topology requires it.
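    The steps above can be sketched as a small script that prints the T-SQL for each one, so the runbook can be reviewed before the change window. This is only a rough sketch: the AG name "AG1" and database name "SalesDb" are hypothetical placeholders, and it assumes a planned manual fail-over (synchronous-commit replicas in the SYNCHRONIZED state). The FAILOVER statement is issued on the replica that is to become primary (Server B), while the SUSPEND and RESUME statements are issued on Server A.

```python
# Minimal sketch: build the T-SQL runbook for the four steps in this post.
# "AG1" and "SalesDb" are hypothetical names; substitute your own.

def bracket(name: str) -> str:
    """Quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_runbook(ag_name, databases,
                  old_primary="Server A", old_secondary="Server B"):
    """Return a list of (where_to_run, statement) pairs, in order."""
    steps = []
    # Step 1: planned manual fail-over, run on the replica that becomes primary.
    steps.append((old_secondary,
                  f"ALTER AVAILABILITY GROUP {bracket(ag_name)} FAILOVER;"))
    # Step 2: suspend data movement for each availability database on Server A.
    for db in databases:
        steps.append((old_primary,
                      f"ALTER DATABASE {bracket(db)} SET HADR SUSPEND;"))
    # Step 3: the vMotion itself happens outside SQL Server.
    steps.append(("vCenter",
                  f"-- vMotion {old_primary} to the new VMware target"))
    # Step 4: resume data movement once the VM is running on the new host.
    for db in databases:
        steps.append((old_primary,
                      f"ALTER DATABASE {bracket(db)} SET HADR RESUME;"))
    return steps


if __name__ == "__main__":
    for where, statement in build_runbook("AG1", ["SalesDb"]):
        print(f"{where}: {statement}")
```

    Generating the statements rather than executing them keeps the sketch free of any driver or connection assumptions; each line can be pasted into a query window on the named server.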

    So, in a nutshell, do yourself and your user community a favor: fail over the AG first, before using vMotion to move the SQL Server guest to a new target.

    Scott

  • This all depends on how well the VMware infrastructure was designed and built. The vMotion network should be segregated and high speed: it carries the live migration of the node's memory and needs to be highly specced. Typically you should be able to migrate the node without worry, but it would be worth testing this on a test system with similar network capacity.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • For the next person coming from Google: the link is invalid. Here is the new link:

    https://techcommunity.microsoft.com/t5/Failover-Clustering/Tuning-Failover-Cluster-Network-Thresholds/ba-p/371834
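    As a rough illustration of what those thresholds control: the cluster sends heartbeats every SameSubnetDelay milliseconds and treats a node as down after SameSubnetThreshold consecutive misses, so the tolerated gap is roughly the product of the two. The sketch below assumes a delay of 1000 ms and a threshold of 5; treat those numbers as assumptions and check the actual values on your own cluster. The function names are hypothetical, and the AG's own lease and session timeouts are not modelled here.

```python
# Minimal sketch: does an assumed vMotion stun fit inside the WSFC
# heartbeat tolerance? All numeric values below are assumptions.

def heartbeat_tolerance_ms(delay_ms: int, threshold: int) -> int:
    """Approximate time a node can miss heartbeats before it is treated as down."""
    return delay_ms * threshold


def survives_stun(stun_ms: int, delay_ms: int, threshold: int) -> bool:
    """True if the stun is shorter than the heartbeat tolerance."""
    return stun_ms < heartbeat_tolerance_ms(delay_ms, threshold)


if __name__ == "__main__":
    delay_ms, threshold = 1000, 5          # assumed SameSubnetDelay / SameSubnetThreshold
    for stun_ms in (2000, 6000):           # hypothetical stun durations
        ok = survives_stun(stun_ms, delay_ms, threshold)
        print(f"stun {stun_ms} ms, tolerance "
              f"{heartbeat_tolerance_ms(delay_ms, threshold)} ms, ok={ok}")
```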
