High Availability Group Failover SNAFU

  • I'm kicking tires and driving into ditches testing our proposed AG configuration when I come across this:

    Two servers with multiple AGs. Server 1 is primary for Group A and all the other AGs. Server 2 is primary for Group B. Each server is secondary for the groups whose primary is on the other server.
    One Friday I'm consolidating some AGs on Server 1. I add their databases to Group A manually using T-SQL. I plan to delete the secondary copies (now in a RESTORING state after being removed from their old groups), restore them, and run ALTER DATABASE ... SET HADR AVAILABILITY GROUP to join them to Group A... but I get interrupted.
    When I get back to it, I decide I want to move some databases from Group B into Group A. Group B is secondary on Server 1, so if I remove the databases from Group B on Server 2, they will be left in a RESTORING state on Server 1 (inconvenient). So I think, Hey! I'll just fail over that ONE group to Server 1. Then I can remove the databases, add them to Group A, and do all the restores and HADR group joins at once.
    I open a session on Server 1 and check the synchronization status of the databases in Group B. They are all good. Now, fail over Group B from Server 2 to Server 1. The session runs for about 5 seconds, then reports that the request was submitted successfully.
    Result:
    Group B goes into a Resolving state and... doesn't resolve. More perplexing, the Group A databases (which were synchronized just fine before the failover) all become unavailable and apparently begin re-seeding on Server 2 (?!?). So do the databases from ALL groups on Server 1!!! All databases in ALL groups are unavailable across both servers.
    At this point, my primary concern is GROUP B stuck in 'Resolving'.  So, I REBOOT server 1.
    GROUP A now fails over to Server 2... all databases unavailable. GROUP B shows up on the rebooted Server 1 as primary!! After a few hours, the synchronization completes. All the groups from Server 1 (except Group A) have failed over to Server 2 and are back to synchronizing. Group B is now primary on Server 1 (?? as if the original failover command was finally received when the service resumed). And Group A is pretty much hosed and unavailable, running on Server 2.
    What really bothers me about this is that I was not expecting the failover of one group from Server 2 to Server 1 to affect ALL the groups on the destination server. Not that I plan to split primary/secondary roles across the servers, but I would like to think I could.
    So yeah, one of the groups on Server 1 was 'in progress'. But I wasn't failing it over; I was failing over GROUP B, which is only a secondary on Server 1. The damage in this case is that the databases were unavailable while seeding, until they ALL completed. (How were they seeding anyway? I thought that if the database already existed on the other server, this operation would fail. The wizard refuses to do it; again, I was using T-SQL.) If this happened in our production environment, it would be bad. I won't say catastrophic, since there was no unexpected data loss, but the unavailability is a big hit.

    As a preventative measure,
    Do I need to check the health status of every database in every group on the destination server before I allow a failover of any ONE group to that server? 
    Do I need to check the seeding mode and make sure EVERY AG is set to MANUAL before the server becomes the target of a failover?

    Has anyone else experienced failover SNAFUs like this?
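    A minimal sketch of the pre-failover check asked about above, written as Python around a T-SQL query. It assumes you run the query on the server about to receive the failover and fetch the rows as dicts keyed by the column aliases (for example with a DB-API driver such as pyodbc). The view and column names are standard SQL Server catalog views and DMVs; the function name `failover_blockers` and the rules it applies (synchronous commit and SYNCHRONIZED for the group being failed over; healthy, unsuspended databases and manual seeding for every group on the target) are illustrative assumptions, not a documented Microsoft procedure and not a diagnosis of what went wrong above.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical check, run on the server about to RECEIVE a failover.
# View/column names are standard SQL Server DMVs and catalog views;
# the function name and dict-shaped rows are assumptions of this sketch.
TARGET_STATE_SQL = """
SELECT ag.name                         AS ag_name,
       DB_NAME(drs.database_id)        AS database_name,
       ar.availability_mode_desc       AS availability_mode_desc,
       drs.synchronization_state_desc  AS synchronization_state_desc,
       drs.synchronization_health_desc AS synchronization_health_desc,
       drs.is_suspended                AS is_suspended,
       (SELECT COUNT(*) FROM sys.availability_replicas AS r
         WHERE r.group_id = ag.group_id
           AND r.seeding_mode_desc = 'AUTOMATIC') AS automatic_replicas
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups   AS ag ON ag.group_id   = drs.group_id
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 1;
"""


def failover_blockers(rows: list[dict], group_to_fail_over: str) -> dict[str, list[str]]:
    """Map each AG name to the reasons it argues against the failover.

    The group being failed over must be synchronous-commit and SYNCHRONIZED
    on this replica. Every group on this server (including that one) is also
    checked for unhealthy or suspended databases and automatic seeding.
    """
    problems: dict[str, list[str]] = defaultdict(list)
    flagged_seeding: set[str] = set()
    seen_target_group = False
    for row in rows:
        ag, db = row["ag_name"], row["database_name"]
        if ag == group_to_fail_over:
            seen_target_group = True
            if row["availability_mode_desc"] != "SYNCHRONOUS_COMMIT":
                problems[ag].append(f"{db}: replica is not synchronous-commit")
            if row["synchronization_state_desc"] != "SYNCHRONIZED":
                problems[ag].append(f"{db}: {row['synchronization_state_desc']}")
        if row["synchronization_health_desc"] != "HEALTHY":
            problems[ag].append(f"{db}: health {row['synchronization_health_desc']}")
        if row["is_suspended"]:
            problems[ag].append(f"{db}: data movement suspended")
        # Report automatic seeding once per group, not once per database.
        if row["automatic_replicas"] > 0 and ag not in flagged_seeding:
            flagged_seeding.add(ag)
            problems[ag].append("automatic seeding enabled on a replica")
    if not seen_target_group:
        problems[group_to_fail_over].append("no local databases found on this server")
    return dict(problems)
```

    If the function returns an empty dict, the planned failover itself is issued on the target replica with `ALTER AVAILABILITY GROUP [Group B] FAILOVER;`.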

  • What seeding setting do you currently have on your AGs?
    I set mine to "manual" once my AGs are in a functioning state.

    Johan

  • I have a mix of seeding modes. I started with automatic seeding while I was setting up, then switched a single AG to manual to practice the manual seeding I might do in production if I had to fix a database synchronization problem by dropping and re-adding a live database. Once the initial setup is done, I think I want everything set to manual. If I decide I want automatic seeding, I'll flip it on, add the database(s), then flip it back.
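    The two seeding workflows described above (manually re-seeding a live database on a secondary, and temporarily flipping a group to automatic seeding to add databases) can be written out as T-SQL runbooks. This is a hypothetical Python helper that only generates the statements and says where to run them; the function names, replica names, and backup paths are assumptions, while the statements themselves (SET HADR OFF, RESTORE ... WITH NORECOVERY, SET HADR AVAILABILITY GROUP, MODIFY REPLICA ... SEEDING_MODE, GRANT CREATE ANY DATABASE) are standard SQL Server syntax.

```python
from __future__ import annotations


def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any ']' as QUOTENAME does."""
    return "[" + name.replace("]", "]]") + "]"


def sql_string(value: str) -> str:
    """Build an N'...' T-SQL string literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def manual_reseed_script(ag: str, db: str, full_bak: str, log_bak: str) -> list[tuple[str, str]]:
    """(where to run, statement) pairs for manually re-seeding one database on a secondary."""
    a, d = quote_name(ag), quote_name(db)
    return [
        ("primary", f"BACKUP DATABASE {d} TO DISK = {sql_string(full_bak)};"),
        ("primary", f"BACKUP LOG {d} TO DISK = {sql_string(log_bak)};"),
        # Skip if the secondary copy is already out of the group (e.g. left in RESTORING).
        ("secondary", f"ALTER DATABASE {d} SET HADR OFF;"),
        ("secondary", f"DROP DATABASE {d};"),
        ("secondary", f"RESTORE DATABASE {d} FROM DISK = {sql_string(full_bak)} WITH NORECOVERY;"),
        ("secondary", f"RESTORE LOG {d} FROM DISK = {sql_string(log_bak)} WITH NORECOVERY;"),
        # Rejoin the restored copy to the group on this secondary replica.
        ("secondary", f"ALTER DATABASE {d} SET HADR AVAILABILITY GROUP = {a};"),
    ]


def temporary_auto_seed_script(ag: str, databases: list[str], secondaries: list[str]) -> list[tuple[str, str]]:
    """Flip the secondaries to automatic seeding, add databases, then flip back to manual."""
    a = quote_name(ag)
    steps: list[tuple[str, str]] = []
    for replica in secondaries:
        steps.append(("primary", f"ALTER AVAILABILITY GROUP {a} MODIFY REPLICA ON {sql_string(replica)} WITH (SEEDING_MODE = AUTOMATIC);"))
        # The secondary needs permission to create the seeded database.
        steps.append((replica, f"ALTER AVAILABILITY GROUP {a} GRANT CREATE ANY DATABASE;"))
    for db in databases:
        steps.append(("primary", f"ALTER AVAILABILITY GROUP {a} ADD DATABASE {quote_name(db)};"))
    # Assumption of this sketch: only switch back once seeding has finished.
    steps.append(("primary", "-- wait until sys.dm_hadr_automatic_seeding shows seeding has finished"))
    for replica in secondaries:
        steps.append(("primary", f"ALTER AVAILABILITY GROUP {a} MODIFY REPLICA ON {sql_string(replica)} WITH (SEEDING_MODE = MANUAL);"))
    return steps


if __name__ == "__main__":
    for where, stmt in temporary_auto_seed_script("Group A", ["SalesDB"], ["SERVER2"]):
        print(f"{where:>8}: {stmt}")
```

    Generating the statements rather than executing them keeps the sketch free of driver assumptions and lets you review each step before pasting it into a session on the right server.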
