Rebooting clustered SQL 2014 problem

  • Hi there, 
    I regularly reboot our clustered SQL Server 2014 (2 nodes, in a VMware environment). The basic steps are:

    1) check that everything is okay
    2) reboot Secondary
    3) once it is up again, and synchronized again, fail over from Primary to Secondary
    4) reboot Primary
    5) once it is up again, and synchronized again, fail over from Secondary to Primary 

    The problem is in steps 3) and 5): after a server reboot, all databases (approx. 120) have to synchronize again, and most of the time this process does not finish. My workaround is to restart the SQL Server service on the machine that was just rebooted, after which all databases usually synchronize (occasionally a second restart of the service is necessary).

    I can't imagine that this is what MS intended as normal operation. What must I do to improve my rebooting routine?
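
    The routine above depends on waiting until every database reports as synchronized before each failover. A minimal sketch of that gate in Python follows. It assumes a hypothetical `fetch_states` callable (not part of any library) that returns a mapping of database name to synchronization state, for example read from `synchronization_state_desc` if this turns out to be an availability group. It also assumes synchronous-commit replicas, since asynchronous ones stay in SYNCHRONIZING and would never pass this check.

```python
import time
from typing import Callable, Dict


def wait_until_synchronized(
    fetch_states: Callable[[], Dict[str, str]],  # hypothetical state reader
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every database reports SYNCHRONIZED, or give up at the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = fetch_states()
        # An empty result is treated as "not ready" rather than "all good".
        if states and all(s == "SYNCHRONIZED" for s in states.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

    Returning False at the timeout, instead of waiting forever, gives the routine a clear point at which to stop and investigate rather than fail over anyway.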

  • Hello Raymond, when you say "synced", do you mean the databases becoming available after a failover? Or is there an additional HA option in play, such as log shipping, mirroring, or replication? During a cluster failover, SQL Server restarts the SQL instance, which in turn triggers recovery of all the databases within the instance. One thing that can lengthen recovery times is the number of virtual log files (VLFs) in each database's transaction log. If one or more databases have too many (more than 1000, for example), this can cause significant delays in recovery. As a check, you may want to count the VLFs per database. The command for that is DBCC LOGINFO, which returns one row per VLF.
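
    As a rough illustration of that check: DBCC LOGINFO returns one row per VLF, so the row count for a database is its VLF count. The Python sketch below assumes those counts have already been collected into a dictionary (how they are collected is left out). The function name and the sample figures are made up for illustration, and the 1000 threshold is simply the example figure mentioned above.

```python
from typing import Dict, List, Tuple


def high_vlf_databases(
    vlf_counts: Dict[str, int], threshold: int = 1000
) -> List[Tuple[str, int]]:
    """Return (database, vlf_count) pairs above the threshold, worst first."""
    flagged = [(db, n) for db, n in vlf_counts.items() if n > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real server.
    sample = {"Sales": 2400, "HR": 85, "Archive": 1000, "Logs": 5200}
    for db, count in high_vlf_databases(sample):
        print(f"{db}: {count} VLFs")
```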

  • RVSC48 - Thursday, April 6, 2017 6:52 AM

    SQL Server, during a cluster failover, performs a restart of the SQL instance, which in turn prompts a recovery of all the databases within the instance.

    That is true for a failover cluster instance, but I believe the OP is referring to an Availability Group here.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Raymond van Laake - Thursday, April 6, 2017 1:58 AM

    Hi there, 
    I regularly reboot our clustered SQL Server 2014 (2 nodes, in a VMware environment). The basic steps are:

    1) check that everything is okay
    2) reboot Secondary
    3) once it is up again, and synchronized again, fail over from Primary to Secondary
    4) reboot Primary
    5) once it is up again, and synchronized again, fail over from Secondary to Primary 

    The problem is in steps 3) and 5): after a server reboot, all databases (approx. 120) have to synchronize again, and most of the time this process does not finish. My workaround is to restart the SQL Server service on the machine that was just rebooted, after which all databases usually synchronize (occasionally a second restart of the service is necessary).

    I can't imagine that this is what MS intended as normal operation. What must I do to improve my rebooting routine?

    Is this an Always On availability group configuration you're referring to here?

  • Perry Whittle - Thursday, April 6, 2017 7:15 AM

    Is this an Always On availability group configuration you're referring to here?

    Before you fail over, check your HA redo queues; if some of them are large, this may be what is affecting your failover. I also suggest taking a log backup of all your databases on the Primary before failing over.

    See how that goes.
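
    A hedged sketch of that pre-failover check, in Python. It assumes this is an availability group and that rows from `sys.dm_hadr_database_replica_states` have already been fetched into dictionaries (the driver code is omitted). `failover_blockers` and the 1024 KB threshold are illustrative choices, not recommended values. `redo_queue_size` is reported in KB, and a NULL value is treated here as an empty queue, which is an assumption.

```python
from typing import Dict, List, Optional, Union

# Query whose rows feed the function below; run it with whatever driver you use.
REPLICA_STATE_QUERY = """
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc,
       redo_queue_size
FROM sys.dm_hadr_database_replica_states;
"""

Row = Dict[str, Optional[Union[str, int]]]


def failover_blockers(rows: List[Row], max_redo_kb: int = 1024) -> List[str]:
    """List reasons not to fail over yet, one entry per problem database."""
    blockers = []
    for row in rows:
        name = row["database_name"]
        state = row["synchronization_state_desc"]
        redo_kb = row.get("redo_queue_size") or 0  # NULL treated as 0 (assumption)
        if state != "SYNCHRONIZED":
            blockers.append(f"{name}: state is {state}")
        elif redo_kb > max_redo_kb:
            blockers.append(f"{name}: redo queue {redo_kb} KB")
    return blockers
```

    An empty list means none of the checked databases is holding up the failover. Anything else names the databases to look at first.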
