Cluster failover test

  • Hi,

    We have built an active/passive cluster and installed SQL Server 2005. We have 3 groups, as below:

    1. Cluster group

    2. MSDTC group

    3. SQL group

    I am testing failover between node1 and node2 as below. Here node1 is active and node2 is passive.

    1. Moved all 3 groups from node1 to node2 and vice versa -> working fine.

    2. Restarted node1. All the groups moved to node2, and when node1 came back online all groups moved back to node1 except the Cluster group. (Is this normal behaviour, or must the Cluster group move back to node1 as well?)

    3. Restarted node2. All the groups moved to node1.

    Are the above failover tests fine and sufficient?

    Or do I need any further failover tests? Please tell me which other failover tests I should run to make sure the cluster is working fine, so that we can use it in production.

    thanks

  • There are many more tests you can do; at the end of it you have to be confident that the cluster will stay up. If you are happy with the tests you have done, that is enough. In addition, you might want to consider the following tests:

    1) Pull-cable test on the active node, from the public network.

    - Should initiate a full failover of all resources from active to passive.

    2) Pull-cable test on the active node, from the private network.

    - There should not be any failover.

    3) HBA pull-cable test; if you have dual HBAs in the server, this is a good way to test HBA failover.

    When running these tests, make sure the maximum number of failovers allowed in the 6-hour period is set high enough; otherwise automatic failover will stop working once that limit is reached.
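
    The reason behind that last point can be sketched as a toy model. This is only an illustration, assuming a group that allows a fixed number of failovers inside a sliding period; the class name `FailoverBudget` and its method are hypothetical, the threshold of 2 is chosen just to keep the example short, and the real cluster service logic may differ in detail.

```python
from datetime import datetime, timedelta


class FailoverBudget:
    """Toy model of a group's failover limit (not the cluster service's real logic)."""

    def __init__(self, threshold=10, period=timedelta(hours=6)):
        self.threshold = threshold  # failovers allowed within one period (assumed value)
        self.period = period        # length of the sliding period
        self.history = []           # times of the failovers that were allowed

    def try_failover(self, now):
        """Return True if an automatic failover would still be allowed at `now`."""
        # Forget failovers that have aged out of the period.
        self.history = [t for t in self.history if now - t < self.period]
        if len(self.history) >= self.threshold:
            return False  # limit reached: the group is not failed over again
        self.history.append(now)
        return True


# Three tests within a few hours against a threshold of 2, then one much later.
budget = FailoverBudget(threshold=2)
start = datetime(2009, 6, 9, 9, 0)
results = [
    budget.try_failover(start),                       # first cable pull
    budget.try_failover(start + timedelta(hours=1)),  # second cable pull
    budget.try_failover(start + timedelta(hours=2)),  # third test, over the limit
    budget.try_failover(start + timedelta(hours=7)),  # earlier failovers have aged out
]
print(results)
```

    In this model, a handful of cable-pull and reboot tests in one afternoon can use up a low limit, after which the group is no longer failed over automatically until the earlier failovers fall outside the period.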

    Thanks.

    Mohit.

    ---

    Mohit K. Gupta, MCITP: Database Administrator (2005), Twitter: @SQLCAN.
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing.

  • from sql-server-performance article:

    Testing the Cluster Service

    * Once the cluster is built, test it extensively before it goes into production. Test for the following:

    o Group moves.

    o Initiate manual failover.

    o Turn each node off.

    o Disconnect network connections from each node.

    o Disconnect shared array (this sometimes can be interesting) connection from each node.

    I can move the groups from node1 to node2 and vice versa.

    I initiated the manual failover as below:

    Cluster Administrator -> Resources -> SQL Server (InstanceName) -> right-click -> Initiate Failure. Here, the SQL Server (InstanceName) resource failed (and so did SQL Server Agent), immediately came back online, and did not move to node2 when it failed.

    I thought that SQL Server (InstanceName) should move to node2. Could you please clarify?

    Turn each node off: what should I do here? I right-clicked the node and I see the options Pause Node and Stop Cluster Service, but I did not see a turn-off option. Does this mean shutting down or rebooting the servers to make sure the groups move?

    Please also clarify my test 2: restarted node1. All the groups moved to node2, and when node1 came back online all groups moved back to node1 except the Cluster group. (Is this normal behaviour, or must the Cluster group move back to node1 as well?)

    Please clarify the above points.

  • klnsuddu (6/9/2009)


    from sql-server-performance article:

    Cluster Administrator -> Resources -> SQL Server (InstanceName) -> right-click -> Initiate Failure. Here, the SQL Server (InstanceName) resource failed (and so did SQL Server Agent), immediately came back online, and did not move to node2 when it failed.

    This is a simulation: it fails the resource to see whether it can come back online successfully. To send the resources to the other node, right-click -> Move -> Node 2. This will initiate a failover in the proper order.

    Turn each node off: what should I do here? I right-clicked the node and I see the options Pause Node and Stop Cluster Service, but I did not see a turn-off option. Does this mean shutting down or rebooting the servers to make sure the groups move?

    It is not referring to the cluster service; it is referring to the physical hardware. This tests what happens in case of a power supply failure, so shut down node 1 and see if node 2 takes over responsibility.

    Please also clarify my test 2: restarted node1. All the groups moved to node2, and when node1 came back online all groups moved back to node1 except the Cluster group. (Is this normal behaviour, or must the Cluster group move back to node1 as well?)

    This depends on your company's policies. If you are using an active-passive configuration, leaving automatic failback off is probably the best option: why have extra failovers when they are not needed?

    But now let's consider a 2-node cluster set up in an active-active configuration.

    So we have Instance 1 (active on Node 1, passive on Node 2) and Instance 2 (active on Node 2, passive on Node 1). Then Node 1 fails, so now we have Instance 1 (active on Node 2, passive on Node 1) and Instance 2 (active on Node 2, passive on Node 1), and Node 2 is serving both instances. If Node 2 is powerful enough to handle two instances, no problem, but its CPU might be taxed a bit. In this case you want the Instance 1 services to go back to Node 1 at the first available chance, so the cluster is balanced again with each node serving only one instance.
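
    The active-active scenario above can be walked through with a small sketch. This is a simplified model under stated assumptions, not how the cluster service is implemented: the function `place_instances`, its parameters, and the preferred-owner lists are hypothetical names used only for illustration.

```python
def place_instances(preferred, current, online, failback):
    """Return a new instance -> node mapping.

    preferred: instance -> list of nodes, most preferred first (assumed model)
    current:   instance -> node currently hosting it
    online:    set of nodes that are up
    failback:  if True, an instance returns to its first preferred node
               whenever that node is online
    """
    placement = {}
    for instance, owners in preferred.items():
        node = current[instance]
        if failback and owners[0] in online:
            node = owners[0]  # go home to the preferred node
        elif node not in online:
            # Fail over to the first preferred owner that is still up.
            node = next((n for n in owners if n in online), None)
        placement[instance] = node
    return placement


preferred = {"Instance1": ["Node1", "Node2"], "Instance2": ["Node2", "Node1"]}
balanced = {"Instance1": "Node1", "Instance2": "Node2"}

# Node1 goes down: Node2 ends up serving both instances.
after_failure = place_instances(preferred, balanced, {"Node2"}, failback=False)

# Node1 returns with failback off: nothing moves, Node2 keeps both.
no_failback = place_instances(preferred, after_failure, {"Node1", "Node2"}, failback=False)

# Node1 returns with failback on: Instance1 goes home and the load is balanced again.
with_failback = place_instances(preferred, after_failure, {"Node1", "Node2"}, failback=True)

print(after_failure, no_failback, with_failback, sep="\n")
```

    In the active-passive case the same model shows why failback adds little: there is only one instance, so moving it back just costs another failover.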

    Thanks.

    ---

    Mohit K. Gupta, MCITP: Database Administrator (2005), Twitter: @SQLCAN.
    Microsoft FTE - SQL Server PFE
