How to test failover in a production SQL Server cluster?

  • How to test failover in a production SQL Server cluster?

    Thanks

  • Stop the SQL Server service on the active node.

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Ah, where to start... Skipping the conversation and questions about "testing" on "production", and assuming you know what you are doing:

    One way to test is to pull the network cable from the active node.

  • NJ-DBA (9/20/2011)


    Ah, where to start... Skipping the conversation and questions about "testing" on "production", and assuming you know what you are doing:

    One way to test is to pull the network cable from the active node.

    Details :-D!

  • Pull the power plug and see what happens.

    This assumes you have already tested this setup in a test environment.

  • The easiest way to test, as NJ-DBA has already pointed out, is to simulate a failure of the public NIC on the active node. Either pull the cable or disable the NIC in Windows; just be sure you let users know it's happening and agree it with management 😉
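    While the public NIC is down it helps to measure, from a client machine, how long the clustered SQL Server name was actually unreachable. Below is a minimal Python sketch under stated assumptions: the network name (hypothetically `SQLVIRTUAL`) and the default TCP port 1433 are placeholders, and the probe only checks that the port accepts a TCP connection, not that SQL Server is answering queries.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def tcp_probe(host: str, port: int = 1433, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 1433 assumes a default instance; named instances may use another port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(probe: Callable[[], bool], duration: float, interval: float = 1.0,
            clock: Callable[[], float] = time.monotonic,
            sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, bool]]:
    """Call probe() every `interval` seconds for `duration` seconds.

    Returns (timestamp, reachable) samples; clock/sleep are injectable for testing.
    """
    samples: List[Tuple[float, bool]] = []
    end = clock() + duration
    while clock() < end:
        samples.append((clock(), probe()))
        sleep(interval)
    return samples


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Turn (timestamp, reachable) samples into (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one; an outage still open at the last sample ends there.
    """
    outages: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
    if start is not None:
        outages.append((start, samples[-1][0]))
    return outages
```

    Usage from a client while the NIC is disabled might look like `find_outages(monitor(lambda: tcp_probe("SQLVIRTUAL"), duration=300))`, where `SQLVIRTUAL` is a hypothetical cluster network name. The resolution of the result is only as fine as the probe interval plus the connect timeout.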

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks

    Can we also stop the SQL Server service or the Cluster service?

    Thanks

  • If you stop the Cluster service, it won't fail over!

  • But stopping the SQL Server service does count as a failover?

    1. Why doesn't stopping the Cluster service count as a failover test?

    2. What are all the possible ways to test a failover?

    Thanks

  • The Cluster service is the service in charge of failover. It performs a failover when it detects an issue, so to test this you need to simulate an issue: stop the SQL Server Agent or SQL Server service, disable the network connection to the active node, shut down the active server, etc.

    Stop the Cluster service and you no longer have a cluster, so a failover doesn't make any sense.

  • I would put a test plan together using a test cluster (if you have one) and include the following in it:

    Rebooting nodes (single & multiple) / removing the witness

    Disabling HBAs

    Unpresenting disks

    Disabling NICs (public & private)

    The plan I put together took about 2 days to work through. It was very tedious, but at least I know what's going to happen to the cluster in all eventualities.
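    A plan like this is easier to work through, and to repeat after changes, if each scenario is recorded with its expected and observed outcome. The following is a minimal Python sketch of such a checklist: the scenario names come from the list above, while the `Scenario` class, its fields, the "TBD" placeholder and the pass rule (observed equals expected) are hypothetical illustrations, not part of any cluster tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    name: str
    expected: str                   # outcome agreed before testing (hypothetical field)
    observed: Optional[str] = None  # filled in while running the test

    @property
    def passed(self) -> Optional[bool]:
        """None until the scenario has been run, then whether observed == expected."""
        if self.observed is None:
            return None
        return self.observed == self.expected


def build_plan() -> List[Scenario]:
    # Scenario names taken from the test plan above; "TBD" is a placeholder
    # for the expected outcome you agree on before testing.
    names = [
        "Reboot single node",
        "Reboot multiple nodes",
        "Remove witness",
        "Disable HBAs",
        "Unpresent disks",
        "Disable public NIC",
        "Disable private NIC",
    ]
    return [Scenario(name, expected="TBD") for name in names]


def summary(plan: List[Scenario]) -> str:
    """One-line progress report: how many scenarios ran and which failed."""
    ran = [s for s in plan if s.passed is not None]
    failed = [s.name for s in ran if not s.passed]
    return f"{len(ran)}/{len(plan)} run, {len(failed)} failed: {failed}"
```

    Keeping expected outcomes explicit forces the "what should happen" discussion to take place before anything is unplugged, which is most of the value of a written plan.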

    Cheers

    Vultar

  • forsqlserver (9/20/2011)


    But stopping the SQL Server service does count as a failover?

    1. Why doesn't stopping the Cluster service count as a failover test?

    2. What are all the possible ways to test a failover?

    At this point we need to know what type of cluster we are dealing with here: Windows 2003 or Windows 2008, the number of nodes, and the quorum configuration currently in use.

    In my experience, failovers are initiated by the following scenarios:

    ➡ someone logs onto the active node and instead of logging off hits shutdown

    ➡ someone installs updates via Automatic Updates and restarts the server

    ➡ someone unwittingly unplugs the public NIC's network cable from the switch

    ➡ someone changes the VLAN assignment on the public NIC's switch port

    ➡ The server blue screens

    ➡ The C drive fails

    By disabling the public NIC or shutting the server down you are replicating most of these.

    You could pull the power cord on the active node, but I don't recommend doing that, as you could damage the shared disk file systems. You don't want to be repairing or restoring data on those unnecessarily.

