September 20, 2011 at 6:53 am
How can I test failover on a production SQL Server cluster?
Thanks
September 20, 2011 at 7:58 am
Stop the SQL Server service on the active node.
Lowell
September 20, 2011 at 8:00 am
ah, where to start... skipping the conversation and questions about "testing" on "production", and assuming you know what you are doing....
one way to test is to pull the network cable from the active node.
September 20, 2011 at 8:02 am
NJ-DBA (9/20/2011)
ah, where to start... skipping the conversation and questions about "testing" on "production", and assuming you know what you are doing.... one way to test is to pull the network cable from the active node.
Details :-D!
September 20, 2011 at 8:10 am
pull the power plug and see what happens
this is assuming you have tested this setup in a test environment first.
September 20, 2011 at 10:04 am
The easiest way to test, as NJ-DBA has already pointed out, is to simulate a failure of the public NIC on the active node. Either pull the cable or disable the NIC in Windows. Just be sure you let users know it's happening and agree it with management 😉
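To see how long clients are actually cut off during such a test, one option is to poll the clustered instance while the NIC is disabled. The following is a minimal Python sketch, not a vetted tool: `measure_outages` and `tcp_probe` are hypothetical helper names, and the host name and port in the usage comment are placeholders for your instance's virtual network name and listening port.

```python
import socket
import time
from typing import Callable, List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Callable[[], bool]:
    """Return a probe that reports whether a TCP connection can be opened.

    Note: an open port only shows the listener is reachable; it does not
    prove that SQL Server has finished recovering its databases.
    """
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return probe


def measure_outages(
    probe: Callable[[], bool],
    samples: int,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, float]]:
    """Call probe() `samples` times and return (start, end) per outage window."""
    outages: List[Tuple[float, float]] = []
    down_since = None
    for _ in range(samples):
        now = clock()
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # treat any probe error as "unreachable"
        if not ok and down_since is None:
            down_since = now                    # outage begins
        elif ok and down_since is not None:
            outages.append((down_since, now))   # outage ends
            down_since = None
        if interval > 0:
            time.sleep(interval)
    if down_since is not None:
        outages.append((down_since, clock()))   # still down when polling stopped
    return outages


# Hypothetical usage (placeholder name and port, adjust to your cluster):
# windows = measure_outages(tcp_probe("sqlcluster-vnn", 1433), samples=300)
# for start, end in windows:
#     print(f"unreachable for {end - start:.1f}s")
```

Start the polling before you disable the NIC and keep it running until the instance is back online on the other node; the length of each window is a rough upper bound on what connecting clients would have seen.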
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
September 20, 2011 at 11:02 pm
Thanks
Can we stop the sql service or cluster service also?
Thanks
September 20, 2011 at 11:04 pm
If you stop the cluster service, it won't fail over!
September 20, 2011 at 11:17 pm
But does stopping the SQL service count as a failover?
1. Why does stopping the cluster service not count as a failover test?
2. May I know all the possible ways to test a failover?
Thanks
September 20, 2011 at 11:22 pm
The cluster service is the service that is in charge of the failover. It will perform a failover if it detects an issue, so in order to test this you need to simulate an issue: stop the SQL Agent or SQL Server service, somehow disable the network connection to the active node, stop the active server, etc.
Stop the cluster service and you no longer have a cluster - and failover doesn't make any sense.
September 21, 2011 at 3:35 am
I would get a test plan together using a test cluster (if you have one) and include in it:
Rebooting nodes (single & multiple) / removing the witness
Disabling HBAs
Unpresenting disks
Disabling NICs (public & private)
The plan I put together took about 2 days to work through. Very tedious, but at least I know what's going to happen to the cluster in all eventualities.
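A plan like this can be tracked as a simple checklist so that nothing is skipped over the two days. Below is a minimal Python sketch assuming only the scenarios listed above; the `Scenario` class, the `PLAN` list and the `pending` helper are hypothetical names, and the expected and observed outcomes are deliberately left blank for you to fill in for your own cluster.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    name: str
    expected: str = ""               # what you expect the cluster to do (fill in beforehand)
    observed: Optional[str] = None   # what actually happened (fill in during the test)

    @property
    def done(self) -> bool:
        return self.observed is not None


# One entry per item in the list above; outcomes intentionally left empty.
PLAN: List[Scenario] = [
    Scenario("Reboot a single node"),
    Scenario("Reboot multiple nodes"),
    Scenario("Remove the witness"),
    Scenario("Disable HBAs"),
    Scenario("Unpresent disks"),
    Scenario("Disable the public NIC"),
    Scenario("Disable the private NIC"),
]


def pending(plan: List[Scenario]) -> List[str]:
    """Return the names of scenarios that have not been run yet."""
    return [s.name for s in plan if not s.done]
```

Recording the expected result before each run makes it obvious afterwards which scenarios behaved differently from what you assumed.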
Cheers
Vultar
September 21, 2011 at 3:45 am
forsqlserver (9/20/2011)
But does stopping the SQL service count as a failover?
1. Why does stopping the cluster service not count as a failover test?
2. May I know all the possible ways to test a failover?
At this point we need to know what type of cluster we are dealing with: Windows 2003 or Windows 2008, the number of nodes, and the quorum configuration currently in use.
In my experience, failovers are initiated by the following scenarios:
➡ someone logs onto the active node and instead of logging off hits shutdown
➡ someone updates the server from automatic updates and restarts the server
➡ someone unwittingly unplugs the public NIC's network cable from the switch
➡ someone changes the VLAN assignments on the public NIC's switch port
➡ The server blue screens
➡ The C drive fails
By disabling the public NIC or shutting the server down you are replicating most of these.
You could pull the power cord on the active node, but I don't recommend doing that as you could cause damage to the shared disk filesystems. You don't want to be repairing or restoring data to these unnecessarily.