Cluster Failover Testing


TaoCanis

April 25, 2007 at 12:54 pm

I was shown these two articles related to testing cluster failure and failover:

http://technet2.microsoft.com/WindowsServer/en/library/20e7a35f-2477-4f9d-acb9-5146c92152211033.mspx?mfr=true

http://msdn2.microsoft.com/en-us/library/ms189117.aspx

I was just wondering whether anyone has performed any other tests, or would that just be overkill? We're using SQL Server 2000 on Windows Server 2003.


Rudyx - the Doctor

Aside from the testing with Cluster Administrator as outlined in TechNet, we ALWAYS perform the following tests after the cluster and SQL Server are configured and installed:

Both nodes up - On the 'active' node, perform a shutdown (no restart) - things should fail over to the 'passive' node.
Both nodes up - On the 'passive' node, perform a shutdown (no restart) - things should remain 'active' on the active node.
Both nodes up - On the 'active' node, pull the power cord (YUP, that is what happens when a power supply fails) - things should fail over to the 'passive' node.
Both nodes up - On the 'passive' node, pull the power cord (YUP, again) - things should remain 'active' on the active node.
Both nodes up - On the 'active' node, pull the network cable out (YUP, that is what happens when your primary switch fails) - things should fail over to the 'passive' node.
Both nodes up - On the 'passive' node, pull the network cable (YUP, again) - things should remain 'active' on the active node.
Both nodes up - Unplug the crossover cable - both nodes should remain up, but Cluster Administrator will complain.

Now, if your shared storage is on a SAN and you have a 'solid' environment:

Both nodes up - On the 'active' node, pull the fiber cable out (YUP, that is what happens when your primary SAN switch fails) - things should fail over to the 'passive' node.
Both nodes up - On the 'passive' node, pull the fiber cable (YUP, again) - things should remain 'active' on the active node.
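If you want to keep a record of each run, the whole test matrix can be captured as data along with the expected outcome. This is a minimal Python sketch, not part of the procedure above: the `FailoverTest` class, the `plan`/`expected_owner`/`check` helpers and the node names `NODE1`/`NODE2` are hypothetical. It assumes a two-node active/passive cluster where "owner" means the node holding the SQL Server group once things settle (the observed owner would come from Cluster Administrator). It also assumes the crossover test should not move the group, since both nodes are expected to stay up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailoverTest:
    target: str             # node acted on: "active", "passive" or "interconnect"
    action: str             # what is done to it
    san_only: bool = False  # only meaningful when shared storage is on a SAN

# Every test starts with both nodes up.
TESTS = [
    FailoverTest("active", "shutdown (no restart)"),
    FailoverTest("passive", "shutdown (no restart)"),
    FailoverTest("active", "pull the power cord"),
    FailoverTest("passive", "pull the power cord"),
    FailoverTest("active", "pull the network cable"),
    FailoverTest("passive", "pull the network cable"),
    FailoverTest("interconnect", "unplug the crossover cable"),
    FailoverTest("active", "pull the fiber cable", san_only=True),
    FailoverTest("passive", "pull the fiber cable", san_only=True),
]

def plan(has_san: bool) -> list:
    """Return the tests that apply to this environment."""
    return [t for t in TESTS if has_san or not t.san_only]

def expected_owner(test: FailoverTest, active: str, passive: str) -> str:
    """Node that should own the SQL Server group once the test settles."""
    if test.target == "active":
        return passive  # the owner failed, so resources should fail over
    return active       # passive node or crossover failed: nothing should move

def check(test: FailoverTest, active: str, passive: str, observed: str) -> str:
    """Compare the observed owner (hypothetical input) with the expected one."""
    expected = expected_owner(test, active, passive)
    if observed == expected:
        return "PASS"
    return f"FAIL: expected {expected}, observed {observed}"

if __name__ == "__main__":
    # Print a checklist for a SAN-backed cluster; NODE1/NODE2 are placeholders.
    for t in plan(has_san=True):
        print(f"{t.target:>12} | {t.action:<28} | expect owner: "
              f"{expected_owner(t, 'NODE1', 'NODE2')}")
```

Keeping the SAN cases flagged rather than in a separate list means the same checklist works for both shared-storage setups; `plan(has_san=False)` simply drops the fiber-cable tests.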

Some might call it 'overkill'; we prefer to call it 'due diligence'!!!

The 'stuff' above is what I prefer to call 'fun'. But you need to make it educational while you do it. To accomplish this, examine the System, Application and Security event logs on both nodes before and after each test. By doing this you will be one up on diagnosing a real problem when it occurs. Also, do not forget to examine the actual cluster log, located at C:\Windows\Cluster\cluster.log. This file wraps like a transaction log, and its timestamps are in GMT.
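Because cluster.log timestamps are in GMT, it helps to convert them to local time and pull out only the entries around each test. The following is a minimal Python sketch under stated assumptions: it assumes each entry starts with a hex process.thread ID, then `::`, then a `YYYY/MM/DD-HH:MM:SS.mmm` timestamp (e.g. `00000a54.00000a84::2007/04/25-19:54:32.470 INFO ...`), which is how Windows Server 2003 cluster logs are commonly laid out. Check a line of your own log and adjust `LINE_RE` if it differs. The helpers `parse_line` and `entries_in_window` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed entry prefix: "<pid hex>.<tid hex>::YYYY/MM/DD-HH:MM:SS.mmm <message>"
LINE_RE = re.compile(
    r"^[0-9a-fA-F]+\.[0-9a-fA-F]+::"
    r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s?(.*)$"
)

def parse_line(line):
    """Return (UTC datetime, message) or None for lines that do not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y/%m/%d-%H:%M:%S.%f")
    return ts.replace(tzinfo=timezone.utc), m.group(2)

def entries_in_window(lines, start_local, end_local, utc_offset_hours):
    """Yield (local time, message) for entries between two naive local times.

    utc_offset_hours is the local offset from GMT, e.g. -7 for US Pacific
    daylight time; it does not handle DST changes inside the window.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = start_local.replace(tzinfo=tz)
    end = end_local.replace(tzinfo=tz)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # unrecognized or continuation line
        ts, msg = parsed
        if start <= ts <= end:  # aware datetimes compare across time zones
            yield ts.astimezone(tz), msg
```

To use it, open the log (for example `open(r"C:\Windows\Cluster\cluster.log", errors="replace")`) and pass the file object together with the local start and end times you noted for a test. Since the log wraps, copy it off the node soon after each test so the entries you want are not overwritten.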

Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."

TaoCanis

April 27, 2007 at 8:18 am

This is great information! Thanks Rudy!