October 11, 2011 at 1:38 pm
I'm new to actually supporting a cluster. What steps do you take to test a cluster before bringing into production? How do you simulate a failover?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
October 11, 2011 at 1:46 pm
Jack,
It's really easy to do a failover. In the Failover Cluster Management app, open your cluster and expand "Services and Applications"; you'll see SQL Server (or whatever you named the resource group) there. Right-click it and look down the menu until you see "Move this service or application to another node". Click that and it should list all of the OTHER nodes. Choose one that SQL has been added to as well and it will fail over to that one.
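The same move can be scripted. Here is a minimal Python sketch, assuming the FailoverClusters PowerShell module and its Move-ClusterGroup cmdlet are available on the node. The group name and node name in the usage note are placeholders, not values from this thread.

```python
import subprocess


def build_move_command(group_name, target_node):
    """Build a PowerShell command line that moves a cluster group to another node."""
    # Single quotes inside a PowerShell single-quoted string are escaped by doubling them.
    group = group_name.replace("'", "''")
    node = target_node.replace("'", "''")
    script = (
        "Import-Module FailoverClusters; "
        f"Move-ClusterGroup -Name '{group}' -Node '{node}'"
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


def move_group(group_name, target_node):
    """Run the move. Only meaningful when executed on a Windows cluster node."""
    return subprocess.run(
        build_move_command(group_name, target_node),
        capture_output=True, text=True, check=True,
    )
```

Calling move_group("SQL Server (MSSQLSERVER)", "NODEB") would request the move; build_move_command alone lets you review the command before running anything.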
CEWII
October 11, 2011 at 1:54 pm
As far as tests go:
I fail it over a bunch of times and to every possible node.
I make sure I don't have any connection problems when it is running on any particular node.
I make sure that the nodes are EXACTLY the same. If I do something on one I immediately go do it on the other(s).
Make sure you have your DTC configured the same on all nodes.
Make sure that the SQL Login has the same rights on both nodes.
Make sure that if lock pages in memory is set on one node it is set on both.
The most common thing I see happen is that after a failover to the other node things don't work right, and most often that is tied to something done on the primary node but forgotten on the other node(s). Things done IN SQL will carry from node to node, but things done in the OS will NOT.
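One way to catch OS-level drift is to record the same settings for every node and diff them. The sketch below is only an illustration: the setting names and the inventory values are hypothetical, and how you collect them (by hand or by script) is left open.

```python
def find_node_differences(settings_by_node):
    """Return {setting: {node: value}} for every setting that differs between nodes.

    Values must be hashable (bool, int, str). A setting missing on a node is
    recorded as None so that it shows up as a mismatch.
    """
    all_keys = set()
    for settings in settings_by_node.values():
        all_keys.update(settings)

    differences = {}
    for key in sorted(all_keys):
        values = {node: s.get(key) for node, s in settings_by_node.items()}
        if len(set(values.values())) > 1:
            differences[key] = values
    return differences


# Hypothetical inventory for a two-node cluster.
inventory = {
    "NODEA": {"lock_pages_in_memory": True, "dtc_network_access": True, "firewall_tcp_1433": True},
    "NODEB": {"lock_pages_in_memory": True, "dtc_network_access": True, "firewall_tcp_1433": False},
}

for setting, values in find_node_differences(inventory).items():
    print(setting, values)
```

Anything the function reports is something done on one node and forgotten on another.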
CEWII
October 11, 2011 at 5:12 pm
Jack, the easiest way to test a failover of the cluster is to kill the public network connection on the active node. This should cause failover to a partner node.
You can also kill the sqlservr.exe process in Task Manager on the active node to simulate a failover.
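To put a number on how long clients are cut off during a test, you can probe the virtual name at a fixed interval and compute the longest gap afterwards. A minimal Python sketch follows. It assumes you already have timestamped probe results; the probe itself (a TCP connect or a trivial query) is not shown, and the sample data is made up.

```python
def longest_outage(samples):
    """Return the longest outage in seconds.

    samples: iterable of (timestamp_seconds, ok) tuples in time order.
    An outage runs from the first failed probe to the next successful one.
    An outage still open at the end of the samples is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp          # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None               # outage ends
    return longest


# Made-up probe log: connection lost at t=1 and restored at t=40.
probe_log = [(0, True), (1, False), (2, False), (40, True), (41, True)]
print(longest_outage(probe_log))
```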
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
October 11, 2011 at 5:17 pm
Thanks guys. Unfortunately this is for a client, and they put the cluster into production before doing any failover testing and before letting me know, so the SQL Server side of things is not up to snuff. I hesitate to say "not right" because it is serving the web site.
Jack Corbett
Consultant - Straight Path Solutions
October 11, 2011 at 8:42 pm
Thanks guys. Here's what happened. I did the move to the other node and everything came up okay in cluster manager, but I couldn't connect to SQL from outside. I was remoted in to Node A and failed over to Node B. From Node A I could connect to SQL using the virtual cluster name, but from home over the VPN I could not connect using either the virtual cluster name or the IP. I could ping both the virtual name and the IP, so it was talking over the network. It looks like Windows Firewall is not set up the same on both nodes, so I think that might be the issue. The network/sysadmin is going to look at that tomorrow. I failed back over to Node A and everything works fine. Failovers took less than a minute, and the move back to Node A was faster than the move to Node B.
I also got to do some tempdb maintenance and fix a VLF issue on one of the main DBs, so it wasn't a lost night.
If you have any suggestions for why I couldn't connect to SQL when on Node B I'm all ears.
Thanks again.
Jack Corbett
Consultant - Straight Path Solutions
October 11, 2011 at 11:28 pm
The firewall is where I would start as well.
Given that the port and IP will be the same no matter what node it is on, I doubt it would be a name resolution problem.
If you are using named instances the browser port might be blocked.
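A quick way to confirm a port block is to test the SQL port itself rather than ping, since ping uses ICMP and says nothing about whether a TCP port is open. Here is a minimal Python sketch, assuming a default instance on the default TCP port 1433; the thread does not say which port is actually in use, and the host name in the usage note is a placeholder.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run tcp_port_open("sqlcluster01", 1433) from the machine that cannot connect, once with SQL on each node. If it succeeds with SQL on one node and fails on the other, the firewall on the failing node is the likely culprit. Note that SQL Server Browser listens on UDP 1434, which a TCP connect like this cannot verify.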
CEWII
October 12, 2011 at 12:43 am
Jack
I always disable Windows Firewall on corporate domain computers. If you need to ringfence a set of servers, use a dedicated firewall product. If you want a good freebie, Smoothwall is excellent.
How many cluster networks are set up on the nodes (i.e. how many active NICs)?
What network type have you set for each cluster network?
Public should allow all cluster communications.
Heartbeat should allow internal communications only.
iSCSI, if used, should be disabled for cluster use.
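Those three settings correspond to the Role value that Get-ClusterNetwork reports for each cluster network: 3 for cluster and client traffic, 1 for internal cluster traffic only, and 0 for no cluster use. A small Python sketch of checking a recorded configuration against that expectation follows; the network names and the purpose labels are hypothetical.

```python
# Role values as reported by the Role property of Get-ClusterNetwork.
ROLE_NONE = 0                # not used by the cluster (iSCSI)
ROLE_CLUSTER_ONLY = 1        # internal cluster communication only (heartbeat)
ROLE_CLUSTER_AND_CLIENT = 3  # cluster communication plus client access (public)

# Purpose labels are made up for this sketch.
EXPECTED_ROLE = {
    "public": ROLE_CLUSTER_AND_CLIENT,
    "heartbeat": ROLE_CLUSTER_ONLY,
    "iscsi": ROLE_NONE,
}


def check_network_roles(networks):
    """networks: {name: (purpose, actual_role)}. Return a list of mismatch messages."""
    problems = []
    for name, (purpose, actual) in sorted(networks.items()):
        expected = EXPECTED_ROLE[purpose]
        if actual != expected:
            problems.append(
                f"{name}: purpose {purpose} expects role {expected}, found {actual}"
            )
    return problems
```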
October 12, 2011 at 1:35 am
I always disable Windows Firewall on corporate domain computers
From a security standpoint, leaving the firewall on and opening just a few ports (adding exceptions) would be a better idea.
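If you go the exceptions route, the rules have to be identical on every node. A hedged Python sketch that generates the netsh commands for review is below. It assumes a default instance on TCP 1433 plus SQL Server Browser on UDP 1434; the rule names are made up, and named instances use dynamic ports unless you fix them.

```python
def firewall_rule_command(rule_name, protocol, port):
    """Build a netsh command line that allows inbound traffic on one port."""
    if protocol not in ("TCP", "UDP"):
        raise ValueError("protocol must be TCP or UDP")
    return (
        f'netsh advfirewall firewall add rule name="{rule_name}" '
        f"dir=in action=allow protocol={protocol} localport={port}"
    )


# Assumed ports; adjust to what the instance actually listens on.
RULES = [
    ("SQL Server TCP 1433", "TCP", 1433),
    ("SQL Browser UDP 1434", "UDP", 1434),
]

for name, protocol, port in RULES:
    print(firewall_rule_command(name, protocol, port))
```

Printing the commands rather than running them keeps the script harmless; run the same output on each node from an elevated prompt.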
October 12, 2011 at 1:38 am
this is for a client, and they put the cluster into production before doing any testing of the fail over
Didn't you try to explain to them the consequences of this BAD practice?
October 12, 2011 at 2:05 am
Dev @ +91 973 913 6683 (10/12/2011)
I always disable Windows Firewall on corporate domain computers
Considering the security, enabling few ports (adding exceptions) would be a good idea.
No, as I mentioned above, if you want to ring-fence a group of servers, use a dedicated product. Otherwise you need to manage any number of separate firewall rule sets instead of just one. In any case, why do you want a firewall (and not a very good one at that) on a server in your corporate network that's already protected by a corporate firewall?
October 12, 2011 at 6:33 am
Perry Whittle (10/12/2011)
Jack
I always disable Windows Firewall on corporate domain computers. If you need to ringfence a set of servers, use a dedicated firewall product. If you want a good freebie, Smoothwall is excellent.
How many cluster networks are set up on the nodes (i.e. how many active NICs)?
What network type have you set for each cluster network?
Public should allow all cluster communications.
Heartbeat should allow internal communications only.
iSCSI, if used, should be disabled for cluster use.
Since I'm now at my real office with no connection, I can't accurately answer all of the questions, but I'll do what I can.
There are:
1 private NIC for heartbeat which I believe is configured to allow only internal communications
1 NIC for iSCSI which does not allow any other use, just I/O
I could be wrong on some of it, especially what communications are allowed on each NIC. Networking is probably my weakest area, as in every other position I've had a good network admin who set things up properly.
Jack Corbett
Consultant - Straight Path Solutions
October 12, 2011 at 6:35 am
Dev @ +91 973 913 6683 (10/12/2011)
this is for a client, and they put the cluster into production before doing any testing of the fail over
Didn't you try to explain to them the consequences of this BAD practice?
Of course I did, but as a part-time remote consulting DBA I can't stop them from doing bad things. If they had had me on board when they started, this new cluster wouldn't even be needed. This one is a transition from a completely screwed-up cluster that they were running on.
Jack Corbett
Consultant - Straight Path Solutions
October 12, 2011 at 7:52 am
if you want to ring fence a group of servers use a dedicated product
I wasn't assuming a group of servers. I was assuming just one server.