November 27, 2012 at 8:28 am
I have a 2-node cluster. When I move the SQL instance from Node B to Node A, SQL fails. I haven't been able to determine why, but I suspect the SQL installation on Node A has become corrupt.
To resolve the issue I thought of repairing the node. When I get to the step where I'm to select the instance of SQL, there isn't one to select, but the node is listed. If I click Next, the error states: This node is not in any sql server failover cluster.
So I tried to remove the node, but got the same result.
(Something has gotten really screwed up. This cluster has been working well for 2 years.)
Can I evict the node in failover management, and then rebuild the node from scratch?
This is a prod environment & I need my databases available at all times, so will this affect their availability?
I'm running W2k8 Enterprise & SQL2k8 R2.
Thanks in advance.
BigSam
November 27, 2012 at 8:39 am
What you should have done is take the group offline and move it to Node A, then try to bring the resources online one at a time in this order
If I read it right, you're saying that the uninstall has failed on Node A?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
November 27, 2012 at 11:20 am
How are you trying to remove the node?
November 28, 2012 at 6:38 am
I configured the node to not fail over automatically. Then I moved the instance from Node B to Node A. When SQL was in a failed state on Node A, I tried to bring it online from SQL Configuration Manager, but it still failed to come online. Nothing helpful in the logs. Everything comes online except for SQL & Agent.
I would love to repair or remove the node. Whatever it takes to get it working; repaired or rebuilt.
November 28, 2012 at 6:40 am
In trying to repair or remove the node, I'm using the SQL Setup -> Maintenance options on the CD.
November 28, 2012 at 9:17 am
BigSam (11/28/2012)
A. When SQL was in a failed state on Node A I tried to bring it online from SQL Configuration Manager
Incorrect. Bring the resources online manually from Failover Cluster Manager in the order I specified above. When a resource fails, go check the event logs; you will find information in there.
If you'd like to email me a copy of the event log after the failure I'd be happy to take a look 😉
November 29, 2012 at 1:50 pm
I understand where you're going with these steps. Getting the window to do this on a prod server isn't always easy.
When I failed over from Node B to A, the instance of SQL was unable to automatically move back. When I tried to start SQL on Node A I was able to capture lots of information in the Error Log; unfortunately there wasn't a smoking gun in the log. All of my databases started, etc. & then boom. I've been trying to work with Microsoft on this & they seem confounded, too. Also, they seem more interested in the root cause, which I understand.
However, now my boss is on me to get something done ASAP, which I also understand. That's why I wanted to either repair or remove the instance. Since I cannot do either in the recommended way, I need to know whether there is a workaround that doesn't take my databases offline, such as just evicting the node from within Failover Cluster Manager, then rebuilding the server & node from scratch.