January 25, 2014 at 8:16 pm
Problem: Cluster not failing over after adding physical disk
Specs:
Server 2003
SQL 2005
Nodes: Active/Active, two node cluster each with an instance installed
Node1 has group A
Node2 has group B
Recent Event: I added a Physical Disk resource to group B on node 2 of the cluster
Story: I recently added a physical disk resource to group B on node 2 of a two-node cluster. While patching the servers, we moved group A from node 1 to node 2, restarted node 1, and moved group A back from node 2 to node 1.
An issue occurred when moving group B from node 2 to node 1. The resources (SQL, IP…) start moving over to node 1, then the disk I just added fails and the whole of group B moves back to node 2 (failback).
I checked all the settings of the disk resource and they match the others exactly.
What I noticed is that in Computer Management > Disk Management on node 1, I do not see the disks for node 2.
More specifically, in Disk Management on node 2 I can see all the local disks and the disks for node 1. The disks for node 1 are marked unreachable and have a red X. On node 1 I cannot see the disks for node 2 in the same way; I only see the local disks on that node.
Another thing I noticed is that Disk Management on both nodes shows a Disk 2.
I rescanned the disks on both servers and still no luck.
Any help would be appreciated.
Jeff
January 26, 2014 at 7:06 pm
More information:
I am on node 2 in Computer Management, this time looking at SAN Disk Manager, and under Disks I do not see the disk that is failing during a Move Group operation.
On node one all the disks are listed, but node two does not have the disk that I just added.
Could this be my trouble?
Any help is appreciated.
Jeff
January 27, 2014 at 6:08 am
Once you add the new disks, you must add them as a dependency of the SQL Server service cluster resource. This will require a restart of the clustered service.
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 27, 2014 at 7:53 am
I did add the resource as a dependency to SQL Server Service, but have not yet restarted the "Cluster Service" on that node.
I will give it a try tonight after hours and let you know what happens.
Your help is appreciated.
Thanks
Jeff
January 27, 2014 at 8:33 am
jayoub (1/27/2014)
I did add the resource as a dependency to SQL Server Service, but have not yet restarted the "Cluster Service" on that node. I will give it a try tonight after hours and let you know what happens.
Your help is appreciated.
Thanks
Please run the following from a command prompt on the server and post the results:
cluster node
cluster res
Take the new disk's resource name from the cluster res output and put it into the following:
cluster res "disk resource name" /listowners
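If you would rather not compare the /listowners result against the node list by eye, here is a minimal Python sketch of that check. The function names are made up for illustration, and the parser assumes the node names come one per line, optionally after a dashed separator line; adjust it if your cluster.exe output looks different.

```python
def parse_possible_owners(output):
    """Return node names from pasted `cluster res "<name>" /listowners` output.

    Assumption: names appear one per line after a dashed separator line.
    If there is no separator, every non-blank line is treated as a name.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if set(line) == {"-"}:
            return lines[i + 1:]
    return lines


def nodes_missing_from_owners(cluster_nodes, owners):
    """Return the cluster nodes that are not listed as possible owners."""
    owner_set = {name.upper() for name in owners}
    return [node for node in cluster_nodes if node.upper() not in owner_set]


# Node names as an example; replace with the names from `cluster node`.
nodes = ["Z010", "Z009"]
owners = parse_possible_owners("Z010\nZ009\n")
print(nodes_missing_from_owners(nodes, owners))
```

An empty list means every node is a possible owner of the disk resource, which would point away from the cluster's possible-owners setting and toward the storage layer.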
January 27, 2014 at 11:39 am
Cluster res
Resource                          Group            Node  Status
Disk L:                           GroupsharePoint  Z010  online
Disk M:                           GroupsharePoint  Z010  online
SQL Network Name (SQLSHAREPOINT)  GroupsharePoint  Z010  online
SQL IP Address 1 (SQLSHAREPONT)   GroupsharePoint  Z010  online
SQL Server (SHAREPOINT)           GroupsharePoint  Z010  online
SQL Server Agent (SHAREPOINT)     GroupsharePoint  Z010  online
SQL Server Fulltext (SHAREPOINT)  GroupsharePoint  Z010  online
Disk J:                           GroupDisco       Z009  online
Disk S:                           GroupDisco       Z009  online
SQL Network Name (SQLVIRTUAL)     GroupDISCOMP     Z009  online
SQL IP Address 1 (SQLVIRTUAL)     GroupDISCOMP     Z009  online
SQL Server (DISCOMP)              GroupDISCOMP     Z009  online
Disk T:                           GroupDISCOMP     Z009  online
NEWDB (this is a disk)            GroupDISCOMP     Z009  online
Disk K:                           GroupDISCOMP     Z009  online
SQL Server Agent (DISCOMP)        GroupDISCOMP     Z009  online
SQL Server Fulltext (DISCOMP)     GroupDISCOMP     Z009  online
Cluster IP Address                Cluster Group    Z009  online
Cluster Name                      Cluster Group    Z009  online
Disk Q:                           Cluster Group    Z009  online
MSDTC                             Cluster Group    Z009  online
Disk O: (problem disk)            GroupsharePoint  Z010  online

Cluster node
Node  Node ID  Status
Z010  2        UP
Z009  1        UP

Cluster res "Disk O:" /listowners
Z010
Z009
Jeff
January 27, 2014 at 11:44 am
I just found out that the SAN admin provisioned the disk to only one node, Z010, and did not include the other node, Z009, in the storage software. The other drives have both hosts listed, so this is probably the cause of the problem.
I will restart the cluster services or even the whole box tonight and check it and let you know.
I provided the information as best I could. I had to retype the whole thing, and the formatting did not come out nicely.
Thank you very much
Jeff
January 27, 2014 at 12:26 pm
jayoub (1/27/2014)
I just found out that the SAN Admin provisioned the disk to only the one node Z010 and did not include the other node Z009 in the storage software.
That's your problem right there.
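For anyone hitting the same thing, the check boils down to this: every shared disk has to be presented to every node in the cluster. A minimal Python sketch, assuming you type in the host list for each disk by hand from whatever your storage software shows. The function name is hypothetical and the sample data is illustrative only, mirroring the situation described above: the new disk presented to Z010 only, the other drives to both hosts.

```python
def hosts_missing_presentation(presented, cluster_nodes):
    """Map each disk to the cluster nodes it is NOT presented to."""
    required = set(cluster_nodes)
    gaps = {}
    for disk, hosts in presented.items():
        missing = required - set(hosts)
        if missing:
            gaps[disk] = sorted(missing)
    return gaps


cluster_nodes = ["Z009", "Z010"]

# Illustrative data, entered by hand from the storage management console.
presented = {
    "Disk L:": {"Z009", "Z010"},
    "Disk M:": {"Z009", "Z010"},
    "Disk O:": {"Z010"},
}

print(hosts_missing_presentation(presented, cluster_nodes))
# prints {'Disk O:': ['Z009']}
```

Any disk that shows up in the result cannot come online on the listed node, so a group containing it will fail back when moved there.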
January 27, 2014 at 4:29 pm
Still no luck. I rebooted the server and tried the move, and got the same result.
I still feel like the SAN admin has not provisioned the drive correctly. I have done this job before with another SAN admin and it went without a hitch.
In Computer Management there is a folder called SAN Disk Manager, and the trouble drive is not listed there. All other drives are listed and working. I have a feeling that there is more to the provisioning process that must be done.
I will keep trying and let you know what happens. I may have to start digging into the SAN myself and see. Sometimes a second set of eyes can spot something.
Thanks
Jeff
January 28, 2014 at 8:02 am
jayoub (1/27/2014)
Still no luck. I rebooted the server and tried the move, and got the same result. I still feel like the SAN admin has not provisioned the drive correctly. I have done this job before with another SAN admin and it went without a hitch.
In Computer Management there is a folder called SAN Disk Manager, and the trouble drive is not listed there. All other drives are listed and working. I have a feeling that there is more to the provisioning process that must be done.
I will keep trying and let you know what happens. I may have to start digging into the SAN myself and see. Sometimes a second set of eyes can spot something.
Thanks
It's easy to get the IDs wrong and leave a device masked; I have experienced this in the past.
January 30, 2014 at 10:44 am
Update to the issue.
The failover is still not working, and here is the problem: we have an HP EVA SAN, and we also have software called FalconStor that is managing the cluster drives.
The current SAN admin needs to get FalconStor out of the equation, so he wants to provision the drives using only the HP EVA software and get that to work. Once he gets this working, I am sure the physical disk will begin failing over correctly.
Again thanks for the help and I will update once it is completely figured out.
Jeff