January 25, 2016 at 8:23 am
Hi,
We are relatively new to clustering. We are experiencing a lot of failovers, and I was wondering if someone could help me figure out where to start looking for the cause. We see regular errors in the cluster manager. The remote desktop experience is also poor: it is very laggy, taking 20-30 seconds to drag windows around the screen. Disk latency is also rather high. For example, the average read latency for each drive is 70 milliseconds, and I read that above 20 milliseconds you may start seeing problems. I'm not sure whether this would cause a failover or whether what I read was accurate. The average read stall time is several hundred milliseconds. The most common wait types are PAGEIOLATCH_SH, LATCH_EX, and SOS_SCHEDULER_YIELD.
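For anyone wanting to reproduce the latency figure: average read latency is normally derived from cumulative counters, such as the `io_stall_read_ms` and `num_of_reads` columns of `sys.dm_io_virtual_file_stats`. The sketch below only shows the arithmetic. The counter values and drive names are made up for illustration, and the 20 ms cutoff is the rule of thumb quoted above, not a hard limit.

```python
def avg_read_latency_ms(io_stall_read_ms: int, num_of_reads: int) -> float:
    """Average milliseconds per read from cumulative counters (0.0 if no reads)."""
    if num_of_reads == 0:
        return 0.0
    return io_stall_read_ms / num_of_reads


def classify(latency_ms: float, threshold_ms: float = 20.0) -> str:
    """Flag a drive whose average read latency exceeds the rule-of-thumb threshold."""
    return "investigate" if latency_ms > threshold_ms else "ok"


# Hypothetical (io_stall_read_ms, num_of_reads) pairs, not real measurements.
samples = {
    "data_drive": (7_000_000, 100_000),
    "other_drive": (500_000, 100_000),
}

for name, (stall_ms, reads) in samples.items():
    latency = avg_read_latency_ms(stall_ms, reads)
    print(f"{name}: {latency:.1f} ms per read -> {classify(latency)}")
```

Because the counters are cumulative since the instance started, comparing two snapshots taken some minutes apart gives a better picture of current latency than a single reading.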
I also ran the network cluster validation test and did get some warnings. They are below.
-Adapters iscsi_vlan192_slot6_top and iscsi_vlan192_slot7_top on node serverfqn have IP addresses on the same subnet.
-Adapters iscsi_vlan197_slot6_bottom and iscsi_vlan197_slot7_bottom on node serverfqn have IP addresses on the same subnet.
-The RegisterAllProvidersIP property for network name 'Name: serverfqn' is set to 1. For the current cluster configuration this value should be set to 0.
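The two "same subnet" warnings can be checked independently of the validation wizard once the adapter addresses are known. This is a minimal sketch using Python's standard `ipaddress` module. The adapter names come from the warnings above, but the IP addresses and /24 masks are placeholders, since the real ones were not posted.

```python
import ipaddress
from itertools import combinations

# Placeholder addresses (assumed for illustration; the real ones are unknown).
adapters = {
    "iscsi_vlan192_slot6_top": "192.168.192.11/24",
    "iscsi_vlan192_slot7_top": "192.168.192.12/24",
    "iscsi_vlan197_slot6_bottom": "192.168.197.11/24",
    "iscsi_vlan197_slot7_bottom": "192.168.197.12/24",
}


def same_subnet_pairs(adapters: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of adapter names whose addresses fall in the same network."""
    networks = {
        name: ipaddress.ip_interface(addr).network
        for name, addr in adapters.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a] == networks[b]
    ]


for first, second in same_subnet_pairs(adapters):
    print(f"{first} and {second} share a subnet")
```

With the placeholder values, the script reports the same two adapter pairs that the validation report lists.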
I sent this info to the systems team who built the cluster but did not get a response. Below are some more errors, this time from the cluster manager. As a DBA who doesn't have much control over the Windows cluster, I am wondering whether there are other places I can look to help narrow down the bottleneck. Thanks for any help.
System
Event ID: 1129
Level: Error
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network 'Cluster' is partitioned. Some attached failover cluster nodes cannot communicate with each other over the network. The failover cluster was not able to determine the location of the failure. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
----------------------------------------------------------
System
Event ID: 1126
Level: Warning
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network interface 'bnhbiscl05-01 - Cluster' for cluster node 'bnhbiscl05-01' on network 'Cluster' is unreachable by at least one other cluster node attached to the network. The failover cluster was not able to determine the location of the failure. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
---------------------------------------------------------
System
Event ID: 1130
Level: Error
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network 'Cluster' is down. None of the available nodes can communicate using this network. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
----------------------------------------------------
System
Event ID: 1127
Level: Error
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network interface 'bnhbiscl05-01 - Cluster' for cluster node 'bnhbiscl05-01' on network 'Cluster' failed. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
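When many of these entries pile up, a quick tally by event ID and level helps show which condition dominates. The sketch below assumes the events have been copied out as plain text in the same "Event ID:" / "Level:" layout used above, with every entry carrying both fields in that order. The `summarize` helper is hypothetical and not part of any Windows tooling.

```python
import re
from collections import Counter

# Sample text in the same layout as the entries above (IDs and levels only).
log_text = """\
Event ID: 1129
Level: Error
----------
Event ID: 1126
Level: Warning
----------
Event ID: 1130
Level: Error
----------
Event ID: 1127
Level: Error
"""


def summarize(text: str) -> Counter:
    """Count (event_id, level) pairs; assumes each entry has both fields, in order."""
    ids = re.findall(r"^Event ID:\s*(\d+)\s*$", text, flags=re.MULTILINE)
    levels = re.findall(r"^Level:\s*(\w+)\s*$", text, flags=re.MULTILINE)
    return Counter(zip(ids, levels))


for (event_id, level), count in sorted(summarize(log_text).items()):
    print(f"Event {event_id} ({level}): {count}")
```

Comparing the timestamps of these network events with the failover times would be the natural next step, but timestamps were not included in the excerpts, so the sketch does not attempt it.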
January 25, 2016 at 8:26 am
Let me add that the MDF and LDF files are on the same drive. I told the people who created the database not to do this, since it's bad practice and the two file types have different I/O access patterns, but they are convinced the drive is fast enough that it shouldn't matter.
January 25, 2016 at 9:58 am
lmacdonald (1/25/2016)
Firstly, are these virtual machines?
Secondly, how many network cards does each node have and how are they configured?
How many nodes are in the WSFC?
Are the nodes on separate geographical sites?
What quorum configuration are you using?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 25, 2016 at 12:18 pm
I'll answer the questions I know. I'll have to ask systems for the other information.
There are only two nodes in the cluster. I believe the quorum is a disk share.
They are not in separate geographical locations.
I believe they are virtualized, but I have sent this question and the networking questions to someone else.
January 25, 2016 at 12:38 pm
Here is more information I was able to get:
The two nodes are large physical servers.
Currently four 10 Gb ports are dedicated to iSCSI, two on each iSCSI network.
Two 1 Gb ports are bound to a team (resource network).
One 10 Gb crossover is used for cluster communication.
June 23, 2020 at 12:18 pm
I have the same problem with my cluster. Please help me fix it.
Errors:
Event ID: 1129
Level: Error
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network 'Cluster' is partitioned. Some attached failover cluster nodes cannot communicate with each other over the network. The failover cluster was not able to determine the location of the failure. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
----------------------------------------------------------
System
Event ID: 1126
Level: Warning
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network interface 'bnhbiscl05-01 - Cluster' for cluster node 'bnhbiscl05-01' on network 'Cluster' is unreachable by at least one other cluster node attached to the network. The failover cluster was not able to determine the location of the failure. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
---------------------------------------------------------
System
Event ID: 1130
Level: Error
Task: Network Manager
Source: Microsoft_windows-FailoverCluster
Cluster network 'Cluster' is down. None of the available nodes can communicate using this network. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.