Cluster Node Down

  • Hi Experts,

    One of the nodes in our two-node SQL Server cluster is down. We checked that port 3343 is open, and according to the network team there are no network issues.

    Cluster node KPDCSVDB101-UAT could not join the cluster because it failed to communicate over the network with any other node in the cluster. Verify the network connectivity and configuration of any network firewalls.

    Cluster network name resource 'SQL Network Name (BUSINESS100-ENT)' encountered an error enabling the network name on this node. The reason for the failure was: 'Unable to obtain a logon token'.

    cxl::ConnectWorker::operator (): GracefulClose(1226)' because of 'channel to remote endpoint 192.168.1.12:~3343~ is closed'

    [RES] IP Address <SQL IP Address 1 (BUSINESS100-ENT)>: Failed to delete IP interface 3C60000A, status 87.

    Node '%1' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

    The witness disk is online and available. We tried rebooting the failed node, but it didn't help.
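The two cluster-log lines above carry Win32 status codes. A minimal Python sketch for pulling the endpoint and those codes out of such lines follows. It assumes the lines are pasted exactly as they appear in the post, and its code table holds only the two values seen in this thread (1226 is ERROR_GRACEFUL_DISCONNECT and 87 is ERROR_INVALID_PARAMETER, per winerror.h). The function and constant names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Only the two Win32 codes seen in the log lines quoted in this thread
# (names and messages as defined in winerror.h). Anything else is "UNKNOWN".
WIN32_ERRORS = {
    87: ("ERROR_INVALID_PARAMETER", "The parameter is incorrect."),
    1226: ("ERROR_GRACEFUL_DISCONNECT", "The network connection was gracefully closed."),
}

# The cluster log writes endpoints as 192.168.1.12:~3343~ (port wrapped in tildes).
ENDPOINT_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):~(\d+)~")

# Codes appear either as SomeCall(1226) or as "status 87".
CODE_RE = re.compile(r"\w+\((\d+)\)|status (\d+)")


def parse_endpoint(line: str) -> Optional[Tuple[str, int]]:
    """Return (ip, port) for the first IPv4 endpoint in the line, or None."""
    match = ENDPOINT_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def explain_codes(line: str) -> List[Tuple[int, str, str]]:
    """Return (code, symbolic name, message) for every status code in the line."""
    results = []
    for match in CODE_RE.finditer(line):
        code = int(match.group(1) or match.group(2))
        name, message = WIN32_ERRORS.get(code, ("UNKNOWN", "Not in this sketch's table."))
        results.append((code, name, message))
    return results
```

Run against the GracefulClose line, this reports endpoint 192.168.1.12 port 3343 and ERROR_GRACEFUL_DISCONNECT. That code only says the partner closed the channel; it does not say why, so it points back to the network and authentication checks discussed below.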

  • is this a virtual machine?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Open these ports:

    Windows Server Clustering:

    Protocol   Port(s)        Description
    TCP/UDP    53             User & Computer Authentication [DNS]
    TCP/UDP    88             User & Computer Authentication [Kerberos]
    UDP        123            Windows Time [NTP]
    TCP        135            Cluster DCOM Traffic [RPC, EPM]
    UDP        137            User & Computer Authentication [NetLogon, NetBIOS]
    UDP        138            DFS, Group Policy [DFSN, NetLogon, NetBIOS Datagram Service]
    TCP        139            DFS, Group Policy [DFSN, NetLogon, NetBIOS Session Service]
    UDP        161            SNMP
    TCP/UDP    162            SNMP Traps
    TCP/UDP    389            User & Computer Authentication [LDAP]
    TCP/UDP    445            User & Computer Authentication [SMB, SMB2, CIFS]
    TCP/UDP    464            User & Computer Authentication [Kerberos Change/Set Password]
    TCP        636            User & Computer Authentication [LDAP SSL]
    TCP        3268           Microsoft Global Catalog
    TCP        3269           Microsoft Global Catalog [SSL]
    TCP/UDP    3343           Cluster Network Communication
    TCP        5985           WinRM 2.0 [Remote PowerShell]
    TCP        5986           WinRM 2.0 HTTPS [Remote PowerShell, secure]
    TCP/UDP    49152-65535    Dynamic TCP/UDP [range defined by company policy; can be changed]
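If you want to verify the TCP rows of this list yourself rather than rely on the network team, a minimal Python sketch follows. It assumes Python is available on the node you test from, and the host name in the example comment is a placeholder. It skips UDP-only rows (a plain TCP connect cannot confirm a UDP port) and the dynamic 49152-65535 range. A False result only means the connection was refused, filtered, or timed out. Ports such as 53, 88, 389 and 3268 are typically served by domain controllers or DNS servers, so point the probe at those hosts for them, and at the partner node for 135, 445 and 3343.

```python
import socket
from typing import Dict, Iterable, List, Tuple

# TCP-capable rows from the port list above. UDP-only rows are left out because
# a TCP connect() says nothing about them; the 49152-65535 dynamic RPC range is
# left out because it is too large to probe port by port.
CLUSTER_TCP_PORTS: List[Tuple[int, str]] = [
    (53, "DNS"),
    (88, "Kerberos"),
    (135, "RPC endpoint mapper"),
    (139, "NetBIOS Session Service"),
    (162, "SNMP traps"),
    (389, "LDAP"),
    (445, "SMB"),
    (464, "Kerberos change/set password"),
    (636, "LDAP SSL"),
    (3268, "Global Catalog"),
    (3269, "Global Catalog SSL"),
    (3343, "Cluster network communication"),
    (5985, "WinRM HTTP"),
    (5986, "WinRM HTTPS"),
]


def probe_tcp_ports(host: str, ports: Iterable[Tuple[int, str]],
                    timeout: float = 2.0) -> Dict[int, bool]:
    """Map each port to True if a TCP connection to host:port succeeded."""
    results: Dict[int, bool] = {}
    for port, _description in ports:
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable or name not resolved
            results[port] = False
    return results


# Example (hypothetical host name; run from the failed node):
# for port, ok in probe_tcp_ports("PARTNER-NODE", CLUSTER_TCP_PORTS).items():
#     print(port, "open" if ok else "closed/filtered")
```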

  • Perry Whittle (11/4/2016)


    is this a virtual machine?

    Yes

  • goher2000 (11/4/2016)


    open these ports


    According to the network team, there are no restrictions within the subnet.

  • Are you a local admin on both nodes?

  • Check your DNS server/network.

  • goher2000 (11/6/2016)


    Are you a local admin on both nodes?

    Yes

  • goher2000 (11/6/2016)


    Check your DNS server/network.

    Check for what? Whether the name is registered? We have checked that, and it's all there.

  • Check whether you can reach your LOGONSERVER from the nodes. Type set at the command line, look for LOGONSERVER, and try to ping it by server name and by FQDN.
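The name-resolution half of that check can be scripted. A minimal Python sketch follows, assuming it runs on one of the nodes. It reads the LOGONSERVER environment variable (Windows sets it to a value like \\DC01), strips the backslashes, and resolves the short name and, if you supply your AD DNS suffix, the FQDN. Ping itself uses ICMP and is not reproduced here; the function names and the suffix argument are illustrative, not part of any tool.

```python
import os
import socket
from typing import Dict, Mapping, Optional


def logon_server_name(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return LOGONSERVER without its leading backslashes, or None if unset."""
    env = os.environ if env is None else env
    value = env.get("LOGONSERVER")
    if not value:
        return None
    return value.lstrip("\\")


def resolve_names(short_name: str,
                  dns_suffix: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Resolve the short name (and FQDN if a suffix is given) to IPv4 addresses.

    A value of None means that name did not resolve from this machine.
    """
    names = [short_name]
    if dns_suffix:
        names.append(f"{short_name}.{dns_suffix}")
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


# Example on a node (the suffix is a placeholder for your AD domain):
# dc = logon_server_name()
# if dc:
#     print(resolve_names(dc, "corp.example.com"))
```

If the short name resolves but the FQDN does not (or the reverse), that points at DNS suffix or search-list configuration on the node rather than at the network path.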

  • goher2000 (11/7/2016)


    Check whether you can reach your LOGONSERVER from the nodes. Type set at the command line, look for LOGONSERVER, and try to ping it by server name and by FQDN.

    Ping is working fine. We checked it from both nodes and also over the heartbeat IP.
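A successful ping over the heartbeat network shows that ICMP gets through, but the cluster traffic in the errors above targets port 3343. A minimal Python sketch for trying a TCP connection to the partner's port 3343 from a chosen local address (for example the failed node's heartbeat IP) follows. The addresses in the example comment are placeholders. Binding sets the source IP, but the routing table still picks the outgoing interface, and a TCP test cannot confirm that UDP 3343 heartbeats get through.

```python
import socket
from typing import Optional


def tcp_reachable_from(remote_ip: str, port: int = 3343,
                       local_ip: Optional[str] = None,
                       timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to remote_ip:port succeeds.

    If local_ip is given, the socket is bound to that source address first,
    so the attempt carries the heartbeat NIC's IP instead of whatever address
    the OS would pick. The routing table still chooses the outgoing interface.
    """
    source = (local_ip, 0) if local_ip else None  # port 0 = any free local port
    try:
        with socket.create_connection((remote_ip, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:  # refused, timed out, or local_ip not assigned to this host
        return False


# Example with placeholder addresses (failed node's heartbeat IP -> partner's):
# print(tcp_reachable_from("10.0.0.2", 3343, local_ip="10.0.0.1"))
```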

  • VastSQL (11/6/2016)


    Perry Whittle (11/4/2016)


    is this a virtual machine?

    Yes

    VMware?

    Hyper-V?

    How many virtual NICs are exposed to the cluster nodes?

    Have you run a cluster validation?


  • Check the error log for any other errors.

  • The most common reason I've seen for a node dropping out of the cluster is co-stop on the ESX host, but that normally produces a different message.

    What is the current status of the node? Are you able to restart the cluster service on it?


  • Perry Whittle (11/7/2016)


    VMware?

    Hyper-V?

    How many virtual NICs are exposed to the cluster nodes?

    Have you run a cluster validation?

    Thanks, Perry. It runs on VMware, with two NICs: one for the network and one for the heartbeat.

    Yes, we ran cluster validation without the disks, but it didn't report any errors or give us any leads.

