November 7, 2012 at 10:56 pm
Hi All,
I faced an issue yesterday on our SQL Server 2008 SP1 cluster: it shut down because of a network issue. In a high-availability setup, the cluster should fail over to another node if there is a problem. Since the private network was not down, why did the cluster service go down? The errors reported in Event Viewer and in the cluster events are 1205, 1069, 1077, 1126, 1127, 1129 and 1130.
Please help me to understand.
Thanks in advance.
Regards
S Govindarajan
November 8, 2012 at 4:59 am
Check your Windows Application and System logs and collate this information with the cluster events log. Can you post details of any errors reported? Failure of the public network will initiate a failover.
The public network usually carries both cluster and user traffic.
The private network usually carries only cluster communications.
If the private network goes down, the nodes can still communicate over the public network.
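To make the distinction above concrete, here is a minimal Python sketch. It is a toy model, not the Windows Failover Clustering API; the `Role`, `Network`, `heartbeat_paths` and `client_paths` names are hypothetical. It shows that losing the private network still leaves a cluster communication path over the public network, whereas losing the public network leaves cluster communication intact but removes every client path, which is the situation in which the client-facing IP address resource fails and a failover is initiated.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Hypothetical labels for the two network roles described above.
    CLUSTER_ONLY = "cluster only"              # typical private network
    CLUSTER_AND_CLIENT = "cluster and client"  # typical public network


@dataclass
class Network:
    name: str
    role: Role
    up: bool


def heartbeat_paths(networks):
    """Names of networks that can still carry node-to-node cluster traffic."""
    # Both roles allow cluster communication, so any network that is up counts.
    return [n.name for n in networks if n.up]


def client_paths(networks):
    """Names of networks that can still carry client (e.g. SQL) traffic."""
    return [n.name for n in networks if n.up and n.role is Role.CLUSTER_AND_CLIENT]


if __name__ == "__main__":
    # Scenario 1: private network down, public network up.
    nets = [Network("Private", Role.CLUSTER_ONLY, up=False),
            Network("Public", Role.CLUSTER_AND_CLIENT, up=True)]
    print(heartbeat_paths(nets), client_paths(nets))  # ['Public'] ['Public']

    # Scenario 2: public network down, private network up.
    nets = [Network("Private", Role.CLUSTER_ONLY, up=True),
            Network("Public", Role.CLUSTER_AND_CLIENT, up=False)]
    print(heartbeat_paths(nets), client_paths(nets))  # ['Private'] []
```

In the second scenario the nodes can still see each other, so the cluster itself stays up, but clients have no route to the instance.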
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
November 8, 2012 at 9:26 am
Network issues can make clusters inaccessible, prevent failover, or trigger failovers that shouldn't happen, but they shouldn't cause the cluster instance or SQL Server to shut down completely.
November 8, 2012 at 9:21 pm
Hi All,
Thanks for your reply.
Even I have the same understanding that SQL Server should not go down due to network issues, whether private or public. When I was going through the event logs, I found event ID 1077:
- System
  - Provider
    [ Name] Microsoft-Windows-FailoverClustering
    [ Guid] {baf908ea-3421-4ca9-9b84-6689b8c6f85f}
  EventID 1077
  Version 0
  Level 2
  Task 20
  Opcode 0
  Keywords 0x8000000000000000
  - TimeCreated
    [ SystemTime] 2012-11-07T09:25:16.053Z
  EventRecordID 45227
  Correlation
  - Execution
    [ ProcessID] 5232
    [ ThreadID] 28880
  Channel System
  Computer NODE1.COMPANY.COM
  - Security
    [ UserID] S-1-5-18
- EventData
  ResourceName SQL IP Address 1 (SQLCLUSTER01)
  IPAddress 192.168.101.154
  Status 1117
Health check for IP interface 'SQL IP Address 1 (SQLCLUSTER01)' (address '192.168.101.154 ') failed (status is '1117'). Run the Validate a Configuration wizard to ensure that the network adapter is functioning properly.
When I searched on Google, I found that status 1117 corresponds to ERROR_IO_DEVICE. However, I could not find any related entries in the System or Application logs.
Please help me find out how I can determine the root cause of this message.
Thanks in advance.
Regards
S Govindarajan
November 8, 2012 at 11:45 pm
govindarajan69 (11/8/2012)
Even I have the same understanding that SQL Server should not go down due to network issues, whether private or public.
No, stop, this is incorrect.
Kindly re-read my post above: failure of the public network will initiate a failover of the instance.
November 9, 2012 at 3:00 am
Hi Perry,
If, as you say, a public network failure will initiate a failover of the instance, will it shut down the cluster itself, or will it fail over to the available node? When the connections to the SAN storage and quorum disk are intact, why is this happening?
Regards
Govind
November 9, 2012 at 3:54 am
It should try to start on an available node; if there's an issue with the available node too, the group will stay offline after a certain number of retries.
November 9, 2012 at 9:16 am
Thanks, Perry. In fact, that is my understanding after this incident. It might have been costlier, though.
Regards
Govind
November 9, 2012 at 9:34 am
You can change the actions that are taken during a failover, but the default is to try to bring the resource online locally, then fail over and try 3 times, then leave it offline if unsuccessful.
It sounds like you had a public network outage affecting both nodes; make sure they're not plugged into the same switch 😉
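The sequence described above (restart locally, fail over, retry, then give up) can be sketched as a small Python simulation. This is a hypothetical model, not how the cluster service is implemented: `bring_group_online`, `failover_targets`, `local_restarts` and `can_start` are illustrative names, `local_restarts=1` is an assumption, and `max_failovers=3` simply mirrors the "try 3 times" above. The real behaviour is governed by each resource's restart policy and the group's failover threshold and period, which this sketch does not read.

```python
from typing import Callable, List, Optional, Tuple


def failover_targets(nodes: List[str], owner: str, count: int) -> List[str]:
    """Next `count` nodes after the owner, wrapping around (so the owner may be retried)."""
    i = nodes.index(owner)
    return [nodes[(i + k) % len(nodes)] for k in range(1, count + 1)]


def bring_group_online(
    nodes: List[str],
    owner: str,
    can_start: Callable[[str], bool],
    local_restarts: int = 1,  # assumption: one local restart attempt
    max_failovers: int = 3,   # mirrors "try 3 times" from the thread
) -> Tuple[str, Optional[str], List[str]]:
    attempts: List[str] = []
    # Step 1: try to bring the group back online on the current owner.
    for _ in range(local_restarts):
        attempts.append(owner)
        if can_start(owner):
            return "Online", owner, attempts
    # Step 2: fail over, trying other nodes up to max_failovers times.
    for target in failover_targets(nodes, owner, max_failovers):
        attempts.append(target)
        if can_start(target):
            return "Online", target, attempts
    # Step 3: give up and leave the group offline (failed).
    return "Failed", None, attempts


if __name__ == "__main__":
    nodes = ["NODE1", "NODE2"]
    # Public network down on both nodes (e.g. both plugged into a failed switch):
    print(bring_group_online(nodes, "NODE1", lambda n: False))
    # ('Failed', None, ['NODE1', 'NODE2', 'NODE1', 'NODE2'])
    # Public network down only on NODE1:
    print(bring_group_online(nodes, "NODE1", lambda n: n != "NODE1"))
    # ('Online', 'NODE2', ['NODE1', 'NODE2'])
```

The first scenario matches the outcome in this thread: every node fails the same health check, so the group ends up offline rather than running on the surviving node.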