September 13, 2012 at 1:43 pm
I have seen issues like this happen when there are problems at the switch level that cause brief but annoying flapping. In the case I am speaking of, someone had plugged a cable into the wrong switch. Once the cable was unplugged, the problem went away.
September 13, 2012 at 1:57 pm
I seem to recall a systems person at another employer talking about rogue switches/routers causing issues as well. Some individual(s) had brought in a personal router/switch (Linksys, for example) to connect multiple PCs to a single jack in their cube.
September 13, 2012 at 2:50 pm
On the plus side, my Net Admin has gone from "sounds like a SQL Server problem" to "I am installing some throughput and network monitoring software for this" and "yes" to looking at the physical switches.
September 13, 2012 at 3:00 pm
I thought I'd better ask: you do know what DBA stands for, don't you?
September 13, 2012 at 3:23 pm
Dah Blame Area (until proven otherwise)?
I actually got the additional help after reading off the replies from this thread. I still don't have anything "rock solid" (to present, or to convince me) that says it's NOT the SQL box or it IS the switch(es). I still can't blame the hardware at this point, but at least I'm getting a second set of eyes on it.
I also have some of our app end users set up with multiple data source connections for tomorrow morning, to see if it's just the production environment or multiple sources. I also talked with the Net Admin about doing OS patches and reboots during this weekend's maintenance window. I may ask if we can do any firmware updates as well.
September 13, 2012 at 3:36 pm
matt.newman (9/13/2012)
Dah Blame Area (until proven otherwise)?
I actually got the additional help after reading off the replies from this thread. I still don't have anything "rock solid" (to present, or to convince me) that says it's NOT the SQL box or it IS the switch(es). I still can't blame the hardware at this point, but at least I'm getting a second set of eyes on it.
I also have some of our app end users set up with multiple data source connections for tomorrow morning, to see if it's just the production environment or multiple sources. I also talked with the Net Admin about doing OS patches and reboots during this weekend's maintenance window. I may ask if we can do any firmware updates as well.
Close, Default Blame Acceptor.
The fact that you are getting more eyes looking is a good thing.
September 13, 2012 at 5:15 pm
Lynn Pettis (9/13/2012)
matt.newman (9/13/2012)
Dah Blame Area (until proven otherwise)?
I actually got the additional help after reading off the replies from this thread. I still don't have anything "rock solid" (to present, or to convince me) that says it's NOT the SQL box or it IS the switch(es). I still can't blame the hardware at this point, but at least I'm getting a second set of eyes on it.
I also have some of our app end users set up with multiple data source connections for tomorrow morning, to see if it's just the production environment or multiple sources. I also talked with the Net Admin about doing OS patches and reboots during this weekend's maintenance window. I may ask if we can do any firmware updates as well.
Close, Default Blame Acceptor.
The fact that you are getting more eyes looking is a good thing.
I wasn't going to mention this part of the process in my original post, but this is a difficult concept for any networking engineer to swallow, and you are going to get a lot of pushback and "the network is fine" responses. Just be glad that he started looking into it. It took me two days of back and forth with the networking engineer to get the problem resolved. He eventually found a cable in the switch that wasn't supposed to be there. This can be a difficult find for a networking engineer, even when they are looking for it. Stick to your guns and let him dig the problem out.
September 14, 2012 at 4:09 am
I do not have anything constructive to add. I would just like to say I am at the edge of my seat waiting for the solution.
September 14, 2012 at 8:40 am
You may be waiting for a bit.
The Net Admin is out today. I push on him when I feel there is something conclusive, but this time I do not have anything conclusive enough to justify pulling in other people's resources. He is usually quick to respond, but I usually bring him something more than "stuff don't work, check your side". I see an apparent problem on the SQL Server box, whether it is SQL Server itself or not. I didn't mean to play the blame game or throw someone else under the bus; although I can't find the problem, I still feel I should offer some insight before passing the buck and pulling resources away from other things.
I still have two theories I am playing with this morning: Log Shipping causing a problem, and a linked server connection to MySQL causing a hang-up. I turned both off at 8:30 and did not see the freeze at 9:00, when it has consistently shown up... however, it's Friday, so there is less going on and less strain on the network and servers.
I just flipped Log Shipping back on and caught a 30-second freeze on the system.
Our end users who caught the freeze while connected to both production and stage said stage was slow but not locked, while the production app link was locked up.
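Since Log Shipping is now the prime suspect, the Agent job history already has timing data for every run. A minimal sketch for pulling it, assuming the default "LSBackup_<database>" job naming convention (adjust the LIKE pattern to your actual job names):

-- Recent runs of the log shipping backup jobs and how long each took
SELECT j.name AS job_name,
       h.run_date,
       h.run_time,
       h.run_duration        -- integer HHMMSS, e.g. 130 = 1 minute 30 seconds
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
  ON h.job_id = j.job_id
WHERE j.name LIKE N'LSBackup%'
  AND h.step_id = 0          -- step 0 is the job outcome row
ORDER BY h.run_date DESC, h.run_time DESC;

If the durations jump at the same times the app freezes, that is at least a correlation worth showing the Net Admin.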
September 14, 2012 at 8:43 am
One of the things you can do to check whether or not a networking issue is at hand is to actually RDP to the server itself. If you have a flapping problem, which causes lapses of connectivity, then your RDP session should briefly disconnect as well. RDP sessions intermittently disconnecting for a few moments every x minutes is concrete evidence of a networking problem.
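There is also one data point you can gather from the SQL Server side without installing anything (just a suggestion, not something anyone here has run yet): the connectivity ring buffer records connection errors and timeouts with timestamps, which you can line up against the freeze windows.

-- Connection-level errors and timeouts recorded by SQL Server (2008 and later)
SELECT CAST(record AS XML) AS record_xml
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = N'RING_BUFFER_CONNECTIVITY';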
September 14, 2012 at 9:05 am
One of the reasons I could not really say all network connectivity is an issue is that I used VNC and had been watching Task Manager on the machine. I was able to query against the servers while the server was frozen, sometimes. Do you think RDP is going to behave differently, or are you looking for an overall network connectivity outage?
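One thing I may try, since I can get a local session on the box: a heartbeat to separate a server-side stall from a network drop. A rough sketch (the table name and interval are made up); run the loop locally on the server and cancel it when done:

-- Run locally (RDP/VNC session) so the network is out of the picture
CREATE TABLE dbo.Heartbeat (beat_time DATETIME NOT NULL DEFAULT GETDATE());

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Heartbeat DEFAULT VALUES;
    WAITFOR DELAY '00:00:05';   -- one row every 5 seconds
END

-- Afterwards: any gap well over 5 seconds means SQL Server itself stalled,
-- no matter what the network was doing at the time
SELECT h1.beat_time,
       DATEDIFF(SECOND, h1.beat_time,
                (SELECT MIN(h2.beat_time)
                 FROM dbo.Heartbeat AS h2
                 WHERE h2.beat_time > h1.beat_time)) AS gap_seconds
FROM dbo.Heartbeat AS h1
ORDER BY gap_seconds DESC;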
September 14, 2012 at 9:09 am
In cases where I have seen this happen, while on the DB server in question, my RDP session would become unresponsive, and then a box would pop up saying it was trying to reconnect. After x seconds, the RDP session would reconnect and everything would be working perfectly fine. This happened intermittently, and each disconnect would only last between one and five minutes.
September 14, 2012 at 9:25 am
We started getting "freezes" due to disk I/O being frozen by the Avamar backup solution - but weirdly, sometimes at times other than the regularly scheduled backup, when no Avamar process was running.
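For what it's worth, when a VSS-based backup (Avamar included) snapshots a database, SQL Server writes "I/O is frozen on database..." and "I/O was resumed..." messages to the error log, so you can check whether the off-schedule freezes line up with them. A quick sketch using the undocumented but widely used search parameters of xp_readerrorlog:

-- Search the current SQL Server error log (0) of type SQL Server (1) for freeze messages
EXEC master.dbo.xp_readerrorlog 0, 1, N'I/O is frozen';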
September 14, 2012 at 9:28 am
One of the log shipping jobs, for a very small and nearly static database, seems to be at the root of this somehow.
I kicked off log shipping and a lock was seen. Log shipping had not started for all databases when the lock appeared; it was stuck on this very small database (3 MB data file, 6 MB log file). The job step that backs up the database is taking between 1:30 and 5:00 to complete.
It's after 10:00, so it's unlikely I will be able to continue this until next week. I am still not sure this is the root cause, because every time I manually kicked it off to cause a block, there was a very apparent CPU spike that pegged the server. The stage-connected app did not freeze.
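To put hard numbers on that backup step when I pick this up next week, msdb keeps a row for every backup taken; something like this should show the trend (the database name below is a placeholder for the small one):

-- Duration and size of recent log backups for the suspect database
SELECT bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS duration_seconds,
       bs.backup_size / 1024 / 1024 AS size_mb
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'SmallStaticDb'   -- placeholder name
  AND bs.type = 'L'                         -- 'L' = transaction log backup
ORDER BY bs.backup_start_date DESC;

A log backup of a 3 MB database taking minutes would suggest the bottleneck is the backup destination or the I/O path rather than the data itself.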
September 14, 2012 at 9:29 am
The log shipping is not likely to be causing your SQL Server problem, as it will use a native driver; however, it is possible that the linked server to MySQL (probably ODBC) could cause SQL Server to hang... What provider are you using?
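You can read that straight out of the catalog instead of from memory:

-- Linked servers and the OLE DB provider each one goes through
-- (MSDASQL is the OLE DB provider for ODBC drivers, the usual route to MySQL)
SELECT name, product, provider, data_source
FROM sys.servers
WHERE is_linked = 1;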