SQL timeouts and disconnects?

  • Every 2 or 3 months we get SQL timeouts or lose connection, mostly in SSMS. It usually disappears after a few days. This also causes disconnects from my application servers, causing end users to lose connection. At times, I can't even place my cursor in a SQL query or execute - it just hangs.

    Can't pinpoint a time or trend - just random. Nothing in the SQL log and haven't found anything in event viewer for the SQL box, yet.

    Ran continuous pings to the SQL box from my local pc while having the issue. There were no spikes or drops in packets. Doesn't seem to be network related. The SAN is a bit overloaded, but other than that the hardware is sufficient. Network/Hardware team can't find anything.

    What else can I check?

    SQL Server 2008 SP1; Enterprise (64-bit)

    OS: Win Server 2008 R2 Enterprise

  • Run PerfMon and a server-side SQL trace. You might also check the default trace to see if it has enough backlog to show you what was going on the last time this happened.

    You're sure there were no bulk loads or report runs when this happened? Maybe something that stole all the CPU processing? Does it happen around month end / quarter end? Find out who was doing what when this whole thing happened.

    And what do you mean by "The SAN is a bit overloaded?"

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (7/28/2011)


    You're sure there were no bulk loads or report runs when this happened? Maybe something that stole all the CPU processing? Does it happen around month end / quarter end? Find out who was doing what when this whole thing happened.

    There is no timing trend as far as month/quarter end. When I monitor PerfMon, SQL activity, and all hardware metrics, nothing is above average and there are no intensive processes.

    And what do you mean by "The SAN is a bit overloaded?"

    The db's and logs reside on a SAN and it is being worked pretty hard (controllers saturated), but no harder than usual. This COULD be the cause.

  • SkyBox (7/28/2011)


    And what do you mean by "The SAN is a bit overloaded?"

    The db's and logs reside on a SAN and it is being worked pretty hard (controllers saturated), but no harder than usual. This COULD be the cause.

    If you can't find any other problem, then I would say that *is* the problem. Time to find some real SAN performance monitoring tools and leave them running to see what happens during these intermittent problems.
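    Until a SAN-side tool is in place, a host-side view is possible by logging PerfMon disk counters to CSV and scanning the log after the next incident. The following is only a minimal sketch in Python. It assumes the log is a PerfMon CSV export whose first column is the timestamp, and the function name `slow_samples`, the counter chosen, and any latency threshold you pass in are illustrative assumptions, not something from this thread.

```python
import csv
from typing import Iterable, List, Tuple


def slow_samples(lines: Iterable[str], counter_suffix: str,
                 threshold: float) -> List[Tuple[str, float]]:
    """Return (timestamp, value) pairs where the chosen counter exceeds threshold.

    Assumes a PerfMon CSV log: the first column holds the timestamp and every
    other column header ends with the counter path.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Find the columns whose header ends with the requested counter name.
    matches = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    if not matches:
        raise ValueError(f"no column ends with {counter_suffix!r}")
    col = matches[0]

    hits = []
    for row in reader:
        try:
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or missing samples
        if value > threshold:
            hits.append((row[0], value))
    return hits
```

    Calling `slow_samples(open("disk.csv", newline=""), "Avg. Disk sec/Read", 0.020)` would list the samples where average read latency went above 20 ms. The file name and the 20 ms cut-off are placeholders; pick a threshold that matches what is normal for your storage.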


  • Brandie Tarvin (7/29/2011) Time to find some real SAN performance monitoring tools and leave them running to see what happens during these intermittent problems.

    Can you recommend an inexpensive SAN monitoring tool? Many have recommended SQLIO, but from what I read, you should not run this on a production system.

    The SAN is EMC and the monitoring utility that we have does not monitor in real time.

  • SkyBox (7/29/2011)


    Can you recommend an inexpensive SAN monitoring tool?

    I don't know a whole lot about SAN monitoring, but let me ping in a few other experts and see what they have to say.


  • SkyBox (7/28/2011)


    Every 2 or 3 months we get SQL timeouts or lose connection, mostly in SSMS. It usually disappears after a few days. This also causes disconnects from my application servers, causing end users to lose connection. At times, I can't even place my cursor in a SQL query or execute - it just hangs.

    Can't pinpoint a time or trend - just random. Nothing in the SQL log and haven't found anything in event viewer for the SQL box, yet.

    Ran continuous pings to the SQL box from my local pc while having the issue. There were no spikes or drops in packets. Doesn't seem to be network related. The SAN is a bit overloaded, but other than that the hardware is sufficient. Network/Hardware team can't find anything.

    What else can I check?

    SQL Server 2008 SP1; Enterprise (64-bit)

    OS: Win Server 2008 R2 Enterprise

    Could also be something real simple - like a bad connector on a CAT5 cable.

    Although a few days is a puzzle.

    As with most tools like SQLIO, if things are bad at the time, running it for a short time might be helpful.

    Running something constantly to wait for a problem is a different matter.

    You would have a far better idea on what might be OK to try in your environment.

    Where you are running SSMS from - local or remote - might also play a part.

    On my laptop for example, if the wireless is on, and it is docked, I can see sporadic issues like you describe.

    They go away when I turn off the wireless.

  • Where you are running SSMS from - local or remote - might also play a part.

    On my laptop for example, if the wireless is on, and it is docked, I can see sporadic issues like you describe.

    They go away when I turn off the wireless.

    I am running locally, as are the coworkers who are having the same problem. This possibly rules out the SAN as the culprit: I am working on site now and have disabled all the automated processes on the db servers that use the SAN. There is 0 activity, and I am still freezing up in SSMS. My SQL Server 2005 system that shares that SAN is not experiencing this same problem.

    Suppose I can work in SSMS directly on the sql server to see if the issue occurs there.

  • Put on a blackbox trace. It is free; you just need to run the SQL procedures, and at least you will see what was running at the time it does not connect.

    It will keep on looping and clearing the files, but you can set it up to keep the last 10 files at x size if you like.

    Do you see too many logins being connected at this time?

    How many reads/writes are being done at this time?

    Knowing how many IOPS you can send to your SAN tells you whether you are overloading it: if on day 5 you threw 1000 IOPS at the SAN and it can only handle 800, your SAN is not set up to handle the volume. If you talk to your SAN vendor, they will be able to measure and record the statistics for you and provide the details of what happens during this period. That's why we pay high prices for support of our SAN 🙂

    Any backups running during these periods?

    Do you have anti-virus running - miragent or anything?
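
    The IOPS comparison above can be sketched in a few lines of Python. This assumes you can take two snapshots of cumulative read and write counts some seconds apart (for example from SQL Server's file-level I/O statistics or from PerfMon); the `IoSnapshot` type and the function names are hypothetical, and the 800 IOPS capacity is simply the figure used in the example above, not a measured value.

```python
from dataclasses import dataclass


@dataclass
class IoSnapshot:
    taken_at: float  # seconds, from any monotonically increasing clock
    reads: int       # cumulative read operations at that moment
    writes: int      # cumulative write operations at that moment


def iops_between(first: IoSnapshot, second: IoSnapshot) -> float:
    """Average I/O operations per second between two cumulative snapshots."""
    elapsed = second.taken_at - first.taken_at
    if elapsed <= 0:
        raise ValueError("second snapshot must be taken after the first")
    operations = (second.reads - first.reads) + (second.writes - first.writes)
    return operations / elapsed


def exceeds_capacity(iops: float, capacity: float) -> bool:
    """True when the observed rate is above what the SAN is rated to handle."""
    return iops > capacity
```

    With snapshots taken 60 seconds apart that differ by 40,000 reads and 20,000 writes, `iops_between` gives 1000 IOPS, which `exceeds_capacity(1000, 800)` flags as over the limit.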

  • I will check out the blackbox trace. I did find that when this occurs, I am not experiencing the same issue on the sql server or another server using SSMS. It appears to only be client side.

    This is also happening during very low volume. Guess it's time to lean on the network team. Maybe it's a bad cable or a router being overloaded.

  • Have them put a sniffer on the network regarding those specific machines.
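
    While waiting on the sniffer, one cheap check from an affected client is to time TCP connects to the SQL Server port, since a clean ICMP ping does not exercise that path. This is only a rough sketch in Python; the function names are made up for illustration, and the host name and port you pass in are yours to supply (1433 is the default port for a default instance, but yours may differ).

```python
import socket
import time
from typing import List, Optional


def time_tcp_connect(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None if it failed."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def summarize(samples: List[Optional[float]]) -> dict:
    """Count failed attempts and report the slowest successful connect."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "slowest_seconds": max(ok) if ok else None,
    }
```

    Run `time_tcp_connect` in a loop with a short sleep from a client that hangs and from one that does not, then compare the `summarize` output. Failures or multi-second connects on only the affected client would point at the network path rather than at SQL Server.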

