January 11, 2012 at 12:07 pm
We keep having instances where our SQL Server service freezes. Inbound connection and query attempts time out. The service itself is still in a "Started" state, but will not respond to stop commands. Only killing the process, or retrying the stop command several times will end the service.
The SQL logs and application logs haven't really provided me with any useful information. I'm not 100% sure I know what I'm looking for, but nothing sticks out.
There are TRC files that are generated (I'm not sure by what), but there's no TextData field, so I can't see what's actually running. There's usually a database and occasionally an object name (usually a table), but it's not really enough information to figure out what's really causing the problem. I have noticed some patterns: the same database is usually the last or near last database in almost every instance. The database is used exclusively for integrations through a legacy intergration system (not SSIS). I also seem to be noticing a bunch of "Hash warnings" and "Recursion" as the subclass. It follows quite obviously that recursion (esp. infinite or deep recursion) could definitely cause the server to freeze. However, I can't find the root cause of the recursion, nor can I be 100% sure that's the problem. It's not always the integration database that triggers the recursion errors, and there's really no opportunity for the integration database to cause recursion in the first place.
There's a lot of other things that run against this database. It's pretty heavily utilized. There is a linked server against and Oracle database that I initially assumed was the culprit. I haven't been able to substantiate this, however. There's a report server with hundreds of reports running against both Oracle and MSSQL (and often both in the same report), as well as a report builder model with some 50+ reports. There are countless agent jobs largely focused around report delivery, integration tasks, and database maintainance. The freezes do not coincide with any scheduled maintainance task, but it's hard to tell it it coincides with other tasks (some run as often as every 5 minutes).
These instances largely happen in the early morning (6-7am), but they do occur with some frequency at other times of the day. They seem to be happening more and more lately, so much so that it occurred 4 times on the last working day of December, and 3 times yesterday (all during business hours). Before, it wasn't much of a problem, but 4 times a day is unheard of, consumes a lot of my time, and significantly annoys our users.
I feel like I'm chasing a needle in a haystack. Does anyone know how I can find more detailed information about what is happening?
January 11, 2012 at 2:03 pm
Oracle linked server? Hmmm...
How is the provider loaded? In process or out of process? In process load could definitely crash the service if the provider leaks memory.
Even if regular connections are "frozen", you should still be able to connect with DAC, which has a dedicated worker thread. Have you tried investigating things from there?
-- Gianluca Sartori
January 11, 2012 at 2:03 pm
- the trc files you see are the ones generated because the default trace is active (+_ 20MB each, wright ?) I presume.
( please let it be active. It can help out with some kinds of root cause analysis ! )
- did you configure tempdb / reportserver_tempdb large enough to handle your requests ?
- Adam Machanics WhoIsActive procedure can help you very good trying to figure out who/what is causing real time blocking, .. http://sqlblog.com/blogs/adam_machanic/archive/2011/04/27/who-is-active-v11-00-a-month-of-activity-monitoring-part-27-of-30.aspx
- I would start a trace at the time you expect problems to occur. Don't make this trace to heavy because you know your server is already suffering. Let it run for a reference period of time.
and work from there on.
- With linked servers, that may indeed be a pitfall, but that also needs to be investigated.
- check your databases for missing indexes, double check your objects are well maintained so they are in optimal shape to be consumed.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply