SQL 2000 server locking up

  • Hi,

    We have a serious problem with our SQL server 2000.

    Any help with this will be greatly apreciated, I am totally stumped! :crazy:

    Every few days, it locks up completely. Nothing can connect to it, not even the profiler.

    If left alone it eventualy recovers after 3 hours, but during work hours we have to stop and start the SQL service to get it running again.

    During this process the errorlog fills up with about 300 to 500 MEGS of the following type of line.

    2009-07-18 14:36:47.75 spid4 Process ID 911:89 owns resources that are blocking processes on Scheduler 0.

    2009-07-18 14:36:47.75 spid4 Process ID 898:97 owns resources that are blocking processes on Scheduler 8.

    2009-07-18 14:36:47.75 spid4 Process ID 911:84 owns resources that are blocking processes on Scheduler 11.

    and so on...

    Also, the Windows transaction log fills up with these messages:

    1229 :

    Process ID 890:113 owns resources that are blocking processes on Scheduler 4.

    Strangely the SQLServer CPU usage during this time is very low.

    There seems to be no pattern to it happening. It has happened during busy day times, and on weekends when there is very low activity.

    What we have tried so far:

    I checked to see if there are any agent jobs running at the time of the lockup, and could not find any.

    I stopped very intensive jobs from running automatically, and run them manually at low usage times. This did not solve the problem.

    Our DBA consultant firm suggested it could be the "auto-shrink" option on the databases. We turned it off.

    Our DBA consultants are as stumped as I am about this.

    Some extra information:

    The event viewer shows this under security about 1 minute before the block started:

    A new process has been created:

    New Process ID:5080

    Image File Name:C:\Program Files\Microsoft SQL Server\80\COM\sqldumper.exe

    Creator Process ID:3608

    We implemented the usage of indexed views not too long before this started but I cannot see any reason this would matter.

    The server has about 90 databases, each with a small dedicated application connection pool. In total there are about 1000 to 1500 active connections at any given time.

    The database sizes range from a few hundred megabytes to +/- 20 gigabytes. The average is about 4 gigabytes.

    DB: Microsoft SQL enterprise server 2000 SP3.

    OS: Windows 2003 server SP1.

    Server: 16G RAM, 4x Processors, 1T Raid 5 split to 4 logical H/D's.

    Thanks

    Roman

  • what wait types do you see when this happens? This could point you in the direction of where the bottlleneck is.

    sounds ums scheduler related so would be a good idea to upgrade to SP4 as changes in this service pack produce more info on this type of problem. If you were to contact microsoft this would probably be the first thing they ask you to do. You may very well need their help to resolve this.

    I think you will find dump files being produced in your log directory each time this happens from what you say in your post.

    do you have hyperthreading enabled, could be worth testing without it if you do.

    ---------------------------------------------------------------------

  • Thanks George,

    I will get SP4 installed and take it from there.

    You are also correct, the system is generating dump files, but I do not know how to analyze them.

    I know SQL server pretty well, but not at a DBA level as I am more a coder than an administrator.

    I looked up wait types and understand how this can help me, but I am unsure how to test for them. When the problem is going on I cannot connect to the database, and I have not seen anything related to wait types in the logs.

    Thanks

    Roman

  • Unfortunately if the SQL instance is unresponsive you will not be able to connect (as you have found out) and in SQL 2000 there is no way round this. About all you could do is run perfmon see if that highlights anything.

    IF you can catch it early enough you may be able to get in before all the schedulers are hung and run some queries, wait types are held in sysprocesses in master.

    I recommend contacting MS on this one, they will want to see those dump files and possible a sqldiag output (thats a command line utility described in BOL, perfectly safe to use).

    Is dematerialising the views an option?

    good luck

    ---------------------------------------------------------------------

  • Thanks, I will get our guys to contact Microsoft for this.

    dematerialising the views is possible as a last resort, but I would prefer that we didnt as it would have serious performance implications on some processes in the larger databases.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply