Please Help! - Excessive Disk Queue

  • My system experienced a most unusual and troubling problem. The application which accesses this system suddenly started producing timeout errors on any write operation. I checked the performance counters and the average disk queue length was being pegged. I checked further yet and discovered that indeed sql server was producing an excessive number of IO writes. I ran numerous DBCC statements including SHOW_STATISTICS and CHECKDB and could not find anything unusual. I checked indexes for fragmentation and looked at the system itself for signs of hardware failure or disk corruption. I also tried the standard Microsoft solution of restarting the server- nothing worked. There were no deadlocks, no users running unusually long queries, no scheduled jobs were hung, and no checkpoint operations were executing. I looked at every possible log- SQL Server, SQL Agent, Event- and found no trace of a problem. I ran traces on every set of events I could think might be relevant. The weirdest part is that after about an hour of this problem, it simply stopped. The problem is gone and I have made no changes to the operating system, server settings or the front-end application. Better yet, other than the performance logging that is in place and some irritated users, there is not even any trace that this ever happened.

    This system has run without a hitch for over a year- nothing like this has ever occurred. I perform regular backups and update the statistics, I monitor this system regularly and it has been extremely reliable. We have made no major application changes either on the front end or the back end recently. While I recover from the near-heart attack of this incident, I was wondering if anyone could provide some kind of insight as to the cause of this problem.

    Thanks.

  • Yes, auto shrink is turned on but hasn't caused a problem in the past. Is there some know issue with this?

  • Thanks, that at least gives me another trail to follow. Incidentally, the trans logs and the data files are on different physical disks so I would rule that out as a possibility.

  • The problem is you don't see happening again and it might not. I had this happen to me on at least five occasions and am looking for the root cause. But one thing of note each time is that when I run sp_who spid 5 is running and it is performing BACKUP LOG on tempdb. I have nothing that does this and even rebooting the system every morning for about two months it happend after a reboot so I ruled out need for periodic reboots. When this happens the only thing that still runs is sp_who. I have also turned on the black box trace which you can find out about in SQL BOL to capture events before I do a shutdown and reboot to see if anything specific shows up.

    Not that this is it but gives you some place to look and a bit on a way to monitor if becomes infrequent in occurance. If anyone else has seen my issue thou, please let me know. I haven't started a thread yet but I may do so if the answer doesn't hit me and I get fed up.

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • Are you running on 7.0 or 2000?

    If 7.0, what service pack?

    And if it's SP2 or less, are you running on a multi-processor box?

    We've seen some really grim problems with 7.0 SP2 on a standard Compaq 8000 dual processor box. There's a problem in the SQL Server 7.0 DB engine which only occurs when two or more certain types of query, both of which can be executed on parallel processors, are run simultaneously. Basically, one of the threads gets blocked whilst waiting for a 'completed' message from one of the other threads. Upshot was that queries which take 20 seconds on a box with 2 processors, but configured *NOT* to use parallel query processing, take 5m20s on the *same* machine with multi-proccesing enabled. Only happens occasionally, usually when producing complex management reports on more than a single client PC, when we see users being timed out and queries taking minutes to complete, if at all. Runs fine the rest of the time.

    Worth trying an upgrade to SQL Server 7.0 SP3 or SP4 if this is your scenario, alternatively, turn off parallel processing of queries using sp_configure 'max degree of parallelism', 1 until you get chance to install the SP, so that queries only get executed on one processor.

    Edited by - jonreade on 09/12/2002 04:46:06 AM


    Jon

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply