High avg disk queue length

  • Hi all,

    We have a sql2000 machine running on SAN storage. The logs and DB are on the same SAN drive and we are seeing avg disk queue length exceeding 200 sometimes for long periods of time. At the same time the transfer sec is reporting less the .010. I am not sure how many disks the data is touching on the backend of the SAN but assume its 16 or 32. The DB is around 150GB. Does this sounds like a major issue with the avg disk queue length being a high number but the transfer rate being ok?

     

    Thanks for any iput....

  • If your transfer rate is good (and yours seems to be), then I wouldn't worry unduly. I tend to monitor the CurrentDiskQueue, rather than the AverageDiskQueue, since it gives me an instant reading of how many requests are queued. The AverageQueueLength figure can be skewed if there was a period of high activity in the recent past. It could be that there are periods where your transfer rate is NOT as good as 0.01 secs, and that this coincides with periods of high IO request which you would detect more clearly monitoring the CurrentDiskQueue.

    We've just had issues with our SAN where we had 12 disks allocated, and were seeing CurrentDiskQueue values of between 150 and 250. Allowing approx 2 queued requests per disk, this was clearly way too high, and on top of that the DiskSecs/Read value was averaging 0.5 secs, sometimes being as high as 1.5 secs.

    We discovered a number of config problems with the SAN array (incorrect fiber settings, wrong firmware version etc.), which did improve things noticeably when corrected, but didn't solve the whole problem.

    We reallocated the disks with other aggregates in the SAN array, and ended up with a single aggregate of 36 disks. This has improved both the queue length (no worse than 75 now), but more importantly the read time is down to approx 0.05 seconds.

    You really need to find out how many disks are allocated to your aggregate that the database sits on. If it's only 16, then that would probably account for a queue length of 200, especially if you have queries that do large tablescans. If you've got 32 or more disks available, and you're STILL seeing a queue length of 200, then that is very poor indeed, and you need to confirm that there is no setup problem with your SAN that's making things worse.

  • Philip,

    Can you tell us how the 36 disks are configured?  Do you have several smaller raid 1+0 arrays or 1 big one with 36 disks?

    I ask because I've done some testing and found that I am better off with one big stripe set for both DB and logs than I am with 2 separate ones. 

  • I've not found that many of the perfmon counters give meaningful results with a SAN. No doubt the queue counters indicate something, but I suspect they don't relate the same way as DAS. The counter I've used to prove I have had SAN performance issues are the i/o completion time counters. These I know give meaningful results, you should see less than 6ms for a read or a write - anything above that indicates a problem. Then, as a DBA, it is the uphill struggle to convince people there is a SAN issue - I even had engineers from a very well known SAN Vendor on site assuring me there were no problems with the SAN whilst my OLTP systems ground to a halt. I was right and I got it fixed - but it's really hard work and to be blunt people don't want to hear you saying a SAN is a problem. I'm about to go for SAN storage certification as I'm so fed up with poor SAN performance - I figure once I'm certified ( yeah I know there's a very good joke in there < grin > ) my point will have to be listened to!!

    I suggest the purchase of this book as a starting point http://www.compman.co.uk/scripts/browse.asp?ref=698663

     

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply