September 21, 2011 at 1:56 pm
I noticed the OS disk queue length has hit 95, and is frequently above 60 in the mornings, for a SAN drive (H:).
We are using Idera Diagnostic Manager. The metric captured is OS Avg Disk Queue Length Per Disk (Count). I assume OS Avg Disk Queue Length Per Disk in Idera is the same as the perfmon counter PhysicalDisk: Avg. Disk Queue Length.
We asked the storage team to check, just to make sure. The admin informed us the volume was striped/expanded over 30 spindles for a throughput capacity of 480 Mbps, equivalent to 5550 IO/sec.
He asked us to check the throughput actually being demanded.
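To make sense of his numbers, I put together a quick back-of-the-envelope sketch of how throughput and IO/sec relate. The IO size in it is purely an assumption on my part (and I'm not even certain whether he meant megabits or megabytes per second), so treat it as a rough sanity check, not his actual figures:

```python
# Rough sanity check of how throughput and IOPS relate.
# ASSUMPTION: a nominal 64 KB IO size; the real average IO size on this volume
# is unknown (Disk Read Bytes/sec divided by Disk Reads/sec would reveal it).
# Also assuming the admin's 480 figure is megabytes/sec, which may be wrong.

quoted_throughput_mb = 480   # admin's quoted throughput capacity
quoted_iops = 5550           # admin's quoted IO/sec
assumed_io_size_kb = 64      # hypothetical average IO size

# Throughput implied by the quoted IOPS at the assumed IO size
implied_mb = quoted_iops * assumed_io_size_kb / 1024.0
print(f"{quoted_iops} IOPS at {assumed_io_size_kb} KB per IO ~ {implied_mb:.0f} MB/sec")

# IOPS implied by the quoted throughput at the assumed IO size
implied_iops = quoted_throughput_mb * 1024.0 / assumed_io_size_kb
print(f"{quoted_throughput_mb} MB/sec at {assumed_io_size_kb} KB per IO ~ {implied_iops:.0f} IOPS")
```

Either way, the point is just that the two figures only line up for a particular IO size, so the throughput we actually demand depends heavily on how big our IOs are.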
We also got messages that the OS disk time on the same H drive hit 100%. The Idera Diagnostic Manager metric was OS Disk Time Per Disk (%). Interestingly, we had a couple of occasions where it hit 100% for the C drive too.
I thought it probably may not be an issue for a SAN disk, but the disk queue length per disk count is still more than twice the number of spindles.
So my question, in short, is: should I be concerned, as far as the SAN drive goes?
September 21, 2011 at 2:10 pm
Ignore it. It's a useless counter these days (and has been for several years, to be honest). There's too much between the server and the drives (especially with a SAN) to sensibly interpret that counter: there are caches, controllers, a switch (or several), HBAs, etc.
The other reason is that SQL Server can intentionally drive the queue length up during normal operation. It does async IO, so it can issue a couple of hundred IOs and then go off and do other work while those IOs are processed and returned.
http://blogs.msdn.com/b/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx
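As a rough illustration of why the counter misleads (just Little's Law with made-up numbers, not measurements from your system): the average queue length is roughly the IO rate multiplied by the latency, so a perfectly healthy volume pushing a few thousand IOs/sec will show a 'high' queue length even when every IO completes quickly.

```python
# Rough illustration via Little's Law: average queue length ~ IO rate x latency.
# The numbers below are hypothetical, not measurements from this system.

iops = 5000          # hypothetical IO arrival rate (IOs per second)
latency_sec = 0.010  # hypothetical average latency per IO (10 ms)

avg_queue_length = iops * latency_sec
print(f"~{iops} IOPS at {latency_sec * 1000:.0f} ms latency "
      f"-> average queue length of about {avg_queue_length:.0f}")
# ~5000 IOPS at 10 ms latency -> average queue length of about 50,
# i.e. a 'high' queue length can be perfectly normal if each IO completes quickly.
```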
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 21, 2011 at 2:35 pm
Thank you, Gail. The other thing I'd noticed while monitoring, as I mentioned above, was that OS Disk Time Per Disk (%) for the C (and of course the H) drive was at 100%. Is that also something that can be ignored?
And please do recommend handy I/O-related counters that I could set up to be monitored regularly.
September 23, 2011 at 9:35 am
Thank you, okbangas.
So, I started capturing Disk Reads/sec and Disk Writes/sec, among a few other counters, last night. I was collecting at a frequency of every three minutes.
I've frequently seen Disk Reads/sec at 500 or so, and on a couple of occasions it has gone beyond 1000. I thought the scale must be wrong and that this must be ms, not seconds, but I didn't change the scale.
Going back to my SAN admin's statement that the volume "has been striped/expanded over 30 spindles for a throughput capacity of 480 Mbps, equivalent to 5550 IO/sec", I am now wondering whether this is acceptable. Please correct me if I am wrong.
The writes for the same duration have been under 100. I am planning to add Disk Read Bytes/sec now.
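Once I have Disk Read Bytes/sec alongside Disk Reads/sec, I figure the average read size is just the one divided by the other. Here's a sketch of what I plan to compute; the sample values are made up, not from our server:

```python
# Sketch of what I'll compute once Disk Read Bytes/sec is being captured.
# SAMPLE values only -- these are made-up numbers, not from our server.

samples = [
    # (disk_reads_per_sec, disk_read_bytes_per_sec)
    (500, 4_096_000),    # hypothetical: works out to ~8 KB per read
    (1000, 65_536_000),  # hypothetical: works out to ~64 KB per read
]

for reads_per_sec, read_bytes_per_sec in samples:
    avg_read_kb = read_bytes_per_sec / reads_per_sec / 1024.0
    throughput_mb = read_bytes_per_sec / (1024.0 * 1024.0)
    print(f"{reads_per_sec} reads/sec at {throughput_mb:.1f} MB/sec "
          f"-> average read size of about {avg_read_kb:.0f} KB")
```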
September 23, 2011 at 9:36 am
Reads/sec and writes/sec aren't much better. What you really need to look at is the latency: sec/read and sec/write.
I've frequently seen Disk Reads/sec at 500 or so, and on a couple of occasions it has gone beyond 1000. I thought the scale must be wrong and that this must be ms, not seconds, but I didn't change the scale.
Not at all surprising. A disk that can't do 1000 reads in a second is not going to be a very fast disk. But that counter is dependent on activity. If the server's idle it'll be low, and it's another counter where a 'good value' is extremely hard to determine.
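As a side note, you can get the same kind of latency figure from SQL Server itself via sys.dm_io_virtual_file_stats, which tracks cumulative IO counts and stalls per database file. A rough sketch is below; the connection details are placeholders, and bear in mind the averages are since the last SQL Server restart, not point-in-time values:

```python
import pyodbc

# Connection details are placeholders -- adjust server/auth for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourServer;Trusted_Connection=yes;"
)

# Cumulative IO counts and stalls per database file since the last SQL Server
# restart, so these averages cover the whole uptime, not a point in time.
query = """
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;
"""

for row in conn.cursor().execute(query):
    avg_read_ms = row.io_stall_read_ms / row.num_of_reads if row.num_of_reads else 0
    avg_write_ms = row.io_stall_write_ms / row.num_of_writes if row.num_of_writes else 0
    print(f"{row.database_name:20} {row.physical_name:50} "
          f"avg read {avg_read_ms:6.1f} ms   avg write {avg_write_ms:6.1f} ms")
```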
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2011 at 9:44 am
Gail, was it avg sec/read and avg sec/write that you meant? If so, would the same >100 ms warning flag apply, or is it relative?
September 23, 2011 at 9:49 am
Yes, avg sec/read and avg sec/write, and the thresholds for these are typically: 20 ms = good, 50 ms = bad, above that = terrible.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2011 at 10:20 am
Thank you. Not sure why I didn't have them yesterday. Will add them and see.
September 26, 2011 at 2:12 pm
I noticed Idera Diagnostic Manager has Average Disk Milliseconds Per Write and also Per Read,
so I enabled them from there, as we'd then have history in our repository. The frequency, however, is 6 minutes.
Also, I set the warning threshold at 50 and critical at 100, as I was not sure whether I'd get a flood of messages.
(I plan to change them to 25 warning and 50 critical after monitoring them for a bit.)
So far, I've noticed disk milliseconds per write spike above 100 ms on 5 different occasions, alternating between 2 different SAN drives, once hitting about 220 and once 336.
By the time the next check runs 6 minutes later, it's back to normal: under 20, in fact single digits.
As these look to be spikes, usually at least 8 hours apart, and are back to normal by the next check, can they be ignored (but still monitored), or ought that to be an alarm?
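To get a feel for how often these thresholds would actually fire, I mocked up the classification rule I have in mind. The sample latencies are invented, only loosely matching what I described above:

```python
# Sketch of the alerting rule I have in mind, using the thresholds set in Idera.
# The sample latencies below are invented, only loosely matching what I described.

WARNING_MS = 50    # current Idera warning threshold (planned: 25)
CRITICAL_MS = 100  # current Idera critical threshold (planned: 50)

samples_ms = [7, 12, 220, 9, 5, 336, 14, 8]  # hypothetical 6-minute ms/write readings

for i, latency in enumerate(samples_ms):
    if latency >= CRITICAL_MS:
        level = "CRITICAL"
    elif latency >= WARNING_MS:
        level = "WARNING"
    else:
        level = "ok"
    print(f"check {i + 1} (+{i * 6} min): {latency:4d} ms  {level}")
```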
September 26, 2011 at 2:59 pm
GilaMonster (9/23/2011)
Yes, avg sec/read and avg sec/write, and the thresholds for these are typically: 20 ms = good, 50 ms = bad, above that = terrible.
Probably worth mentioning that even if you see more than 50 ms/transfer, it might not be bad at all if you have large I/O requests (check bytes per read/write) being issued by the read-ahead mechanism.
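As a rough mental model (the overhead and throughput constants below are purely illustrative, not measured): latency per transfer is roughly a fixed per-IO overhead plus the time to move the bytes, so a large read-ahead IO will naturally report a higher ms/transfer than a small random read on the same, perfectly healthy volume:

```python
# Rough illustrative model: latency per transfer ~ fixed overhead + transfer time.
# The overhead and throughput figures are illustrative assumptions, not measurements.

overhead_ms = 5.0             # hypothetical fixed cost per IO (controller, queueing, etc.)
throughput_mb_per_sec = 50.0  # hypothetical per-stream throughput of the volume

for io_size_kb in (8, 64, 512):
    transfer_ms = io_size_kb / 1024.0 / throughput_mb_per_sec * 1000.0
    total_ms = overhead_ms + transfer_ms
    print(f"{io_size_kb:4d} KB IO -> roughly {total_ms:.1f} ms per transfer")

# The larger the IO, the more of its latency is simply moving bytes, which is why
# a big read-ahead IO crossing a latency threshold is not automatically a problem.
```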