maximum wait time per I/O

Question

maximum wait time per I/O

passivebyz

SSCommitted

Points: 1908
More actions
June 4, 2010 at 12:30 pm

#145166

Hi,
What is the accepted value for maximum wait time per I/O (in milliseconds) for any database file?
I read in Spot light documentation for the Max I/O wait as below:
A value close to zero indicates that little wait is occurring and a value over 25 indicates that the I/O device is likely to be saturated
But we are getting this value close to 70. Please advice on what to consider to see the Max I/O wait time?
Is maximum wait time per I/O different for each environment or it's Universally same for any environment. i.e the value 25.
Thanks

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Patrick Purviance SSC Eights! Points: 990 More actions · Answer 1

Hey passivebyz,

I've struggled a little myself with the recent change in Spotlight 7.0 from gauging I/O waits based on disk queue lengths to the MAX I/O Wait time per file. I think it's a great new addition to Spotlight and something that will help in better diagnosing I/O bottlenecks that may be less hardware related, but I've always been more akin to using the disk queue length to determine I/O bottlenecks myself, so this is a new, and probably more in depth, focus for me.

Although I can't throw out a more acceptable max I/O wait time for you, I can tell you, that from experience, you should get a feel for end user acceptance of the application performance and go from there.

The 25 millisecond wait time Quest is using is probably right on for most systems, but I wouldn't base hardware decisions on that alone. If you also keep an eye on disk queue lengths, you'll get an idea of whether or not you're specifically disk I/O bound. You'd be looking to see disk queue length of no more than 2.0 per spindle (physical disk) in the associated array. That's 2 requests per spindle max before you start worrying about adding disks.

HTH,

-Patrick

Patrick Purviance, MCDBA

passivebyz SSCommitted Points: 1908 More actions · Answer 2

Hi Patrick,

Our storage is SAN and it's near impossible to get the Avg.Disk Queue length values when the storage is SAN. We have 150 Spindles and I asked the SAN admin and according to him there is not much activity for the SQL server I asked to.

Is Max IO wait time value (25) ,best practice same whether the storage is local or on SAN?

Thanks

Patrick Purviance SSC Eights! Points: 990 More actions · Answer 3

passivebyz,

We run a mixture of SAN and DAS solutions and the Avg Disk Queue Length is not itself dependent upon the number of spindles in the array. It's an OS performance counter and is easy to monitor. Now if you're saying that it is difficult with a SAN (and less than experienced SAN admins) to determine what the acceptable disk queue length should be for an array that's presented to your OS, that I can agree to. But honestly, the SAN admins should be able to document for you what the spindle counts are if you aren't sharing spindles across LUNs. If that's the case (you don't have dedicated spindles and/or dedicated LUNs) then they've got your SQL instances by the throat in the I/O department IMO.....

Early on with the company I'm with now I had to convince the Systems Admins that they didn't have the SANs configured correctly by taking one of the SQL Servers off the network and running a simple rebuild of an index on a 115GB table. Other systems (primarily and Oracle 9i VLDB) froze from the I/O load, thus proving that they had crossed the LUNs with different systems and activity on one could easily bring the others to their knees. After 6 mos or so of reconfigurations, it was finally all separated and dedicated! :w00t:

As far as the MAX I/O rate goes, you want to maintain the same levels of I/O throughput regardless of storage solution, so the 25 is what we've stuck with for the time being in Spotlight. That might be a god send though, as with the Avg Disk Queue Length counter, you have to configure what's acceptable for each server (and Spotlight wouldn't even allow you to configure it at the individual logical disk level so it wasn't always appropriate for every drive 🙁 )

HtH

-Patrick

Patrick Purviance, MCDBA

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

Maximum wait time is pretty useless as an IO performance metric. Much better is to use avg disk sec/read and avg disk sec/write for the appropriate physical volumes.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

passivebyz SSCommitted Points: 1908 More actions · Answer 5

Maximum wait time is pretty useless as an IO performance metric. Much better is to use avg disk sec/read and avg disk sec/write for the appropriate physical volumes.

Thank you,

Max I/O is going >25 several times a day. But the avg sec/read & avg sec/write are within the limits.

I expected to see avg sec/read & avg sec/write values from Spot light 7.0 but still it showing reads/sec & writes/sec. Is there way to convert the values of reads/sec & writes/sec to avg sec/read & avg sec/write?

Thanks

Patrick Purviance SSC Eights! Points: 990 More actions · Answer 6

Maximum wait time is pretty useless as an IO performance metric. Much better is to use avg disk sec/read and avg disk sec/write for the appropriate physical volumes.

I don't know that the Max I/O wait time is useless. There are better options, but this high level counter is looking at the longest wait time on a constantly moving target and reporting that you currently have disk latency. The way Spotlight is doing it, however, is that it is also identifying for you which file(s) is(are) affected, which is a bit of a bonus in identifying the root I/O bottleneck(s) (maybe poor code driving your disks crazy unnecessarily, poor indexing, fragmentation, you name it).

I expected to see avg sec/read & avg sec/write values from Spot light 7.0 but still it showing reads/sec & writes/sec. Is there way to convert the values of reads/sec & writes/sec to avg sec/read & avg sec/write?

It's further broken down into "Current Reads Wait Time" and "Current Writes Wait Time" per file inside the I/O by File drill down page in SQL Activity in Spotlight.

Patrick Purviance, MCDBA

Patrick Purviance SSC Eights! Points: 990 More actions · Answer 7

Interesting comments from Quest on this recent change to Spotlight that we got in a webcast invitation to discuss the changes for Spotlight 6.0 and 7.0.

oDisk Queue Length has been replaced by Max I/O Wait: Unfortunately there is still a lot of advice out there pointing folks to disk queue length to identify I/O bottlenecks. I won't say there's no value in that approach, but the advice is meant to be applied in conjunction with other metrics and investigations. To avoid confusion and to improve the data we're presenting we've replaced Disk Queue with Max I/O Wait. Now, SAN, RAID, or otherwise, the I/O information on the instance home page will indicate whether you have an I/O bottleneck or not...and then you can use the:
oNew I/O Diagnostic Drill-down: We added the CPU diagnostic drill-down in version 6.0, and now in 7.0 we've added a similar dashboard to identify I/O bottlenecks. Spotlight has always been great at helping you to identify resource bottlenecks, but these diagnostic drill-downs go a step further by pre-analyzing data for you to show you areas we've already analyzed, a confidence factor relative to the information presented, and data in a single location to identify what you should address to resolve the issue.

Jury's still out here.....

Patrick Purviance, MCDBA

Indianrock SSC-Insane Points: 20333 More actions · Answer 8

I just upgraded to Spotlight 7. Previously I had tried to use custom counters to get the Average Disk Seconds Per Read or Write, but the RAW version of the WMI queries returned huge, unusable values ( billions ) and the Formatted version returned essentialy zeroes.

Now with Spotlight 7 I'm seeing wait times anywhere from 6 to over 500 ( milliseconds I presume), and this on a Netapp disk device.

From Quest Support:

[font="Arial"]In effort to find out info you need for your custom counter, we found:

The “Transfer Time” graph (OS connection | Disks | Logical Disk Activity) represents metric you need (shown in milliseconds; the lower value the better) . It shows how long it is taking for data to be transferred between disk and memory. This graph includes Disk Reads AND Disk Writes summarised.

The Win32_PerfFormattedData_PerfDisk_LogicalDisk WMI class represents formatted and rounded values. Since the AvgDiskSecPerWrite value floats between 0 and 50 ms most of time, query returns 0 and is useless in this case.

The Win32_PerfRawData_PerfDisk_LogicalDisk WMI class represents cumulative values. The AvgDiskSecPerWrite shows number of writes since OS started.

To get number of writes per sec you need to get 2 values and then calculate rate knowing time difference between runs, and it is not possible to implement using custom counters since they allow to run one query only.[/font]