DISK IO when the storage is on SAN

  • Hi,

    We have SQL Server 2005 ee 64 bit with SP3 and the storage is from SAN (RAID 10 drives).

    Many forums says that:

    If the storage is SAN,where it is hard to tell how many physical disks you actually have,

    just go with Avg Disk sec/read for the data disks, and sec/write for the log disks

    SO I'm monitoring these 2 counters values only (skipping Avg.DiskQueue Length) to see whether we have diskIO bottleneck or not. Is it a valid test to know diskIO from these 2 conterers?

    thanks

  • According to some experts, if both the logical Avg Disk Reads/sec and logical Avg Disk Writes/sec are less than 8, the system unlikely experiences I/O contention.

  • Can we safley ignore the high avg disk queue length values, if the storage is SAN and consider only avg.sec/write & avg.sec/read? and if the values for avg.sec/write & avg.sec/read is <10 MS, then IO subsystem is performing well right?

    and can we ignore the high values for avg.sec/write & avg.sec/read, when:

    1. The Volume Shadow Copy service starts.

    2. Running DBCC CHECKDB

    thanks

  • This is a question I have also asked on a few occasions. I prefer not to look at avg disk queue length because you have to divide that result by the number of spindles, meaning in a RAID 10 with 8+8 146GB drives, your result should be divided by 8. At least that's how I understand the math. I'm not completely certain how reads vs. writes affect the formula. I've been given (and have read) differing opinions, but with a write you only count 8 spindles in the above config, because the other 8 spindles (disks) are the mirrored set of disks. However, with a read you are reading both sides of the mirror so does that mean you use 8 or 16. I'm still not clear on that one. I've also been told that read-ahead activity needs to be factored into disk queue results. Queuing could be visible because of read-ahead activity and not because of a disk bottleneck.

    The type of SAN configuration can also lead to incorrectly reported Perfmon results. I've seen one large pool of disks be used for data and one large pool for logs. The data and logs are then carved up logically (as presented to the OS). Avg Disk Read/Sec and Avg Disk Write/Sec were reporting high values for logical drives during a reindexing job. The problem was the reindexing job was being run again drive M and we were seeing values reported for drive M and another data drive not in use. In this case we had to rely on EMC's tools to monitor disk performance. When the SAN was configured to have the physical arrays match the logical drives Perfmon worked like a charm.

    On a SAN the values to look at for Avg Disk Sec/Read and Avg Disk Sec/Write can be a slightly different from vendor to vendor. Here are a couple of good links regarding Perfmon counters.

    http://www.texastoo.com/images/perfmoncounters.pdf

    http://www.grumpyolddba.co.uk/monitoring/Performance%20Counter%20Guidance%20-%20SQL%20Server.htm

    http://www.hds.com/assets/pdf/best-practices-for-microsoft-sql-server-on-hitachi-universal-storage-platform-vm.pdf

    We usually don't get concerned until the values are > 10 milliseconds, and even then we aren't too concerned until we start seeing > 30 at a fairly sustained rate. It all depends upon the system and if people are complaining about performance. We are in the process of moving to a new SAN because our existing SAN cannot support our largest application (over 1TB). It's a combination of cache and disk IO problems. It's common during reindexing to see drives hit 1/3 second to 3/4 second per read. Not good at all. Maintenance activities such as reindexing, especially REBUILD, can drive your counters much higher than normal. DBCC CHECKDB and even Update Statistics can push the values above your normal readings as can database backups, especially if using a third-party product that does a good job with data compression.

    Hope this helps.

    Dave

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply