I/O requests taking more than 15 seconds


CrazyMan SSChampion Points: 12560 · Answer 1

Hi Sandra

What disk speeds are on your SAN: 7.2K or 15K disks? How many virtual machines are you running? Are you using VMware or Windows for the virtual environment?

Robert Klimes SSCoach Points: 17865 · Answer 2

Are you seeing any errors in the Application or System event logs at the same time as the errors in the SQL Server log? I am seeing the same errors in my SQL Server log, and at the same times I sometimes see an error from the vmscsi driver or a COM+ error from one of the apps. These errors all occur only during high I/O operations like CHECKDB or reindexing.

The issue may be with virtual hardware/drivers and not physical hardware or configuration.

Quick questions:

Do you have VMware Tools installed on the server?

Which VMware product are you using: ESX, ESXi, or Server?

Bob
-----------------------------------------------------------------------------
How to post to get the best help

Hawkeye_DBA SSCarpal Tunnel Points: 4365 · Answer 3

Thanks all for reviewing my perfmon numbers.

I talked with the SAN team some more today. There are no errors to correlate. The network backups run from 6PM to 4AM, which is when many of the I/O errors occur (the spikes). However, the virtual server in question is not being backed up by this network job.

Here is more info on the SAN:

Disk speed (most likely 15K)

Drive type: SAS

20 drives with dual parity, so 18 drives' worth of usable capacity

Separate Volume for this server, 500GB

We are using VMware, and yes, VMware Tools is installed on the server.

The VMware software we're using is ESX version 3.5 Update 3

The underlying configuration is a RAID 6 for this volume

There are no other virtual machines sharing this volume

Unfortunately there are no error messages on either the database or application servers 🙁

Thank you all for your help with my questions. I am trying very hard to uncover this problem because I need to build this server out to support many more databases due to a consolidation project, and I want to ensure we have a stable and productive environment.

Bless you all!

Sandy

John Rowan SSC Guru Points: 56440 · Answer 4

Gail and Colin have you going down the right path. I'd continue to focus on the I/Os at the SAN level. You say that your DB instance has a dedicated volume, not shared with any other process, but contention happens at the physical disk layer, not at the logical volume. Many SAN admins don't understand that database files should have their own physical disks. They think that the SAN's caching abilities make up for striping their LUNs across physical disks and then sharing those physical disks with other applications.

If indeed your DB has dedicated physical disks, maybe check at the disk controller level. I had an experience where our client was using a SAN and they were experiencing high queue lengths and I/O wait times. I checked the specs on their SAN drives and found that they were more than enough to handle the I/O throughput that my DB was pushing to them. It turned out that the disk controller was being overloaded by a combination of my DB's traffic and traffic from other applications. While each app had its own disks, the disks shared the same controller. Worth noting.

John Rowan

======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

Hawkeye_DBA SSCarpal Tunnel Points: 4365 · Answer 5

Thanks John, I will follow up on that question. I have to laugh a little here because you are right about the dedicated disks 🙂

More to come - and of course, Gail and Colin I REALLY appreciate your help!

David Benoit SSC-Dedicated Points: 34562 · Answer 6

Hence my question earlier about the volume group (set of disks) being dedicated. Hopefully that is the configuration. If not you get phantom IO issues to chase down and the fun of trying to correlate that between multiple instances, etc.

Sandy - it is great watching you dig through this as the posts continue. 🙂

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

Gail Shaw SSC Guru Points: 1004504 · Answer 7

Sandra Skaar (1/15/2009)
I talked with the SAN team some more today. There are no errors to correlate. The network backups run from 6PM to 4AM, which is when many of the I/O errors occur (the spikes). However, the virtual server in question is not being backed up by this network job.

Perhaps not, but the fact that the backups and the I/O errors correlate suggests that something along the I/O path is shared with servers that are being backed up.

Candidates: HBAs, fibre, switch (I've seen that one before), SAN controllers, cache, the disks themselves.

You're going to need to sit with the SAN admins and ensure that the disks, the controllers, the fibre and the switches are dedicated (not shared). As long as you're sharing some (any) part of the IO path with another server, you risk having unexpected and unpredictable slowdowns.
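
One rough way to put a number on that correlation before sitting down with the SAN admins is to count how many of the long-I/O warnings fall inside the backup window. The sketch below assumes the warning timestamps have already been pulled out of the SQL Server log by hand; the 6PM to 4AM window comes from the thread, while the function names and the sample timestamps are invented purely for illustration.

```python
from datetime import datetime, time

# Backup window quoted in the thread: 6PM to 4AM (it wraps past midnight).
BACKUP_START = time(18, 0)
BACKUP_END = time(4, 0)


def in_backup_window(ts: datetime) -> bool:
    """Return True if ts falls inside the overnight backup window."""
    t = ts.time()
    # The window wraps midnight, so it is [18:00, 24:00) plus [00:00, 04:00).
    return t >= BACKUP_START or t < BACKUP_END


def share_in_window(timestamps: list[datetime]) -> float:
    """Fraction of warning timestamps that land inside the backup window."""
    if not timestamps:
        return 0.0
    hits = sum(1 for ts in timestamps if in_backup_window(ts))
    return hits / len(timestamps)


# Hypothetical warning times, made up for this example only.
sample = [
    datetime(2009, 1, 14, 19, 5),
    datetime(2009, 1, 14, 23, 40),
    datetime(2009, 1, 15, 2, 15),
    datetime(2009, 1, 15, 11, 30),
]
print(f"{share_in_window(sample):.0%} of warnings fall in the backup window")
```

A high share only says the two line up in time; it does not say which shared component is responsible, which is why the walk through the I/O path is still needed.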

As a friend is fond of saying, "There's no such thing as magic SAN dust". A SAN has to be configured and laid out with the same (or more) care as direct attached storage.

(http://www.sqldownunder.com/SDU34FullShow.mp3)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

colin.Leversuch-Roberts SSC Guru Points: 52643 · Answer 8

You also mentioned RAID 6. I don't think you could possibly have a worse RAID level for writes: two parity disks. You might as well use a floppy and I'd still expect it to be quicker!

Transaction logs must be on RAID 1/10, not on RAID 5 or 6. Sadly my latest blog post on SAN contention didn't format very well - such are the joys of blogging! However, I've finished trying to benchmark a SAN but essentially have failed, because despite the hype it's pretty obvious from the wildly differing run times of the tests that there are problems. One point with a fibre channel network is that the backups tend to be pulled back across the same network and switches as you're using from SQL Server, and this will show as increased latency. I'd say the figures you have clearly show why you shouldn't use RAID 6! Essentially, if your log writes are slow or erratic in performance/latency then your SQL Server will suffer. This is my blog post on my web site, where it stays almost correctly formatted - don't you just love HTML! http://www.grumpyolddba.co.uk/infrastructure/TrackingSANContention.htm

I'm hoping to summarize my san testing on my blog this weekend.

The GrumpyOldDBA
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/

noeld SSC Guru Points: 96590 · Answer 9

.... you might as well use a floppy and I'd still expect it to be quicker!

Now that's FUNNY :D:D:D:D

* Noel

Gail Shaw SSC Guru Points: 1004504 · Answer 10

RAID 6?! I missed that.

That, by itself, is going to give you horrendous write times, especially for a tran log.
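
To make the write-time concern concrete, here is a back-of-envelope sketch using the commonly quoted "write penalty" rule of thumb (back-end disk I/Os per small random host write: 2 for RAID 10, 4 for RAID 5, 6 for RAID 6). The 20-drive count comes from the thread; the 180 IOPS per 15K drive is an assumed ballpark, not a measured figure, and real arrays with write cache or full-stripe writes will behave differently.

```python
# Rule-of-thumb back-end I/Os generated by one small random host write.
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def effective_write_iops(drives: int, iops_per_drive: float, raid_level: str) -> float:
    """Estimate host-visible random write IOPS for a drive group."""
    raw_iops = drives * iops_per_drive
    return raw_iops / WRITE_PENALTY[raid_level]


DRIVES = 20            # drive count mentioned in the thread
IOPS_PER_DRIVE = 180   # assumption: rough figure for one 15K drive

for level in WRITE_PENALTY:
    estimate = effective_write_iops(DRIVES, IOPS_PER_DRIVE, level)
    print(f"{level}: ~{estimate:.0f} random write IOPS")
```

Under these assumptions the same 20 drives deliver roughly a third of the random write rate on RAID 6 that they would on RAID 10, before any sharing with other servers is taken into account.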

Gail Shaw

colin.Leversuch-Roberts SSC Guru Points: 52643 · Answer 11

only if you haven't been using the san I've been testing!!!!


noeld SSC Guru Points: 96590 · Answer 12

I have to admit that the SANs I have worked with in the past were managed by people really trained for it, and they were from TOP vendors (HP, HITACHI).

* Noel

Gail Shaw SSC Guru Points: 1004504 · Answer 13

colin Leversuch-Roberts (1/16/2009)
only if you haven't been using the san I've been testing!!!!

There are worse ways to kill a SAN's performance than RAID 6.

Share the OLTP system's fibre switch with a datawarehouse

Slice the disks so that SQL's log is sharing drives with the Exchange server

Misconfigure the HBAs so that they're running 1/4 the speed they're capable of

Do synchronous mirroring of the SAN across a 128kb line to a secondary data centre 17 km away (minimum latency 200ms)
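
The last item is worth a quick calculation, because the distance is not what hurts. The sketch below is only a lower-bound estimate under stated assumptions: "128kb" is read as 128 kbit/s, light in fibre is taken as roughly 200 km per millisecond, and protocol overhead, queuing and the remote array's own service time are ignored. It does not try to reproduce the 200ms figure quoted above, and the function names are illustrative.

```python
FIBRE_KM_PER_MS = 200.0  # approximation: light covers ~200 km per ms in fibre


def propagation_ms(distance_km: float) -> float:
    """One-way propagation delay over fibre, in milliseconds."""
    return distance_km / FIBRE_KM_PER_MS


def serialization_ms(payload_bytes: int, link_bits_per_sec: float) -> float:
    """Time to clock the payload onto the link, in milliseconds."""
    return payload_bytes * 8 / link_bits_per_sec * 1000


def sync_write_floor_ms(payload_bytes: int, distance_km: float,
                        link_bits_per_sec: float) -> float:
    """Lower bound for one synchronously mirrored write.

    The data must be sent once and the acknowledgement must come back,
    so the floor is one serialization plus a propagation round trip.
    """
    return (serialization_ms(payload_bytes, link_bits_per_sec)
            + 2 * propagation_ms(distance_km))


LINK_BPS = 128_000   # assumption: "128kb line" means 128 kbit/s
DISTANCE_KM = 17     # distance from the list above

for size in (512, 8192, 65536):
    floor = sync_write_floor_ms(size, DISTANCE_KM, LINK_BPS)
    print(f"{size:>6} byte write: at least {floor:.2f} ms")
```

With these numbers the 17 km round trip contributes well under a millisecond, while pushing a single 8 KB page through the link takes about half a second, so the line speed dominates every mirrored write.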

noeld (1/16/2009)
I have to admit that the SANs I have worked with in the past were managed by people really trained for it, and they were from TOP vendors (HP, HITACHI).

You're lucky. I've worked with a couple of 'storage engineers' who didn't really know what they were doing, but knew that they knew more than anyone else. Net result: the list above.

Gail Shaw

David Benoit SSC-Dedicated Points: 34562 · Answer 14

The link Colin provided above didn't work for me but this one did. Thanks for sharing - looking forward to the read.

http://sqlblogcasts.com/blogs/grumpyolddba/archive/2009/01/14/tracking-contention-on-the-san-testing-times.aspx

David


colin.Leversuch-Roberts SSC Guru Points: 52643 · Answer 15

Try dropping down from this menu:

http://www.grumpyolddba.co.uk/Infrastructure/Infrastructure.htm

It has details of the tests I've been using. Sadly I cannot name the SAN I've been testing. Shared LUNs are a problem; however, many SANs now use virtual LUNs which don't map to drives, so every LUN technically shares.
