Ugh, SAN Groundhog Day

  • For the past few weeks I've been dealing with a disk subsystem nightmare. It started about 2.5 weeks ago, when Idera started sending me alerts that disk queues were over 1000 ms on my database drives. The database is about 1.5 TB, with 600 users. EMC looked at it (and is still looking). All of a sudden, after we applied some Windows patches last Wednesday, things got better, with no warnings for 2 days. Today the warnings showed up again, along with user complaints and timeouts. PAGEIOLATCH waits are very high, and all the MS articles point to issues with the disk subsystem. Per MS: "If you see significant PAGEIOLATCH waits it means that SQL Server is waiting on the I/O subsystem. While a certain amount of PAGEIOLATCH waits is expected and normal behavior, if the average PAGEIOLATCH wait times are consistently above 10 milliseconds (ms) you should investigate why the I/O subsystem is under pressure." The confounding thing is that it's the same number of users, the same database, and no application changes; the only change this weekend was the weekly index rebuild, which runs fine (the last time this started up was on Wednesday). Has anyone run into this nightmare before? I have full rights on SQL Server, but not on the SAN.
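
One way to check the 10 ms guideline quoted above: the `waiting_tasks_count` and `wait_time_ms` columns of `sys.dm_os_wait_stats` are cumulative since the last restart (or manual clear), so the average over a recent window is the change in total wait time divided by the change in task count between two snapshots. The Python sketch below assumes you have already pulled those two columns for the PAGEIOLATCH_* rows at two points in time; the class and function names and the example numbers are hypothetical, not taken from this thread.

```python
from dataclasses import dataclass

# Threshold from the Microsoft guidance quoted in the post.
THRESHOLD_MS = 10.0


@dataclass
class WaitSnapshot:
    waiting_tasks_count: int  # cumulative number of waits of this type
    wait_time_ms: int         # cumulative total wait time for this type


def avg_wait_ms(before: WaitSnapshot, after: WaitSnapshot) -> float:
    """Average wait per task over the interval between two snapshots."""
    tasks = after.waiting_tasks_count - before.waiting_tasks_count
    if tasks <= 0:
        return 0.0  # no new waits (or counters were cleared) in this interval
    return (after.wait_time_ms - before.wait_time_ms) / tasks


def flag_pageiolatch(before: dict[str, WaitSnapshot],
                     after: dict[str, WaitSnapshot]) -> dict[str, float]:
    """Return PAGEIOLATCH_* wait types whose interval average exceeds the threshold."""
    flagged = {}
    for wait_type, end in after.items():
        if not wait_type.startswith("PAGEIOLATCH_") or wait_type not in before:
            continue
        avg = avg_wait_ms(before[wait_type], end)
        if avg > THRESHOLD_MS:
            flagged[wait_type] = avg
    return flagged


if __name__ == "__main__":
    # Hypothetical snapshots taken a few minutes apart.
    t0 = {"PAGEIOLATCH_SH": WaitSnapshot(1_000, 5_000),
          "PAGEIOLATCH_EX": WaitSnapshot(200, 1_000)}
    t1 = {"PAGEIOLATCH_SH": WaitSnapshot(1_500, 30_000),
          "PAGEIOLATCH_EX": WaitSnapshot(300, 1_500)}
    print(flag_pageiolatch(t0, t1))
```

Diffing snapshots matters because the lifetime averages since a restart can hide a recent spike like the one described here.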

  • Any chance that one of the paths in the SAN fabric went offline? How about a problem NIC for the SAN fabric? Check your event logs for NIC hardware errors, or see what you can find in the way of packet error statistics, as these might indicate a network issue. It might also be that a fiber cable in the SAN fabric got pinched and isn't transmitting at full speed.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Trying to stay PC with the rest of IT, but EMC did take a peek and saw numbers they could not believe: disk queue spikes over 10,000 ms (Microsoft says anything over 20 ms is bad). Waiting for more input from them.

  • Reply to tcronin 95651 (7/8/2015):

    Standard vendor-management practice is to gather your own data rather than wait on the vendor: event logs, perfmon data (if warranted), network stats, and so on. Always know as much as you can so that the vendor's story isn't the only one being told.

    Steve (aka sgmunson) 🙂 🙂 🙂

  • The spikes I see come from Idera and were verified with perfmon. We had a consultant help the sysadmins set up the SAN, and it looks like we may have hired the wrong guy. I've been a DBA for 20 years and have never seen numbers like this. Most of my peers who run Idera say they get worried if they see spikes over 20 ms on a regular basis; right now that would be an improvement for me. I actually had spikes over 100,000 ms.
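
For an independent record alongside Idera and the vendor, perfmon can log the `\LogicalDisk(*)\Avg. Disk sec/Read` and `Avg. Disk sec/Write` counters to CSV (for example with `typeperf` or `relog`). Those counters are reported in seconds, so 0.020 corresponds to the 20 ms rule of thumb mentioned above. The Python sketch below assumes a typeperf-style CSV with a timestamp in the first column; the server name, drive letter, sample values, and the `latency_spikes` helper are all hypothetical.

```python
import csv
import io

# Rule of thumb quoted in this thread: latency over ~20 ms is bad.
THRESHOLD_MS = 20.0

# Hypothetical typeperf-style CSV; server, drive, and values are made up.
SAMPLE_CSV = r'''"(PDH-CSV 4.0)","\\SQL01\LogicalDisk(E:)\Avg. Disk sec/Read","\\SQL01\LogicalDisk(E:)\Avg. Disk sec/Write"
"07/08/2015 10:00:00.000","0.004","0.002"
"07/08/2015 10:00:15.000","0.250","0.030"
"07/08/2015 10:00:30.000"," ","0.001"
'''


def latency_spikes(csv_text: str,
                   threshold_ms: float = THRESHOLD_MS) -> dict[str, tuple[float, int]]:
    """For each 'Avg. Disk sec/...' column, return (worst latency in ms, samples over threshold)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    result = {}
    for col, name in enumerate(header):
        if "Avg. Disk sec/" not in name:
            continue  # skip the timestamp column and unrelated counters
        values_ms = []
        for row in samples:
            try:
                values_ms.append(float(row[col]) * 1000.0)  # counter is in seconds
            except (IndexError, ValueError):
                continue  # blank or missing sample
        if values_ms:
            over = sum(1 for v in values_ms if v > threshold_ms)
            result[name] = (max(values_ms), over)
    return result


if __name__ == "__main__":
    for counter, (worst_ms, over) in latency_spikes(SAMPLE_CSV).items():
        print(f"{counter}: worst {worst_ms:.1f} ms, {over} sample(s) over {THRESHOLD_MS:.0f} ms")
```

Counting samples over the threshold, not just the worst value, helps separate an occasional spike from the sustained pressure the Microsoft guidance is concerned with.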

  • Reply to tcronin 95651 (7/8/2015):

    I'm wondering if there might be a SAN misconfiguration that allows a link slowdown or something similar. You didn't mention whether you found any offline paths in the SAN fabric; if there aren't any, the only explanations that make much sense are a misconfiguration, a pinched cable, or a flaky fibre network card.

    Steve (aka sgmunson) 🙂 🙂 🙂

  • We made some SAN settings changes, and things look better today. I do have a question I was going to post anyhow. On Windows Server 2008 Enterprise, I have generally left the swap file at the Windows-managed setting. The SAN guy told me he heard we need the swap file to be 1.5 times the size of memory (that would be 192 GB on this server). That seems insane; I could see it back in the SQL Server 7.0 days and before, but not now. Is anyone else doing this?

  • Reply to tcronin 95651 (7/10/2015):

    I just don't know anyone who runs their server OS drive on the SAN instead of on a mirrored pair of local SSDs. Use a 1 TB SSD and you have plenty of room for your swap file. I'm pretty sure you don't want page file swaps going across the SAN fabric. As to the size of your paging file, I know some folks who use a formula of 2x RAM plus some, but I can't come up with a reason to do much more than just exceed RAM size, although it may be that I just haven't encountered one yet. I also can't come up with a good reason to subject your OS paging traffic to the I/O slowdown associated with the SAN fabric.

    Steve (aka sgmunson) 🙂 🙂 🙂
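
The 192 GB figure implies the server has 128 GB of RAM (192 / 1.5 = 128). A quick Python sketch comparing the page file rules of thumb mentioned in this thread; `pagefile_sizes_gb` is a hypothetical helper, and `extra_gb` is a placeholder for the "plus some" and "just exceed RAM" margins, which nobody here quantifies.

```python
def pagefile_sizes_gb(ram_gb: float, extra_gb: float = 0.0) -> dict[str, float]:
    """Page file sizes under the rules of thumb discussed in this thread.

    extra_gb is a hypothetical allowance for the unquantified
    "plus some" / "just exceed RAM" margins.
    """
    return {
        "1.5x RAM": 1.5 * ram_gb,                 # the SAN consultant's rule
        "2x RAM plus extra": 2 * ram_gb + extra_gb,
        "RAM plus extra": ram_gb + extra_gb,      # "just exceed RAM size"
    }


if __name__ == "__main__":
    # 128 GB of RAM, as implied by the 192 GB figure in the question.
    print(pagefile_sizes_gb(128))
```

Whichever rule is used, the sizes only matter for where the file lives; the concern raised above is keeping that paging traffic off the SAN fabric.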
