Disk IO Performance Degradation after Server Upgrade

  • I found another issue. A new junior DBA reset all of my SQL Diagnostic Manager alert thresholds to factory defaults. That doesn't mean there isn't an issue, but it's not as large as it first appeared. We definitely have an IO bottleneck somewhere, but resetting the thresholds sure didn't help. :angry:

    Wait stats to come. So many to choose from. 😀

    *** added wait stats as a text file ***


    Greg Roberts

  • Further up in your thread someone gave you a script to capture them. I would grab those to start; it looks like they covered what you needed.

  • I have run the queries suggested by Syed Jahanzaib Bin Hassan and attached the results as a text file. Some of the columns have been removed for security purposes.


    Greg Roberts

  • You don't have a memory issue; Page Life Expectancy shows plenty there. It looks like IO, so I would focus on LANCELOT.

    I would check the LUN this is being run on. My best suggestion is to take the file and put it on its own LUN, RAID 10 if possible.

    The only other suggestion is to run Perfmon for 24 hours and grab the disk counters and Page Life Expectancy. Make sure the averages are OK; I've had problems in the past where there were spikes during the day due to extra activity. That is, if you have not done so already.

    I don't see tempdb in the readout; hopefully it's on its own LUN as well.

    Here is a short explanation of IO stalls from the Microsoft white paper "Performance Tuning Waits and Queues":

    If IO_stalls is inordinately high for one or more files, it is possible that there is either a disk bottleneck or that high reads and writes are occurring on one drive. Average IO Waits per read or write can distinguish between consistently high IO queues or a temporary IO spike. A significantly higher average value for IO stalls on one particular drive indicates consistently high IO requests. This should be corroborated with Performance Monitor counters Physical Disk: Average Disk Seconds/Read and Average Disk Seconds/Write. The following script can also compute the Average Disk Seconds/Read and Average Disk Seconds/Write using sys.dm_io_virtual_file_stats.
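    The script the white paper refers to isn't reproduced in the excerpt above; a minimal sketch of the same calculation, using only the documented columns of sys.dm_io_virtual_file_stats (the aliases and the NULLIF guard against divide-by-zero are mine), might look like this:

    ```sql
    -- Average IO stall (ms) per read and per write for each database file,
    -- cumulative since the last SQL Server restart.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id
    ORDER BY avg_read_stall_ms DESC;
    ```

    Compare these figures with the Physical Disk: Avg. Disk sec/Read and Avg. Disk sec/Write counters mentioned above (note the DMV reports milliseconds while Perfmon reports seconds).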

  • Disk Delays are not acceptable

    Maximum disk delays should not be more than 30 ms. If they exceed 30 ms, you have to check your SAN storage settings, RAID groups, and read/write options.

    How many filegroups do you have, and how many files are in each filegroup?

    I checked and you are receiving ASYNC_NETWORK_IO waits, which means the server is waiting on the client to process the TDS network packets. (Is the NIC 1 Gb? Cat5 or Cat6? Enable jumbo packets on the NIC if the switches and NIC support them.) You can also increase the network packet size of SQL Server; visit my recent blog post regarding this issue:

    http://aureus-salah.com/2011/06/14/sql-server-async_network_io-wait-type/
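    For reference, the instance-level packet size mentioned above can be checked and changed with sp_configure; a small sketch (the 8192 value is only an example, so test before changing production):

    ```sql
    -- Current network packet size (SQL Server default is 4096 bytes)
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name = 'network packet size (B)';

    -- Example: raise it to 8192 bytes
    EXEC sp_configure 'network packet size', 8192;
    RECONFIGURE;
    ```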

    You have 24 cores and 6 hyperthreads in total; what is the degree of parallelism (DOP) value?

    Is your server hardware NUMA aware?

    What is the biggest table in each of these two databases?

    Did you check the fragmentation level of the indexes?

    Did you run a database shrink on these databases before moving to the new hardware?
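    Two of the checks above can be run straight from T-SQL; a sketch, using only standard catalog views and DMVs:

    ```sql
    -- Current max degree of parallelism setting
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name = 'max degree of parallelism';

    -- Index fragmentation in the current database (LIMITED is the cheapest scan mode)
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;
    ```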

    There is no need to add more memory.

    Regards,
    Syed Jahanzaib Bin Hassan
    BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog
    www.aureus-salah.com

  • Syed is correct; your reads appear to be too high. I would run the following query; it will tell you which drives the files reside on. Also run your Perfmon and focus on reads on the disk, so that you have better numbers over a 24-hour period. The numbers you are seeing, as I'm sure you know, are cumulative since the last restart.

    And again, move the highest-usage file to its own LUN; hopefully your SAN guy can put it on RAID 10.

    select db_name(mf.database_id) as database_name, mf.physical_name,
        left(mf.physical_name, 1) as drive_letter,
        vfs.num_of_writes, vfs.num_of_bytes_written, vfs.io_stall_write_ms,
        mf.type_desc, vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms,
        vfs.io_stall, vfs.size_on_disk_bytes
    from sys.master_files mf
    join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
        on mf.database_id = vfs.database_id and mf.file_id = vfs.file_id
    order by vfs.num_of_bytes_written desc

    Good luck; give us an update after your SAN guys make some adjustments.

  • Thanks for all of the feedback everyone.

    All of the files are on their own LUNs. In most cases RAID 10 was not an option; the files are just too big, and we'd run out of disk space on the SAN quickly. Unfortunately there isn't the budget to expand the SAN at this time.

    TempDB is also on its own LUN.

    I'm beginning to suspect it is an iSCSI issue. There is a 4-NIC team, but I never see more than 2 NICs with any utilization: 1 receive, 1 transmit, 2 idle. The good news is all of the iSCSI NICs are 1 Gb. The bad news is jumbo frames are not enabled, and we do not have the switch hardware to support them. Again, I'm under a budget freeze, so I don't really have any options. 🙁


    Greg Roberts

  • Those iSCSI SANs need to be configured carefully. Each LUN should have multiple channels configured. Also, I have seen some types of hardware not work right with some brands of SAN. Maybe a driver update is in order.

    Wish I could be more help but the setup is very specific to the server and SAN you are using.

    The probability of survival is inversely proportional to the angle of arrival.

  • Are you using the Microsoft iSCSI initiator?

    IIRC, Microsoft does not support or recommend teamed NICs for iSCSI.

    You should be using the Multipath IO (MPIO) component across multiple NICs for redundancy.
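    If MPIO is in place, the Windows mpclaim utility can confirm it; this is from memory, so check the syntax against your Windows version:

    ```
    rem List MPIO-managed disks and their load-balance policies
    mpclaim -s -d

    rem Show the individual paths behind one MPIO disk (disk 0 here)
    mpclaim -s -d 0
    ```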

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (6/23/2011)


    Are you using the Microsoft iSCSI initiator?

    IIRC, Microsoft does not support or recommend teamed NICs for iSCSI.

    You should be using the Multi Path IO component across multiple NICs for redundancy

    Thanks, I'll pass that on to the NetOps team.


    Greg Roberts

  • Greg Roberts-134214 (6/23/2011)


    Thanks for all of the feedback everyone.

    All of the files are on their own LUNs.

    Ask the SAN guys to map the LUNs down to individual spindles, and for every one of those spindles, map every LUN on it. See how many spindles each has... and what each spindle is sharing out to other LUNs. Particularly given the rest of your budget, I'd suspect you're sharing spindles with other platforms, or perhaps among your own LUNs.

    Scrape up some local storage, perhaps? If you can, investigate getting to a point where you can enable compression; it sounds like you have lots of CPU and not so much IO, so trade one for the other.

    And see if you can budget for 10Gb NICs next year for your DB and the SAN, even if you have to operate them in crossover mode (i.e. without a switch in the middle) because a switch is 'too expensive'; that'll take care of the "teaming isn't working right" issue.

    And run SQLIO! Preferably on the LUNs one at a time, and then all at the same time (to see if there's contention over and above the bandwidth limits you have).
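    A hedged example of the sort of SQLIO run being suggested (the test file path, duration, and thread count are placeholders; check the flags against the readme that ships with the tool):

    ```
    rem 5 minutes of 8 KB random reads, 4 threads, 8 outstanding IOs, with latency stats
    sqlio -kR -t4 -s300 -o8 -b8 -frandom -LS T:\sqlio_test.dat

    rem The same pattern for writes
    sqlio -kW -t4 -s300 -o8 -b8 -frandom -LS T:\sqlio_test.dat
    ```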

  • Greg Roberts-134214 (6/24/2011)


    Perry Whittle (6/23/2011)


    Are you using the Microsoft iSCSI initiator?

    IIRC, Microsoft does not support or recommend teamed NICs for iSCSI.

    You should be using the Multi Path IO component across multiple NICs for redundancy

    Thanks, I'll pass that on to the NetOps team.

    Also remember that teaming is not recommended on the dedicated heartbeat either, although if you're using Windows 2008 a dedicated heartbeat is no longer required anyway.

    It's important to provide redundancy and performance at the iSCSI level via multiple paths, not through teamed NICs.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • also remember that teaming is not recommended on the dedicated heartbeat either, although if you're using Windows 2008 a dedicated heartbeat is no longer required anyway.

    We're not running a Windows cluster, so I'm not sure I follow your reference to a dedicated heartbeat. The SAN has its own clustering. To expand the SAN we have to add head units, not just shelves of disks like you can with a NetApp or EMC. (Both of which I've used and miss 🙁 )


    Greg Roberts

  • We have a call with HP SAN Support tomorrow, I'll update with any new information we get. Thanks everyone.


    Greg Roberts

  • I see you have been going back and forth for days on this issue. You REALLY should have brought in a professional consultant from the start. This type of problem is not what forums are really good at helping you with.

    BTW, did you by any chance upgrade your SQL Server version at the same time you did this hardware upgrade?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
