IO request taking longer than 15 seconds

  • I recently adjusted the tempdb from the installed default of one file to the number of cpus. So I increased the number of data files to 8 total (4 cpu dual core). Since doing that change I get the following message in my logs for the tempdb.

    SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [X:\Data\tempdev1.ndf] in database [tempdb] (2). The OS file handle is 0x0000000000000940. The offset of the latest long I/O is: 0x000000073e0000

    Why would I start receiving that message with the change? Should I change it back? I found various articles about the cause of this message, but I don't understand why it occurred with this change.

    Anyone have suggestions on what to do?

  • may be something is wrong with your X: drive, is that the only phisical drive reporting the issue?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Its part of the SAN (could be the issue). All the tempdb files are on that set of drives. They were there when it was one file as well when no issues existed.

  • Just so I can understand better, are you saying that all the tempdb files that you created / added are on the same volume group? If so, then what was the purpose for splitting them?

    Have you looked at the information from sys.dm_io_virtual_file_stats to see what the IO stall information looks like? Curious.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Yes they are on the same volume group. But I've been told that this is not the same as when you have a normal scsi style drive system because the SAN is split across numerous drives and therefore is unlikely hitting the same physical disks.

    What information should I look at in the IO management view, just thee stall?

    Here is the results for each file

    48890107

    76204

    49464938

    49073041

    48767264

    48747981

    48817171

    48687122

    48782405

  • I am by no means a SAN engineer. What you are stating in that the IO will be spread across all the disks is true however, that should be true regardless of having 1 file vs many files for that database. It would be one thing if you put one file on VG (Volume Group) 1 and another file on VG2, etc so that you were truly splitting IO across multiple volume groups but in this case with all the files on the one VG they are still working off the same number of spindles so, IO should be the same. Someone else correct me if I am wrong please.

    As for the output from the query, I would look at all the stall information, io_stall_read_ms, io_stall_write_ms and io_stall. You will have to take snapshots at intervals and compare the deltas as the results are cumulative from instance startup.

    Curious, what type of SAN storage are you using (vendor)? Any idea what the drive configurations are, i.e. RAID configuration, stripe size, number of drives in the VG, how big are the drives?

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply