I/O timeouts writing to a transaction log - Please Help!

Question

I/O timeouts writing to a transaction log - Please Help!

Ryan-209402

Hall of Fame

Points: 3707
More actions
July 22, 2008 at 1:58 pm

#122428

HELP! I'm seeing an I/O issue on a production server. The error in the SQL log is "SQL Server has encountered 34 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Tranlogs\proddb_log.LDF] in database [proddb] (26). The OS file handle is 0x0000000000001C90. The offset of the latest long I/O is: 0x00000000097e00"
The Physical Disk counters look like this:
Physical Disk: Avg. Disk Queue Length - average = 55
Physical Disk: Avg. Disk Sec/Read - average = .043
Physical Disk: Avg. Disk Sec/Write - average = .207
Physical Disk: %Disk Time - average = 2660
Physical Disk: Avg. Disk Reads/Sec - average = 235
Physical Disk: Avg. Disk Write/Sec - average = 210
I'm not quite sure where to start, and don't even know whether the counters are normal, but this is causing timeouts for users for a web application which hits this db. Any advice you can offer is greatly appreciated.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

Run a trace (or Profiler, I guess), and find out what query or queries are causing the long waits.

You might also take a look at the growth options on the log file. It might be auto-growing by too small an increment, which might cause something like this.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

murthykalyani SSCarpal Tunnel Points: 4500 More actions · Answer 2

Your disk is being subjected to high stress based on the peformance counters numbers.

Also are the values provided for various counters are for _Total instance, if so start collecting data for individual disks.

Perry Whittle SSC Guru Points: 234013 More actions · Answer 3

how many disks are in the array that the file sits on?

also what is the set up of sql server, where are the data and log files stored?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

Those disk stats don't look good. The sec/write is far too high. Could you plese describe how the disks are set up. How many diska, how many separate arrays, what Raid level, etc.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Ryan-209402 Hall of Fame Points: 3707 More actions · Answer 5

Thanks for everyone's reply. The OS is on a separate mirrored set using the on board controller. The data and logs are on a RAID 5 array on a HP Smart Array 6400 controller, with 8 x 300GB disks. Both data and log drives have over 800GB of free space and are slightly fragmented but not too bad. I will run the perf counters for the individual disks. Also, the transaction log for this db is set to autogrow by 50MB (I've also tried autogrow by 10% and autogrow by 10MB).

I also noticed an error in the HP Integrated Management log stating "Array Accelerator Battery Charge Low" but I don't know if that would cause any performance degradation. Does anyone have any experience with configuring the battery backed write cache on a Smart Array 6400 controller on a server that is a dedicated SQL server? Right now the read/write ratio is set to 50%/50%.

Ryan-209402 Hall of Fame Points: 3707 More actions · Answer 6

Here is some additional info about the RAID controller:

Controller Smart Array 6400 Controller

Bus Interface 64-Bit PCI

Controller Location Slot 10

Array B

Array Type Parallel SCSI

Number of Logical Drives 1

Logical Drive 2

Size 1716586 MB

Fault Tolerance RAID 5

Heads 255

Sectors per Track 63

Cylinders 65535

Stripe Size 64 KB

Status OK

Failed Physical Drives None

Array Accelerator Enabled

Carl Federl One Orange Chip Points: 25384 More actions · Answer 7

When the array accelerator battery charge is low, then the write cache will be temporarily disabled until the batteries either become fully charged or are replaced. The disabling of the write cache will cause major IO slowness.

Hardware support needs to check this out.

SQL = Scarcely Qualifies as a Language

Andy sql SSCrazy Eights Points: 9496 More actions · Answer 8

You have not said how large the Log File currently is; an auto-grow of 50MB may be entirely unsuitable. Also, you may find that internally, the Virtual Log Files are horribly fragmented, as well as having fragmentation of the "physical" Log File itself.

For help with VLFs, see Andrew Calvett

http://www.sqljunkies.com/WebLog/acalvett/default.aspx

and Kimberley L Tripp

http://www.sqlskills.com/blogs/kimberly/PermaLink.aspx?guid=934f3755-5b1d-4572-a386-c6a2a0d14a9e

I would suggest you calculate how large your Log File needs to be, create the Log File to be that size (or a little bigger) and ensure Auto Shrink is switched off. The, if you can, defrag the drive (preferably with the Database engine stopped, otherwise the Log Files will not defrag efficiently).

Andy

Ryan-209402 Hall of Fame Points: 3707 More actions · Answer 9

Thanks for the links. I will check them out as I'm not sure how to calculate how large the tran log needs to be and maintain that size. I didn't have a large enough window to defrag the entire log drive but I did recreate the tran log for this db by backing up and detaching the db, deleting the tran log and reattaching the db. I figure that this would insure a new unfragmented log but it did not resolve the problem. The log is currently about 100MB but I've seen it grow to over 4GB.

Here are two things I've done and so far I have not seen the I/O error return:

1. Apparently the HP SmartArray 6400 controller has a problem with firmware version 2.34 where it does not charge the cache battery. After working with HP I updated the firmware to version 2.84 and supposedly this will resolve the battery issue so that write cacheing will be re-enabled. I'll have to wait a couple days and see if the "Array Accelerator Battery Charge Low" error stops posting. If not, I may need to replace the battery.

2. I also found that the anti-virus software was scanning the logs directory so I've excluded it, at least for now.

Thanks again for all of your replies. I'll have to wait and see if this I/O problem returns. I will also do some research on how to best handle the transaction log size/growth/shrinking etc. Any other advice on that is welcome.

BobAtDBS SSCarpal Tunnel Points: 4591 More actions · Answer 10

Ryan (7/24/2008)
2. I also found that the anti-virus software was scanning the logs directory so I've excluded it, at least for now.

My guess is both were excellent choices, but the anti-virus software is a big bingo. You also really need to have logs and data on separate disk volumes, you are making your disk drives hop back and forth all the time between writing the sequential log file and doing the random data read/writes. Put the log files on the separate OS drive if you have room. Better for performance and reliability.

Student of SQL and Golf, Master of Neither

Andy sql SSCrazy Eights Points: 9496 More actions · Answer 11

Sounds like quite a significant problem with that firmware release!

If your current log file is 100MB, but it has been 4GB, you obviously have Auto Shrink enabled (or it occurs during a Maintenance Plan). Probably not a good idea. You could do worse than read "Top Tips for Effective Database Maintenance", written by Paul S Randal and published in this month's Microsoft TechNet magazine:

http://technet.microsoft.com/en-us/magazine/cc671165(TechNet.10).aspx

Ryan-209402 Hall of Fame Points: 3707 More actions · Answer 12

One more question about the placement of transaction log files. What is the most important?

1. Having the transaction logs, data, OS etc on a separate controller/channel?

2. Having the transaction logs, data, OS etc on separate RAID sets?

Would this configuration be advisable?

C:\ OS on RAID1 set 1 (SmartArray 6400 controller channel 1)

D:\ APPS on RAID5 set 1 (SmartArray 6400 controller channel 1)

E:\ DATA on RAID5 set 1 (SmartArray 6400 controller channel 1)

F:\ LOGS on RAID1 set 2 (SmartArray 6400 controller channel 2)

We have a decent SCSI controller with two channels to work with and I'm trying to determine the best configuration.

BobAtDBS SSCarpal Tunnel Points: 4591 More actions · Answer 13

I don't think separate or same channels on the same card will make much difference, it's getting them on separate drive groups that matters. The more separate drive spindles running, the faster you can get data on and off the physical platters. It's a rare system that is I/O channel bound.

We typically put the OS and logs on one Raid set (Raid 1), temp db on a drive by itself, and the data on a separate Raid 5 set. Since you're trying to store general data as well on this server, I guess I'd put it on the Raid 5 set with the data.

C (raid 1 pair) OS

D (raid 5 set) Data and other files

E (same raid 1 pair as OS) Log files

F (separate single drive) temp db

Student of SQL and Golf, Master of Neither