June 4, 2009 at 4:58 am
Hi friends
Recently the error log shows I/O stall warnings in my production server. This strangely started only Three days before. I have more than 50 warnings in my error log in 3 days. Mostly from tempdb mdf and ldf. The error log says
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file XXXXXXXXXXXXXXXX\tempdb.mdf] in database [tempdb] (2). The OS file handle is 0x0000000000000868. The offset of the latest long I/O is: 0x000000241a0000
I collected the performance counters for Physical disk average sec/write and average sec/read since we have SAN.
My total average is good (.001 Sec) but at times the sec/write spike went very high ( 14 sec).
I read in some posts by Gail that the these values should be under 10 Milli sec and spikes under 50 Milli sec. Is there any tech net article about these values? Do I have a performance problem?
I am particularly worried since this is the first time I see this in my Prod server.
Thanx in advance
June 4, 2009 at 5:58 am
June 5, 2009 at 9:57 am
Were your physical disk counters for each individual drive?
Spurious IO delays during high activity can be expected and really aren't an issue as long as they are very limited and don't cause problematic timeouts or other issues with your database applications.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 8, 2009 at 6:32 am
Thanks a lot.
Sorry for late reply
Were your physical disk counters for each individual drive?
Yes. My counters where for each individual drives. I noticed that the spikes where more frequent to drive containing tempdb. However the io warnings are not seen now after fail over to secondary node.
June 9, 2009 at 10:32 am
What type of hardware are you running ? What type of SAN are you running ?
There are a few combinations that are prone to this.
RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."
June 10, 2009 at 5:58 am
Is the TempdB on its own dedicated drive?
How you run Profiler to determine long running queries? Some of these queries may be using the tempDb especially if temp tables are used or there are lots of sorts.
June 10, 2009 at 6:16 am
Thank you all for your help.
But now the main cause of concern is no more there. I no longer get the IO warnings after a fail over. I think I need to monitor it a little more
June 10, 2009 at 6:36 am
This message means what it says. Some IO took a long time. It doesn't usually indicate a hardware issue in my experience, although it may do sometimes. If it does you'd expect errors in the system event log too.
More usually when it occurs on tempdb it's because an auto expand is happening. You can make this less likely by decreasing the amount tempdb expands by each time. This should always be set in megabytes rather than as a percentage so you know exactly what it will expand by and how long it might take.
If this is what happened the impact will have been that whatever process needed the extra space will have had to wait. Potentially other processes may have waited too.
Providing nothing timed out, the processes will have all completed once the tempdb space became available.
Tim
.
June 10, 2009 at 6:46 am
Thanks Tim.
I was wondering how it just stopped after restart.
My tempdb is pre - sized. I think the messages might have started after it filled the initial size. I will better monitor growth in my tempdb.
June 10, 2009 at 6:50 am
If by pre-sized you mean it doesn't grow I agree with your assessment.
It sounds as though it got full, one or more processes waited, space became available, they carried on.
I also agree with your conclusion, monitor it more regularly and make it bigger if necessary.
Tim
.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply