August 18, 2008 at 11:41 am
I get this error in my SQL Server error log at the time my Litespeed full backup job runs at night. There is no heavy activity at that time, so why a timeout error? It's a SQL Server 2000 cluster environment with a database of over 300 GB.
It's been happening for a week now, and no changes were made that would explain it.
"SQL Server has encountered 655 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [M:\SQLData\Derivative_Prod_5_Data.ndf] in database [Derivative_Prod] (5). The OS file handle is 0x00000660. The offset of the latest long IO is: 0x00000b217d0000"
Thanks in Advance!!!
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."
August 18, 2008 at 1:03 pm
Do you have other jobs running against this server? How many?
Do you back up your data to a tape drive?
If the answer is yes, the timeout is to be expected. The solution is to reduce the number of jobs running at the same time.
August 18, 2008 at 7:17 pm
There are many jobs running on this server, but they've been there for quite some time now. I also noticed one more thing: the hourly transaction log backup ran for 29 hours without completing, and the nightly full backup ran for 14 hours without completing. One reason could be that the backup jobs have a step named 'wait for other process', so they may each have been waiting for the other to end. It was only after I cancelled the full backup that the transaction log backup succeeded. So let me try cutting down on the number of jobs.
Thanks!!
The_SQL_DBA
August 19, 2008 at 12:46 am
A single IO (64 KB) taking longer than 15 seconds is never acceptable, no matter what else may be running. What that warning means is that in the course of 2 minutes (I think), SQL Server issued 655 IO requests against the secondary data file that the operating system had not completed 15 seconds later.
Generally this points at an IO bottleneck, though it can also be a problem in the IO stack (antivirus software, for example).
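As a side note, the offset reported in the warning can be turned into a position inside the data file, which may help when matching it against what the storage team sees. A minimal sketch, assuming the offset is a plain byte offset from the start of the file and using the standard SQL Server sizes of 8 KB per page and 8 pages per extent (the function name `locate_offset` is made up for illustration):

```python
PAGE_SIZE = 8 * 1024          # a SQL Server data page is 8 KB
EXTENT_SIZE = 8 * PAGE_SIZE   # an extent is 8 contiguous pages (64 KB)


def locate_offset(hex_offset):
    """Convert a hex byte offset from the warning into file positions."""
    offset = int(hex_offset, 16)  # int() accepts the leading "0x" with base 16
    return {
        "bytes": offset,
        "gib": offset / 2**30,          # distance into the file in GiB
        "page": offset // PAGE_SIZE,    # page number within this file
        "extent": offset // EXTENT_SIZE,
    }


# Offset copied from the error message quoted in the first post.
info = locate_offset("0x00000b217d0000")
print(info)
```

This only says where in the file the latest slow IO landed. It says nothing about why it was slow.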
Please run a perfmon trace for a few hours and capture the following counters.
PhysicalDisk: Avg. Disk sec/Read
PhysicalDisk: Avg. Disk sec/Write
PhysicalDisk: % Idle Time
PhysicalDisk: Avg. Disk Queue Length
What are the min, max and avg values that you see?
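If the counter log is saved or converted to CSV, the min, max and avg can be pulled out with a few lines of script instead of reading them off the graph. A rough sketch, assuming the first column is the timestamp and every other column is one counter (the function name `summarize_counters` is hypothetical, and the exact column headers depend on your server and disk names):

```python
import csv
import io
from statistics import mean


def summarize_counters(csv_text):
    """Return {counter_name: (min, max, avg)} for each counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counters = header[1:]                      # column 0 is the timestamp
    samples = {name: [] for name in counters}
    for row in reader:
        for name, cell in zip(counters, row[1:]):
            try:
                samples[name].append(float(cell))
            except ValueError:
                continue                       # skip blank or non-numeric samples
    return {
        name: (min(values), max(values), mean(values))
        for name, values in samples.items()
        if values                              # drop counters with no usable samples
    }
```

To use it, read the exported file into a string (for example `open("disk.csv", newline="").read()`, where `disk.csv` is a placeholder name) and pass it to `summarize_counters`. The sec/Read and sec/Write values are in seconds, so 0.015 means 15 ms.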
What's your disk setup? SAN, direct attached RAID? What levels and what files are on what drives/arrays?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 23, 2008 at 11:40 am
SQL Server, and I believe the same is true of other database servers, has a limit on how many jobs SQL Agent can run at once. If we schedule many jobs to start at the same time, we may get intermittent failures.
Just my experience.