SQL error in error logs

Question

SQL error in error logs

Imke Cronje

SSCarpal Tunnel

Points: 4826
More actions
March 25, 2009 at 1:07 am

#132500

Hi Guys,
I have encountered the following error in our error logs:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer
than 15 seconds to complete on file [datafile.mdf] in database [DBName] (68).
The OS file handle is 0x0000000000001BD8.
The offset of the latest long I/O is: 0x00000025a8a000
Can anyone pls assist with this? I have been on one of the microsoft pages and they have suggested that I disable the trace flag(830). I have done that but the error still appears.
Regards
IC

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Eswin SSCrazy Eights Points: 8320 More actions · Answer 1

The 15-second threshold is currently not adjustable. While not recommended, stalled and stuck I/O detection can be completely disabled with trace flag 830. Set the startup parameter –T830 to disable stalled/stuck I/O detection during SQL Server startup. Use dbcc traceon(830, -1) to disable detection for a currently running SQL Server instance. Dbcc traceon is only effective until SQL Server is restarted.

Note that given I/O request that becomes stalled or stuck is only reported one time. If the message reports 10 I/Os as stalled, these 10 reports will not occur again. If the next message reports 15 I/Os stalled, it indicates that 15 new I/O requests have become stalled.

Tanx 😀

Anonymous SSC-Dedicated Points: 32891 More actions · Answer 2

I have a SQL Server that is experiencing this as well. It appears that SQL statements send a request to the db and after 15 seconds are not getting a reply so you get this message. I opened a case with Microsoft and they had me run PSSDIAG to gather a bunch of info while this is happening. They said that the back end storage is the problem, however our storage guy states that all I/O's are getting processed in less than 15ms as they should. We rebooted the cluster and the problem went away for a month and has now returned. I am at a loss as to what is wrong. It mostly happens when I run DBCC Check Intergrity check jobs against our production dbs. But, sometimes get the IO stuck with normal business day work too.

Mohit K. Gupta SSChampion Points: 12130 More actions · Answer 3

I am performace tuning a SQL Server with this issue right now. This is caused by bottle neck in I/O as Microsoft said. Problem is it Disk guys are not going to see it if they look at averages over a day; this kind of activity is triggered from following issues ..

1) Bad indexes, causing excessive Disk activity.

2) The database is putting requiring more throughput then the I/O the disk can handle.

3) Monitor following counters:

- Avg Disk Read Queue Length

- Avg Disk Write Queue Length

- % Idle Time

- Disk Read Bytes/Sec

- Disk Write Bytes/Sec

I think you will notice interseting patterns as I did; so my first step has been looking at the database in question. This database is not big it is only 10GB and doesn't have the throughput that can max out a NAS. Database have about 250 tables, with biggest table holding about 7Million rows; no reason to get this. However the entire database did not have a SINGLE clustered index and the non-clustered indexes it had had not been re-organized or re-built in a long time.

After just rebuilding all indexes and building cluster indexes on tables with more then 5K records the error has gone away for me. I will be futher tuning the system; but for now I have not seen the repeat of the issue ...

Hope that helps a bit :).

Thanks.

[font="Arial"]---

Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
Microsoft FTE - SQL Server PFE

* Some time its the search that counts, not the finding...
* I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

How to ask for help .. Read Best Practices here[/url].

Anonymous SSC-Dedicated Points: 32891 More actions · Answer 4

I hear you on what you say. I have been running profiler on this SQL Server for over a year and I have tuned the crap out of the SQL Statements that run. I have added alot of tuning indexes and just last week when we had an outage I rebuilt all of the indexes and updated stats 100% too. In December we got a new environment, Win2003 server, upgraded to SP4 for SQL Server, new back end storage everything. Only now is the issue showing up, it did not exist on the old hardware which was half the CPU power and back end storage was way slower on the old environment. The main time I see these IO stuck warnings is when I run the DBCC CHECKDBs at 4 AM when there is very little going on on this SQL Server. That is another reason why I don't think it is bad SQL, untuned SQL or heavily fragmented table/indexes.

Mohit K. Gupta SSChampion Points: 12130 More actions · Answer 5

Right DBCC does put heavy load on the server; I would not be concerned in your system. Unless that starts happening in during business hours.

How is your tempdb setup?

Mohit.

[font="Arial"]---

Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
Microsoft FTE - SQL Server PFE

* Some time its the search that counts, not the finding...
* I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

How to ask for help .. Read Best Practices here[/url].

Anonymous SSC-Dedicated Points: 32891 More actions · Answer 6

The last time we experienced these issues it started only during DBCC Check, then it would happen once in a while during somewhat heavy user use and would happen more and more.

When we started the talks about this new environment I asked about putting TEMP DB on a seperate drive. Our DASD guy said it is not necessary due to the newer storage we were moving to, so it is not on a seperate drive.

Mohit K. Gupta SSChampion Points: 12130 More actions · Answer 7

Yaa our Disk guys tell us that too; I still do it!

I have always noticed performace improvement. Other thing how many CPU you have and how many Data files in the TempDB?

Mohit.

[font="Arial"]---

Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
Microsoft FTE - SQL Server PFE

* Some time its the search that counts, not the finding...
* I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

How to ask for help .. Read Best Practices here[/url].

Anonymous SSC-Dedicated Points: 32891 More actions · Answer 8

Two CPUs. One MDF and one LDF file.

The thing is that we did not have this issue on the old server hardware and this issue starts about 4-5 weeks after a reboot and it starts with stuck IO's while running DBCC CheckDBs while the system is mostly idle and then over time gets worse. It is like something builds up over time. I see the same pattern on our test cluster as well and that has very little throughput as far as utilization.

maechismo_8514 SSCoach Points: 17510 More actions · Answer 9

Markus (3/26/2009)
Two CPUs. One MDF and one LDF file.

Sorry to jump in between 😀

Best practice:

number of temp db data files should be = number of CPU's

Mohit K. Gupta SSChampion Points: 12130 More actions · Answer 10

Markus (3/26/2009)
Two CPUs. One MDF and one LDF file.
The thing is that we did not have this issue on the old server hardware and this issue starts about 4-5 weeks after a reboot and it starts with stuck IO's while running DBCC CheckDBs while the system is mostly idle and then over time gets worse. It is like something builds up over time. I see the same pattern on our test cluster as well and that has very little throughput as far as utilization.

As Krisha said you should create two data files on server; but do you have 2 CPU with Quard core or Duo-Core configuration? It is number of core = number of data files.

But getting worse over time; what do you do to get it back to proper operations again?

Mohit.

[font="Arial"]---

Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
Microsoft FTE - SQL Server PFE

* Some time its the search that counts, not the finding...
* I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

How to ask for help .. Read Best Practices here[/url].

maechismo_8514 SSCoach Points: 17510 More actions · Answer 11

And on top of what mohit advised, each tempdb data files should be of equal size.

Anonymous SSC-Dedicated Points: 32891 More actions · Answer 12

A reboot makes the problem go away for about 4-6 weeks. I follow what you say about TEMPDB but I am thinking that is not the issue since right now I only see the problem while DBCC CHECKDB runs. If it was all user based heavy sorts and happens all the time I could see that being a possible answer. In monitoring this SQL Server there are not alot of heavy sorts going on. This is mainly an insert heavy system and minor select statements of small tables not heavy sorts with big where clauses and very very tiny joins or where clause/order by's. Two tables that are inserted into I have delete jobs that run at 3ish in the morning to keep the table with 60 days of data in them. When things run well the jobs run in less than 1 minute, once the IO stucks start showing up the delete jobs start running 3-6 minutes. The amount of data in the table and amount of data to delete is pretty much the same and the load on the system is very low when the delete jobs run.

Mohit K. Gupta SSChampion Points: 12130 More actions · Answer 13

Markus (3/27/2009)
A reboot makes the problem go away for about 4-6 weeks. I follow what you say about TEMPDB but I am thinking that is not the issue since right now I only see the problem while DBCC CHECKDB runs. If it was all user based heavy sorts and happens all the time I could see that being a possible answer. In monitoring this SQL Server there are not alot of heavy sorts going on. This is mainly an insert heavy system and minor select statements of small tables not heavy sorts with big where clauses and very very tiny joins or where clause/order by's. Two tables that are inserted into I have delete jobs that run at 3ish in the morning to keep the table with 60 days of data in them. When things run well the jobs run in less than 1 minute, once the IO stucks start showing up the delete jobs start running 3-6 minutes. The amount of data in the table and amount of data to delete is pretty much the same and the load on the system is very low when the delete jobs run.

Hmm probably not I/O issue if your issues resolves after reboot. Following things to consider ...

You said you designed indexes for performance tuning; how often you reorganize/rebuild them?

What is your memory pressures like?

What is the tempdb size like?

Mohit.

[font="Arial"]---

Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
Microsoft FTE - SQL Server PFE

* Some time its the search that counts, not the finding...
* I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

How to ask for help .. Read Best Practices here[/url].

Anonymous SSC-Dedicated Points: 32891 More actions · Answer 14

Tuning indexes: Last Thursday night we had an outage that I rebuilt ALL of the indexes and did a 100% update stats as well. I typically rebuild some of the tables indexes every 3-4 months.

TEMPDB: MDF file is at a 1 Gig size, LDF file is at 50MB

Memory: There is 4 gig of memory in the server as it is 32bit and SQL Server is using 1.8 gig.

I have been very proactive over the last year in running profiler and also analyzing any new application SQL going into production for indexing.