IO Errors

Question

IO Errors

SQLSACT

SSC-Insane

Points: 21838
More actions
January 11, 2013 at 4:17 am

#272750

Hi All
Recently I have been observing the following types of errors in the SQL error log
SQL Server has encountered 100 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [i]datafile[/i] in database [i]db_name[/i] (5).
These errors seem to be popping up more and more.
Things I've checked:
>> Checked wait stats on the server and noticed IO related waits at the top
>> Using sys.dm_db_index_operational_stats, I've tracked down the top tables with the highest page_io_latch-wait_counts.
>> The indexes on these tables are heavily fragmented, 70% +
>> I've also observed what I think is a low buffer cache hit ratio, between 89%-92%
>> Drives containing datafiles are used ONLY for datafiles.
Could the fragmentation be the cause of the IO delays?
Is there anything else I can check to track this down?
Thanks

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 1

It could be fragmentation, but more likely is either slow disks or disk contention. Check your wait statistics to see what is causing the system to slow down. See if you're dealing primarily with i/o issues. Then collect metrics on queries to figure out which ones are using the most i/o. Tune 'em.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

SQLSACT SSC-Insane Points: 21838 More actions · Answer 2

Thanks

I've checked the highest averaging IO queries in cache and there are definately some serious scans happening against the top tables

Also, do you think that the low Buffer cache hit ratio could be contributing? From what I understand, the low hit ratio means that more queries are forced to retrieve pages from disk. Along with this, Page life expectancy is between 700-900 secs. The server has 20GB of ram and SQL has been allocated 4GB.

Thanks

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

Buffer cache hit ratio is a poor indicator. The page life expectancy is pretty low though. I suspect you are hitting the disk a lot. Fragmentation may help at the margins, but it sounds like you either are moving massive amounts of data, or you need to tune some queries.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

SQLSACT SSC-Insane Points: 21838 More actions · Answer 4

Thanks

Thanks, I know that there are definately some queries hitting the disk quite hard, and that could definately do with some tuning.

In your opinion, what's a better approach: Changing the T-SQL Code or adding indexes to accomodate the code ?

I'm gonna increase SQL's memory allocation as well and observe from there. SQL is only allocated 4GB out of a total of 20GB available.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

SQLSACT (1/12/2013)
In your opinion, what's a better approach: Changing the T-SQL Code or adding indexes to accomodate the code ?

Yes.

Adding indexes is useless if the problem is in the code. Changing the code is useless if the problem is a lack of indexes.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQLSACT SSC-Insane Points: 21838 More actions · Answer 6

GilaMonster (1/12/2013)
SQLSACT (1/12/2013)
In your opinion, what's a better approach: Changing the T-SQL Code or adding indexes to accomodate the code ?
Yes.
Adding indexes is useless if the problem is in the code. Changing the code is useless if the problem is a lack of indexes.

Thanks

The Queries that seems to be racking up the most IO are not queries that run often, they're run at most 5 times a day. This is an OLTP system.

Adding indexes is useless if the problem is in the code. Changing the code is useless if the problem is a lack of indexes

Where do you draw the line between this, when does code become "bad code". The queries are not queries like "select * from some_table".

Thanks

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

SQLSACT (1/12/2013)
Where do you draw the line between this, when does code become "bad code".

When testing shows that it performs badly.

Performance tuning is based on testing, not guesswork. Find the queries with the worst IO load (not the ones that do the most reads per execution). Run them. Identify why they're not performing well. Fix whatever problem was identified. Run them. Do they perform acceptably? If not, identify why and fix. Repeat until performance is acceptable.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

george sibbald SSC Guru Points: 104210 More actions · Answer 8

SQLSACT (1/12/2013)
Thanks
I've checked the highest averaging IO queries in cache and there are definately some serious scans happening against the top tables
Also, do you think that the low Buffer cache hit ratio could be contributing? From what I understand, the low hit ratio means that more queries are forced to retrieve pages from disk. Along with this, Page life expectancy is between 700-900 secs. The server has 20GB of ram and SQL has been allocated 4GB.

why such a low memory allocation to SQL out of the total? Increasing the memory available to SQL will improve your page life expectancy and reduce the need to go to disk.

---------------------------------------------------------------------

SQLSACT SSC-Insane Points: 21838 More actions · Answer 9

The application that uses this SQL Server is installed on the server as well and is very resource intensive

SQLRNNR SSC Guru Points: 281334 More actions · Answer 10

SQLSACT (1/12/2013)
The application that uses this SQL Server is installed on the server as well and is very resource intensive

My first step would be to split these two servers.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

SQLSACT SSC-Insane Points: 21838 More actions · Answer 11

Find the queries with the worst IO load (not the ones that do the most reads per execution).

Thanks

I've been using the following script to identify the highest IO intensive queries

SELECT top 20

SUBSTRING(t.text, ( s.statement_start_offset / 2 ) + 1,

( ( CASE statement_end_offset

WHEN -1 THEN DATALENGTH(t.text)

ELSE s.statement_end_offset

END - s.statement_start_offset ) / 2 ) + 1)

AS statement_text,

text,

objtype,

cacheobjtype,

usecounts,

last_execution_time,

total_physical_reads,

total_physical_reads / execution_count AS [Avg Physical Reads],

total_logical_reads,

total_logical_reads / execution_count AS [Avg Logical Reads],

total_physical_reads+total_logical_reads/execution_count AS [Total AVG IO],

total_worker_time,

total_worker_time / execution_count AS [Avg CPU Time],

execution_count ,

qp.query_plan

FROM sys.dm_exec_query_stats AS s

inner join sys.dm_exec_cached_plans cp

on s.plan_handle = cp.plan_handle

CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS t

CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) qp

order by [Total AVG IO] desc

Have I been going about my investigation incorrectly?

Thanks

SQLRNNR SSC Guru Points: 281334 More actions · Answer 12

Change your order by to Avg Physical Reads. The Total Avg in that query has physical reads plus logical reads.

Starting with IO is a good start. Identifying those high IO queries is what Gail suggested be done. From there, tune the code appropriately and add indexes as needed.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

SQLSACT SSC-Insane Points: 21838 More actions · Answer 13

SQLRNNR (1/12/2013)
Change your order by to Avg Physical Reads. The Total Avg in that query has physical reads plus logical reads.
Starting with IO is a good start. Identifying those high IO queries is what Gail suggested be done. From there, tune the code appropriately and add indexes as needed.

Thanks