IO and drive latency

Question

IO and drive latency

coolchaitu

SSChampion

Points: 10334
More actions
October 6, 2017 at 8:09 am

#328955

Good morning Experts,
What is a good value for avg_io_stall_ms and pending_io_requests? I am asking this to see if my server is facing I/O bottleneck. Are there any other ways and good values to check if my server is facing IO bottleneck. Also, what is drive latency and what is a good value for it.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Henrico Bekker One Orange Chip Points: 27652 More actions · Answer 1

As with any stall value, you'd want it as low as possible, and by that, I mean under 5ms would be idea in most cases, in some cases that would be considered high as well.

Your question is a bit vague, there are too many factors involved.
What is the current underlying storage configuration used, what connectivity is used to the storage arrays, disk speeds etc.
Also, what is more important in your environment, read or write optimized configuration?

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This thing is addressing problems that dont exist. Its solution-ism at its worst. We are dumbing down machines that are inherently superior. - Gilfoyle

Sue_H SSC Guru Points: 90860 More actions · Answer 2

As low as possible as Henrico said.
Using sys.dm_io_virtual_file_stats is a good way to check the I/O performance but you can't really just query it once and have it tell you anything as the numbers are cumulative since SQL Server started. You should do an initial capture for a baseline and then one (or more) captures after that so that you can determine the performance for the period of time between the captures.

Sue

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 3

I've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

coolchaitu SSChampion Points: 10334 More actions · Answer 4

Jeff Moden - Saturday, October 7, 2017 2:37 PM
I've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.

So, shall i say OPerations team that the issue is with the code?

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 5

coolchaitu - Sunday, October 8, 2017 7:07 AM
Jeff Moden - Saturday, October 7, 2017 2:37 PM
I've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.
So, shall i say OPerations team that the issue is with the code?

Not until you prove it. Never say anything that you can't prove because you always have to prove it especially when it comes to things like this.

The only way that you can prove it is to find the code (could be plural here) that's causing the problem and why it's causing the problem. It's one of those things that we have to do just because of the nature of what we do.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

coolchaitu SSChampion Points: 10334 More actions · Answer 6

Jeff Moden - Sunday, October 8, 2017 7:19 AM
coolchaitu - Sunday, October 8, 2017 7:07 AM
Jeff Moden - Saturday, October 7, 2017 2:37 PM
I've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.
So, shall i say OPerations team that the issue is with the code?
Not until you prove it. Never say anything that you can't prove because you always have to prove it especially when it comes to things like this.
The only way that you can prove it is to find the code (could be plural here) that's causing the problem and why it's causing the problem. It's one of those things that we have to do just because of the nature of what we do.

Thanks Jeff. I have this script:

SELECT AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers
WHERE scheduler_id <255 OPTION(RECOMPILE)

So, if Avg Pending DiskIO Count>0, it means I/O bottleneck, am i correct?

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 7

coolchaitu - Sunday, October 8, 2017 7:24 AM
Jeff Moden - Sunday, October 8, 2017 7:19 AM
coolchaitu - Sunday, October 8, 2017 7:07 AM
Jeff Moden - Saturday, October 7, 2017 2:37 PM
I've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.
So, shall i say OPerations team that the issue is with the code?
Not until you prove it. Never say anything that you can't prove because you always have to prove it especially when it comes to things like this.
The only way that you can prove it is to find the code (could be plural here) that's causing the problem and why it's causing the problem. It's one of those things that we have to do just because of the nature of what we do.
Thanks Jeff. I have this script:
SELECT AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers
WHERE scheduler_id <255 OPTION(RECOMPILE)
So, if Avg Pending DiskIO Count>0, it means I/O bottleneck, am i correct?

Some will differ with me but, IMHO, all that code does is measure the symptoms. It doesn't identify the cause. I hate the term "IO Bottleneck" because a whole lot of people think that automatically means the disk system or the "pipe" connecting the disk system to the server has a problem. While that may certainly be the case, it usually isn't the problem.

The code above may correctly identify that there's a whole lot of "IO" stacked up. There are two possible reasons for that... the first is like I said before. It could be the disk system or the "pipe". Or, the disk system might be just fine but having to work its ever loving hinny off to keep up with the bad code that's requesting all of the IO. Normally, it's the bad code and not the disk or the "pipe".

My recommendation would be to do two things... make sure your statistics are up to date and then find the code that is causing so much IO because no one will believe you until you find the bad code and demonstrate why its bad.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Arsh SSCertifiable Points: 6354 More actions · Answer 8

Though the smaller the number the better , it depends as pointed out by others. Bad code causes unnecessary IO causing the performance issues ,and shares the bulk of it. Advising to see if the file id column values remain the same for different pieces of code running at the time of the bottleneck .How's the file distribution on the storage ? And bad DB design might also bite in the long run when the data grows . but yes either case , the code has to be efficient.