October 6, 2017 at 8:09 am
Good morning Experts,
What is a good value for avg_io_stall_ms and pending_io_requests? I am asking this to see if my server is facing I/O bottleneck. Are there any other ways and good values to check if my server is facing IO bottleneck. Also, what is drive latency and what is a good value for it.
October 6, 2017 at 8:29 am
As with any stall value, you'd want it as low as possible, and by that, I mean under 5ms would be idea in most cases, in some cases that would be considered high as well.
Your question is a bit vague, there are too many factors involved.
What is the current underlying storage configuration used, what connectivity is used to the storage arrays, disk speeds etc.
Also, what is more important in your environment, read or write optimized configuration?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This thing is addressing problems that dont exist. Its solution-ism at its worst. We are dumbing down machines that are inherently superior. - Gilfoyle
October 6, 2017 at 8:46 am
As low as possible as Henrico said.
Using sys.dm_io_virtual_file_stats is a good way to check the I/O performance but you can't really just query it once and have it tell you anything as the numbers are cumulative since SQL Server started. You should do an initial capture for a baseline and then one (or more) captures after that so that you can determine the performance for the period of time between the captures.
Sue
October 7, 2017 at 2:37 pm
I've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2017 at 7:07 am
Jeff Moden - Saturday, October 7, 2017 2:37 PMI've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.
So, shall i say OPerations team that the issue is with the code?
October 8, 2017 at 7:19 am
coolchaitu - Sunday, October 8, 2017 7:07 AMJeff Moden - Saturday, October 7, 2017 2:37 PMI've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.So, shall i say OPerations team that the issue is with the code?
Not until you prove it. Never say anything that you can't prove because you always have to prove it especially when it comes to things like this.
The only way that you can prove it is to find the code (could be plural here) that's causing the problem and why it's causing the problem. It's one of those things that we have to do just because of the nature of what we do.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2017 at 7:24 am
Jeff Moden - Sunday, October 8, 2017 7:19 AMcoolchaitu - Sunday, October 8, 2017 7:07 AMJeff Moden - Saturday, October 7, 2017 2:37 PMI've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.So, shall i say OPerations team that the issue is with the code?
Not until you prove it. Never say anything that you can't prove because you always have to prove it especially when it comes to things like this.
The only way that you can prove it is to find the code (could be plural here) that's causing the problem and why it's causing the problem. It's one of those things that we have to do just because of the nature of what we do.
Thanks Jeff. I have this script:
SELECT AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers
WHERE scheduler_id <255 OPTION(RECOMPILE)
So, if Avg Pending DiskIO Count>0, it means I/O bottleneck, am i correct?
October 8, 2017 at 8:07 am
coolchaitu - Sunday, October 8, 2017 7:24 AMJeff Moden - Sunday, October 8, 2017 7:19 AMcoolchaitu - Sunday, October 8, 2017 7:07 AMJeff Moden - Saturday, October 7, 2017 2:37 PMI've found that about 99.9% of the time, it's not the hard disk a latency turns out to be a bit of a lie. It's normally bad code that can't actually be reused and so it has to compile every time that it's used. About 99.9 % of that is ORM generated code. Fix that and what appear to be disk problems will simply sublimate.So, shall i say OPerations team that the issue is with the code?
Not until you prove it. Never say anything that you can't prove because you always have to prove it especially when it comes to things like this.
The only way that you can prove it is to find the code (could be plural here) that's causing the problem and why it's causing the problem. It's one of those things that we have to do just because of the nature of what we do.
Thanks Jeff. I have this script:
SELECT AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers
WHERE scheduler_id <255 OPTION(RECOMPILE)So, if Avg Pending DiskIO Count>0, it means I/O bottleneck, am i correct?
Some will differ with me but, IMHO, all that code does is measure the symptoms. It doesn't identify the cause. I hate the term "IO Bottleneck" because a whole lot of people think that automatically means the disk system or the "pipe" connecting the disk system to the server has a problem. While that may certainly be the case, it usually isn't the problem.
The code above may correctly identify that there's a whole lot of "IO" stacked up. There are two possible reasons for that... the first is like I said before. It could be the disk system or the "pipe". Or, the disk system might be just fine but having to work its ever loving hinny off to keep up with the bad code that's requesting all of the IO. Normally, it's the bad code and not the disk or the "pipe".
My recommendation would be to do two things... make sure your statistics are up to date and then find the code that is causing so much IO because no one will believe you until you find the bad code and demonstrate why its bad.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2017 at 9:14 am
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply