May 1, 2011 at 10:33 am
I've got a pretty good SQL 2008, SP2 server, 64-bit with 16GB of ram, there is a virtual array (RAID 1, and there are two other arrays, both RAID 10 )
I keep getting notifications that the the average disk queue length is 68 on one disk and 105 on another.
In this setup, in attempts to minimize disk contention we've split up logging for all DB files on a seperate disk array (D:\) and have replicated data files on drive F:\ and the rest of our data files on drive E:\. There is ample free space.
Clearly I can search google for articles and whitepapers about what "average disk queue length" means but am hoping for an answer that can kinda sum up what I should do about it or how to make it lower.
From what I've already gathered online I've found:
Average number of both read and write requests that were queued for all the disks on the SQL Server computer. If this queue often remains above 6 per physical disk for more than 10 seconds at a time then your disk subsystem is overloaded. In this case consider the following:
-Check OS Paging to make sure that paging from/to the swap file is not causing these IOs
-Replace disks with faster disks.
-Add more disks to your RAID array.
-Switch your RAID array from RAID 5 to a RAID 10 solution as each write IO results in two writes using RAID 10 vs. 4 for RAID 5.
-Move non-SQL Server data hungry applications to another computer.
-If the RAID controller has some form of battery backup, switch the cache mode from Write-through to Write-back as this increases the system ability to handle write IOs by an order of magnitude.
-If the computer is running multiple instances of SQL Server, consider placing each instance on a separate computer.
#1. How do i check to see if OS is paging from /to the swap file?
#2. Already have 15K RPM drives
#3. The array (both) are already maxed out
#4. Already set up to be RAID 10
#5. There's no apps to move to another computer. Only things like HyperBac are running
#6. I don't believe this is the case
#7. There is only one instance.
Can anyone offer me some finer understanding on the issues and possible "things to check?"
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
May 1, 2011 at 10:57 am
IMO this thread http://www.sqlservercentral.com/Forums/Topic1044430-391-1.aspx is a very good ref on DISK/SAN troubleshooting.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 1, 2011 at 11:33 am
Average disk queue length (Total) doesn't give you any good insight. It gives you the average values of the all disks (Including C and D) as a whole which will skew the metrics of the actual data and log disks.
Instead use Average disk queue length per each disk like Data, Log and Tempdb. Also Average disk queue length could be misleading. So consider Current disk queue length instead.
When it comes to disks the most useful counters are
Physical Disk: Avg Disk Sec/Read
Physical Disk: Avg Disk Sec/Write
Just start a perfmon counter log and collect these counters which will give you a better insight on the Disk bottlenecks
To check the paging happening on the server you need to use the following counters
PagingFile: %Usage
PagingFile: %Usage Peak
It's recommended to grant Lock pages in Memory privilege to the SQL Server account to prevent the SQL Server pages from getting paged out to the page file on HD.
Thank You,
Best Regards,
SQLBuddy
May 2, 2011 at 10:18 am
1) how many spindles underly the RAID sets you mention?
2) what else is on those RAID sets?
3) Have you checked for WHERE the actual bottleneck is? Assuming a SAN, there are umpteen things beteen the CPUs and the physical rotating media.
4) What is RAM situation?
5) have you done a file IO stall and wait stats analysis?
6) as someone else mentioned, ADQL is useless these days. Avg disk sec/read and /write are what matter for each physical volume.
7) best way to get to the root causes and some corrections is to hire a professional tuner for a day or two.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 3, 2011 at 3:53 pm
MyDoggieJessie (5/1/2011)
.....I keep getting notifications that the the average disk queue length is 68 on one disk and 105 on another......
You've read a lot of good information on your own and received a lot of good advice on this thread.
Although average disk queue length is not the best measure of performance, it is not without some merit for direct attached storage when the numbers remain higher than the article you quoted, and your numbers are high, although I have seen worse, much worse.
Without knowing more about your server and application environment, my best professional recommendation, to enable you to home in on the problem quickly, is: Focus on your physical disk utilization and PLE (Page Life Expectancy) numbers. For direct attached disk storage, physical disk utilization needs to go down to a queue length of 5 or 6 for each disk. PLE needs to stay up, preferably above 300 page seconds. If either of these numbers is not optimum, the source of the problem(s) could be caused by some very badly written queries, or some large reports being executed concurrently with OLTP, or full table scans on large tables, or something else.
I think your disks are being worked hard because you don't have enough RAM to contain most of the data your applications and users are frequently accessing. I'll bet your PLE (Page Life Expectancy) is below 300 page seconds.
There are many things that you can look it, as advised in this thread, but if you're in a hurry, my advice is likely to get you to a solution quickly, since I've seen this situation before and had to solve it.
If I am correct, then more RAM on your server will likely help you, maybe completely solve the problem.
However, it just may be that you're asking too much of your direct attached disk system in its current configuration and you may need to look at ways to distribute your physical I/O among more spindles.
I recommend that you execute Windows Performance Monitor (PERFMON.exe) and set it up to look at your RAM (Memory:Available, Megabytes), CPU (Processor:%, Processor Time _Total), Physical Disk Queue Length for each disk (Physical Disk: Avg. Disk Queue Length), your Paging File Utilization (Paging File: %Usage and Paging File: %Usage Peak), PLE (SQL Server:Buffer Manager, Page Life Expectancy), and BCHR (SQL Server:Buffer Manager, Buffer Cache Hit Ratio). You can save these to a file or simply look at them in a 100 second, moving window. Look at them during your peak periods when performance is suffering. Record the numbers, high and low. Post them back on this thread so we can look at them.
If your numbers are inconclusive, there are more things we can look at.
LC
P.S. Which physical disk is your Windows Paging File on?
May 4, 2011 at 4:25 am
Check Performance Monitor and Resource Monitor also for the specific disk issue or queue and check this query to find out the query which one is a good candidate of disk queue
SELECT ST.TEXT,SP.* FROM SYS.SYSPROCESSES SP
CROSS APPLY SYS.DM_EXEC_SQL_TEXT(SP.SQL_HANDLE) ST
WHERE SP.PHYSICAL_IO > 100
ORDER BY SP.PHYSICAL_IO DESC
Regards,
Syed Jahanzaib Bin Hassan
BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA
My Blog
www.aureus-salah.com
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply