April 18, 2013 at 11:59 am
I'm trying to either prove or disprove that the slow front-end (web UI) performance is, at least in part, attributable to a disk bottleneck. I've run SQLIO to gather some metrics, but on their own they're just numbers without context. For example:
c:\Program Files (x86)\SQLIO>sqlio -kR -t4 -s120 -o64 -frandom -b64 -LS -Fparam.txt
sqlio v1.5.SG
using system counter for latency timings, 3579545 counts per second
parameter file used: param.txt
file e:\sqlio\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads reading for 120 secs from file e:\sqlio\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 64 outstanding
using specified size: 100 MB for file: e:\sqlio\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 1649.36
MBs/sec: 103.08
latency metrics:
Min_Latency(ms): 4
Avg_Latency(ms): 77
Max_Latency(ms): 779
How can I provide context for the above so I can state definitively that the disk subsystem is underperforming for a multi-client UI tool? All of the articles I've read online only seem to use SQLIO as a tuning tool (to optimize within the existing hardware scope), not as a tool to determine whether the architecture itself is suitable.
Anyone have any ideas? Am I using the right tool? How can I either make this case or move onto something else?
Thanks in advance.
April 18, 2013 at 1:54 pm
Have you used performance monitor on the server?
Specifically the PhysicalDisk counters Avg. Disk sec/Read and Avg. Disk sec/Write. Sustained values over 20ms indicate a problem with disk IO.
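A complementary check from inside SQL Server is to compute per-file read/write latency from sys.dm_io_virtual_file_stats (a sketch; the DMV and its columns are standard, but treat the 20ms threshold as a rule of thumb rather than a hard limit):

```sql
-- Average I/O latency per database file since instance startup.
-- Sustained high avg_read_ms / avg_write_ms points at the disk subsystem.
SELECT DB_NAME(vfs.database_id)                              AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;
```

Note these numbers are cumulative since startup, so they smooth over short spikes; perfmon is better for catching bursts.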
Andrew
April 18, 2013 at 1:59 pm
Can anyone provide any feedback on this or point me in the right direction? I'm not sure if I've given enough detail or if this question has been asked too many times around here.
Thanks again.
April 18, 2013 at 2:03 pm
Thanks for responding, Andrew. Yes, I have; in fact, that was the first thing I did. Avg. Disk sec/Read is ~0.030, which I believe is 30ms.
April 18, 2013 at 2:07 pm
OK, what's your disk system, DAS or SAN?
Andrew
April 18, 2013 at 2:56 pm
DAS, RAID-5. Unfortunately, our server is installed on a virtual host, with 32GB RAM allocated. I can't speak to what other hosts are sharing the same physical storage.
April 18, 2013 at 3:22 pm
Alright, since you have stats showing latency from the disk system, I would try to prove it.
You're right that latency figures alone can't conclusively confirm the disk system is the bottleneck. Without a baseline it's impossible to be certain (the disk latency could always have been at those levels while something else is causing the problem).
I would try to prove it by running a query that retrieves a large number of records from the database.
Initially (if the table has not been queried recently), SQL Server has to read the data from disk and copy it into memory. Once that has happened, subsequent executions of the same query return quickly because the data is served from the buffer pool. If the initial execution is much slower than the rest, that points at the disk system as the bottleneck.
Showing the run time of the first execution against subsequent executions should be quite a good way of demonstrating the problem.
If your system executes many different queries (so data is constantly being read from disk), this would be a good demonstration of where the problem is. However, if your workload mostly repeats the same queries (so the data stays cached), the problem likely lies elsewhere.
I would caution against running this on your live system as it would affect performance.
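A minimal sketch of that cold-versus-warm comparison (the table name dbo.SomeLargeTable is hypothetical; DBCC DROPCLEANBUFFERS flushes the buffer pool, so run this only on a test system):

```sql
-- TEST system only: flushing the buffer pool hurts live performance.
CHECKPOINT;                 -- write dirty pages so the flush is complete
DBCC DROPCLEANBUFFERS;      -- force a cold cache

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Cold run: pages come from disk (physical reads).
SELECT COUNT(*) FROM dbo.SomeLargeTable;

-- Warm run: same query, now served from memory (logical reads only).
-- Compare elapsed times and physical vs. logical reads in the output.
SELECT COUNT(*) FROM dbo.SomeLargeTable;
```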
What do you think?
Andrew
April 18, 2013 at 3:28 pm
Sorry, forgot to say: when you run the query, have a look at Activity Monitor for IO_COMPLETION wait types.
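The same waits can be queried directly (a sketch; note that for data-page reads the PAGEIOLATCH_* waits are usually a more telling indicator than IO_COMPLETION):

```sql
-- Cumulative I/O-related waits since instance startup.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'IO_COMPLETION'
   OR wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;
```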
April 19, 2013 at 7:45 am
This sounds good and is roughly what I've been doing. Do you know of any Microsoft white papers with recommended latency thresholds that I can compare my results against and present to non-technical individuals? That would really go a long way toward proving that we're trying to get blood from a stone.
Thanks.
April 19, 2013 at 8:36 am
Your SQLIO test file size is far too small to be a valid test. I prefer to make it something close to the size of the database or at least 20 GB. Run a short throwaway test to create the test file, and use it for the remaining tests. This makes sure you are reading/writing data from/to the disk, instead of the disk controller cache.
Also, you should really run a series of tests, adding to the workload on each test until you reach the limit of the acceptable Avg Latency, say 40ms. You can do this by increasing the number of threads on each test.
You also need to run tests for sequential read, sequential write, random read, and random write. Random read and random write are probably the most important for SQL Server.
Also, run these tests with different block sizes, say 4K, 8K, 16K, 32K, and 64K. 64K is probably the most important for SQL Server.
One final suggestion: set up a standard test suite and run it on every new SQL Server before you put it into production. Make sure the IO performance is OK before you go live. Save the test results for each server so that you can refer to them later and compare them against a new series of tests if problems appear. An Excel worksheet is a good place to store the data.
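Alongside the SQLIO numbers, a per-file latency snapshot taken at go-live makes a handy baseline to keep with the test results (a sketch; the DMV is standard, but the snapshot table name dbo.IoBaseline is made up):

```sql
-- Baseline snapshot of per-file I/O latency, taken at go-live.
-- Keep with the SQLIO results; rerun later and diff the averages.
SELECT GETDATE()                                     AS captured_at,
       DB_NAME(database_id)                          AS database_name,
       file_id,
       io_stall_read_ms  / NULLIF(num_of_reads, 0)  AS avg_read_ms,
       io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_ms
INTO dbo.IoBaseline          -- hypothetical snapshot table
FROM sys.dm_io_virtual_file_stats(NULL, NULL);
```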
I know this seems like a lot of work, but it can save a lot of trouble in the long run, and it gives you hard numbers to show there is a problem. I have managed hundreds of servers and I can tell you from experience that disk IO performance problems are not uncommon, especially with SAN.
April 19, 2013 at 1:56 pm
Michael Valentine Jones (4/19/2013)
Your SQLIO test file size is far too small to be a valid test. I prefer to make it something close to the size of the database or at least 20 GB. Run a short throwaway test to create the test file, and use it for the remaining tests. This makes sure you are reading/writing data from/to the disk, instead of the disk controller cache.
Also, you should really run a series of tests, where you add to the workload on each test until you reach the limit of the acceptable level of Avg Latency, say 40 MS. You can do this by increasing the number of threads on each test.
You also need to run tests for sequential read, sequential write, random read, and random write. Random read and random write are probably the most important for SQL Server.
Also, run these tests with different block sizes, say 4K, 8K, 16K, 32K, and 64K. 64K is probably the most important for SQL Server.
One final suggestion: setup a standard test suite and perform it on every new SQL Server before you put it into production. Make sure the IO performance is OK before you go live. Save the test results for each server so that you can refer to them later and compare them to a new series of tests if there are problems later. An Excel worksheet is a good place to store the data.
I know this seems like a lot of work, but it can save a lot of trouble in the long run, and it gives you hard numbers to show there is a problem. I have managed hundreds of servers and I can tell you from experience that disk IO performance problems are not uncommon, especially with SAN.
Thanks, Michael, for responding. I bolded the part above where I'm having trouble determining "OK". How do I define "OK" within the context of the server's role (SQL backend for a web UI)? Is there something I can reference and present to help support my recommendation?
The only thing I've found so far related to avg disk read/sec:
Is there anything like this for SQLIO?
April 22, 2013 at 7:49 am
Michael, are you able to assist with the above? Thanks.