November 7, 2008 at 2:19 pm
When interpreting SET STATISTICS IO results, what is the order for interpreting statistics? In other words, how do you determine which query uses less I/O?
For two queries, I ran SET STATISTICS IO ON along with DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE, and got the following results:
Query 1:
Table 'tbl_pa_products'. Scan count 1, logical reads 47738, physical reads 92, read-ahead reads 77, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Query 2:
Table 'tbl_pa_products'. Scan count 6, logical reads 10133, physical reads 88, read-ahead reads 2416, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Which one uses more I/O?
November 8, 2008 at 4:43 am
The (unexpected) answer is both.
Physical and read-ahead reads are physical I/Os, i.e. reads from the physical disk. Logical reads are reads from memory (the buffer pool).
So your second query required more pages read off disk, but the first required more reads from memory, possibly because it read the same pages multiple times. It could also mean that some of the pages it required were already in memory: DBCC DROPCLEANBUFFERS only removes clean pages from memory, so if there were dirty pages (ones with modifications not yet written to disk) they wouldn't have been dropped. Run CHECKPOINT first to write all dirty pages to disk.
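Just to illustrate, if you want a genuinely cold buffer cache before a test (do this on a test server only), the sequence would be something like:
CHECKPOINT;              -- write all dirty pages to disk first
DBCC DROPCLEANBUFFERS;   -- now every data page can be evicted from the buffer pool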
Any chance of seeing the two queries and the table structure?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 9, 2008 at 3:46 pm
I've attached the 2 queries, the I/O information for all the tables they touch, the referenced functions, and the CREATE TABLE statements.
If you need anything else, please let me know.
Thanks again for your input.
November 10, 2008 at 6:20 am
Any indexes on those tables?
Is the distinct necessary?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 10, 2008 at 6:36 am
I re-attached an updated version of the table_definitions.txt with the table indices.
I'm actually trying to optimize someone else's stored procedure. The DISTINCT keyword existed in the previous version of the stored procedure. However, removing the DISTINCT causes an incorrect result.
Please let me know if you need anything else.
Thanks again!
November 10, 2008 at 8:00 am
The query with the lowest IO requirement is the one with the lowest logical reads (unless there are LOB reads, in which case you also compare lob logical reads).
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 10, 2008 at 12:21 pm
TheSQLGuru (11/10/2008)
The query with the lowest IO requirement is the one with the lowest logical reads (unless there are LOB reads, in which case you also compare lob logical reads).
When you do I/O tests, how do you determine how many sample inputs to use for testing? For example, how many permutations of unique @dd and @r_id values would you use?
SELECT COUNT(*)
FROM tbl_batches as b
JOIN
tbl_products as p on p.pk_product_id = b.fk_product_id
WHERE b.analytic_date = @dd
AND p.fk_retailer_id = @r_id
November 10, 2008 at 12:39 pm
jlp3630 (11/10/2008)
TheSQLGuru (11/10/2008)
The query with the lowest IO requirement is the one with the lowest logical reads (unless there are LOB reads, in which case you also compare lob logical reads).
When you do I/O tests, how do you determine how many sample inputs to use for testing? For example, how many permutations of unique @dd and @r_id values would you use?
SELECT COUNT(*)
FROM tbl_batches as b
JOIN
tbl_products as p on p.pk_product_id = b.fk_product_id
WHERE b.analytic_date = @dd
AND p.fk_retailer_id = @r_id
I do not understand what you are really looking for with your term "I/O tests" so I cannot answer your question. I doubt I would use testing of your nature anyway since parameter sniffing and plan caching would lead to inappropriate query plans and skewed IO results given the test script you have. Also a count(*) won't accurately portray the data needs of a query that actually selected 1 or more fields from the tables involved due to likely index usage for the count(*).
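To illustrate that last point with your tables (just a sketch; your real queries obviously select more than this):
-- COUNT(*) only needs a row count, so SQL Server can often satisfy it from the
-- narrowest nonclustered index on tbl_batches and report very low reads.
SELECT COUNT(*)
FROM tbl_batches as b
WHERE b.analytic_date = @dd;
-- A query returning actual columns may need key lookups or a clustered index
-- scan instead, so its I/O profile can look completely different.
SELECT b.fk_product_id, b.analytic_date
FROM tbl_batches as b
WHERE b.analytic_date = @dd;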
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 10, 2008 at 1:01 pm
TheSQLGuru (11/10/2008)
jlp3630 (11/10/2008)
TheSQLGuru (11/10/2008)
The query with the lowest IO requirement is the one with the lowest logical reads (unless there are LOB reads, in which case you also compare lob logical reads).
When you do I/O tests, how do you determine how many sample inputs to use for testing? For example, how many permutations of unique @dd and @r_id values would you use?
SELECT COUNT(*)
FROM tbl_batches as b
JOIN
tbl_products as p on p.pk_product_id = b.fk_product_id
WHERE b.analytic_date = @dd
AND p.fk_retailer_id = @r_id
I do not understand what you are really looking for with your term "I/O tests" so I cannot answer your question. I doubt I would use testing of your nature anyway since parameter sniffing and plan caching would lead to inappropriate query plans and skewed IO results given the test script you have. Also a count(*) won't accurately portray the data needs of a query that actually selected 1 or more fields from the tables involved due to likely index usage for the count(*).
Given 2 semantically equivalent queries, I'm trying to determine which one is better or worse in terms of performance. The reports relying on these queries take a couple of days to run. As a result, I'm trying to find other ways to test which query version is "better".
When I searched online, it looks like I/O and time (CPU and elapsed) should be used for performance measurements. For I/O measurements, I've been using the SET STATISTICS IO and looking only at the logical reads (per your suggestion). Additionally, I've been taking CPU and elapsed time measurements via SET STATISTICS TIME. To circumvent the plan caching, I've been using DBCC FREEPROCCACHE.
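Concretely, the test script I've been using for each query version looks roughly like this:
DBCC FREEPROCCACHE;      -- throw away cached plans between runs
DBCC DROPCLEANBUFFERS;   -- start from a cold buffer cache
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- query version 1 (or version 2) goes here
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;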
What kind of general testing methodology do you recommend? I'm fairly new to query performance tuning, so any insights would be greatly appreciated.
November 10, 2008 at 1:26 pm
Dude, if you are new to perf tuning and are looking to improve stuff that currently takes days to run, that could be a pretty tall order. 🙂
Here is my standard spiel for someone in your situation (and there are LOTS of people like you out there): hire a professional tuner to come in and give your servers, databases, code, etc. a performance review. During that time these monster reports can be fixed to give a direct benefit to the company (I am certain there will be many other opportunities for improvement too). You will gain from mentoring by this person, so that going forward you will have some tools and knowledge to address future problems on your own. It is a big win-win for you AND the company!
Doing this via forum posts is at best a haphazard process that will likely not get you either results or much useful training. Perf analysis and tuning is a HUGE topic. I have seen people go WEEKS back and forth trying to solve a problem that a good consultant could have narrowed down and solved in a matter of hours or even minutes. Seriously.
Failing the above, it is time to take some classes, read some books, spend many hours searching and reviewing web documents, blogs and postings. The best people in this area have likely spent tens of thousands of manhours learning how to be expert tuners (I know I have)!!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 10, 2008 at 1:49 pm
Thanks for the advice. Unfortunately, a consultant is not an option because I'm working for a startup company. I was hired on to be the DBA because of my general programming and networking background and not necessarily for query tuning and performance. However, I know that I have a long way to go before becoming even a decent DBA.
Regarding my initial question, I guess I'm just looking for a methodology to test my queries and not necessarily opinions on why my query is running slow. Searching through forums and books, I found many "tricks" to rewrite my query. However, I'm not sure if the resulting rewritten query is better or worse for my situation.
For example, if you were hired to be an expert tuning consultant, how would you determine that version 1 of a query is "better" than version 2? All I've read in forums and websites is to do testing and more testing, but how much testing is "good enough"? Do I systematically go through all possible permutations of inputs and do a weighted average? Can I just take a single random test value set and go from there?
November 10, 2008 at 2:09 pm
jlp3630 (11/10/2008)
Thanks for the advice. Unfortunately, a consultant is not an option because I'm working for a startup company. I was hired on to be the DBA because of my general programming and networking background and not necessarily for query tuning and performance. However, I know that I have a long way to go before becoming even a decent DBA.
Regarding my initial question, I guess I'm just looking for a methodology to test my queries and not necessarily opinions on why my query is running slow. Searching through forums and books, I found many "tricks" to rewrite my query. However, I'm not sure if the resulting rewritten query is better or worse for my situation.
For example, if you were hired to be an expert tuning consultant, how would you determine that version 1 of a query is "better" than version 2? All I've read in forums and websites is to do testing and more testing, but how much testing is "good enough"? Do I systematically go through all possible permutations of inputs and do a weighted average? Can I just take a single random test value set and go from there?
Tell the startup's senior management (and their investors, if they are that far along) that they have no hope of long-term success if they don't do the right things now with their data structures and queries. If they question the veracity of my statement, have them contact me directly - I have CIOs who will back up my claim that SQL Server can be overwhelmed by people who don't know how to interact with it properly, and that throwing hardware at it will NOT fix the problem in many cases. And it gets HUGELY more complicated and expensive to fix problems after they are in production.
Query performance has several metrics: I/O cost (logical reads and writes), CPU cost, and duration. A query itself also has a cost you can see if you show the actual execution plan. NOTE: you really need to use Profiler to see I/O and CPU properly, because the IO and CPU reported by SET STATISTICS don't show correctly if UDFs are involved.
I test multiple inputs always, and do statistical analysis of the data distribution to look for parameters that may have widely varying rowcounts since that will lead to the need for different query plans.
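As a rough illustration of what I mean by looking at the distribution (using the table and column names from the query you posted), something along these lines shows whether some parameter values return far more rows than others:
-- How skewed are the rowcounts per analytic_date?
SELECT b.analytic_date, COUNT(*) AS row_cnt
FROM tbl_batches as b
GROUP BY b.analytic_date
ORDER BY row_cnt DESC;
-- Same idea for the retailer id
SELECT p.fk_retailer_id, COUNT(*) AS row_cnt
FROM tbl_products as p
GROUP BY p.fk_retailer_id
ORDER BY row_cnt DESC;
Parameter values near the top and bottom of those lists are the ones worth testing separately, since they may need different plans.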
This isn't even a drop in the ocean you realize.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 11, 2008 at 12:33 am
jlp3630 (11/10/2008)
Given 2 semantically equivalent queries, I'm trying to determine which one is better or worse in terms of performance. The reports relying on these queries take a couple of days to run. As a result, I'm trying to find other ways to test which query version is "better".
Those simple queries you posted take a couple of days? From the IO output I would not think so. A couple of minutes maybe; I wouldn't think much more than that.
If your boss won't hire someone, you have a massive struggle ahead. First, you have to find out what portion of those reports is slow. Optimising the wrong code is a complete waste of time. Run profiler while the report is running and have a look at exactly what is running against SQL. Look for queries that are running repeatedly or ones that take massive amounts of time and resources and look at optimising those.
Generally, tweaking the order of pieces of the query isn't going to get you much. Your biggest gains will be from removing cursors, making conditions in the WHERE clause sargable (if they're not), and adding or changing indexes.
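As a made-up example of the sargable point, using the analytic_date column from your query: the first form below wraps the column in a function, so an index on analytic_date can't be seeked, while the rewritten form can use a seek.
-- Not sargable: the function on the column forces a scan
WHERE CONVERT(varchar(8), b.analytic_date, 112) = '20081110'
-- Sargable: the column is left alone, so an index seek on analytic_date is possible
WHERE b.analytic_date >= '20081110' AND b.analytic_date < '20081111'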
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 11, 2008 at 6:22 am
The queries that I posted are actually the rewritten versions. Over the course of the report creation, this stored procedure (and its "sibling" stored procedures) is getting called about 25000 times. Initially, the stored procedure took approximately 10-15 seconds. Based on my research online and books that I've read, I've already reduced it to approximately 3 seconds per call. However, I know that there are probably a lot more improvements to be made.
When management comes back in the office, I plan to speak with them.
November 11, 2008 at 8:43 am
jlp3630 (11/11/2008)
The queries that I posted are actually the rewritten versions. Over the course of the report creation, this stored procedure (and its "sibling" stored procedures) is getting called about 25000 times. Initially, the stored procedure took approximately 10-15 seconds. Based on my research online and books that I've read, I've already reduced it to approximately 3 seconds per call. However, I know that there are probably a lot more improvements to be made.
When management comes back in the office, I plan to speak with them.
The winning solution here is to aggregate all 25000 calls into a single one and let SQL Server do the work one time. I would bet that much of the data processed by those calls is the same between each call. It would also save the not-inconsequential overhead of each round-trip call. I have often achieved 2-4 orders of magnitude performance improvement by refactoring such iterative code.
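As a rough sketch, based only on the COUNT(*) query you posted earlier (your real procedures will need more columns and filters), one set-based pass could replace all of the per-call counts:
SELECT b.analytic_date, p.fk_retailer_id, COUNT(*) AS batch_count
FROM tbl_batches as b
JOIN tbl_products as p on p.pk_product_id = b.fk_product_id
GROUP BY b.analytic_date, p.fk_retailer_id;
The calling code then just looks up the date/retailer pair it needs from that one result set instead of firing the procedure 25000 times.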
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service