May 21, 2012 at 10:15 am
SQL Kiwi (5/20/2012)
Elizabeth.Block (5/17/2012)
I do have the actual execution plan for the monster query so I will start going through that.
This is where I would start, comparing actual plans for a 2-second run with a 24-second run. Several people have invited you to share the plans, so I will just repeat that request together with an offer to help analyse it. The reason for the performance difference will likely be immediately apparent. Reading between the lines of your replies, you could also look at implementing benchmarking and waits and queues analysis, so you can anticipate problems and take corrective action before bad things happen. Brent Ozar has a good introduction to the topic here.
Paul, thanks for your good ideas. You've given me a lot of topics to research on this. I greatly appreciate the help.
I found the cause of the problem I had last week. I rebuilt some of the large indexes but unfortunately set them so statistics would not be automatically recomputed. So after a day and a half, the query ran slow again because the statistics were out of date. I've fixed that issue and now that query is running fine again.
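For anyone hitting the same thing, a minimal sketch of that fix, assuming the rebuilds are done with ALTER INDEX (the index name below is a placeholder): STATISTICS_NORECOMPUTE = OFF keeps automatic statistics updates enabled, and sys.stats shows any statistics still flagged NORECOMPUTE.
-- Rebuild without disabling auto-update statistics (index name is a placeholder)
ALTER INDEX IX_example_index ON dbo.acct_posting
    REBUILD WITH (STATISTICS_NORECOMPUTE = OFF);

-- Find any statistics that are still flagged NORECOMPUTE
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name
FROM sys.stats AS s
WHERE s.no_recompute = 1;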
May 21, 2012 at 10:22 am
Jeff, thanks for your input and ideas. My main concern was to "first do no harm", so when using the tools I didn't implement anything I didn't understand. Probably the best thing I did during this episode was to start a spreadsheet listing all the things I did so I could roll them back if performance got worse. I did refresh a test box with the 90 GB database to test the large query. It performed badly on the test box too, which was actually helpful because it showed the problem was in the data. Then I rebuilt the indexes and performance was back to normal. Right now I keep finding out what I don't know, which is why I've bought two performance tuning books and will be reading them this week.
Thanks, all, for your great suggestions! This is such a great forum.
May 21, 2012 at 11:08 am
Elizabeth.Block (5/21/2012)
Probably the best thing I did during this episode was to start a spreadsheet listing all the things I did so I could roll them back if performance got worse.
I also hope some of the other rookies took this away from Elizabeth's experience. It's much better to change things in a controlled fashion and keep track of what you've changed, rather than making changes willy-nilly until you've completely lost track of what's what. In this kind of situation, that can happen very quickly.
Glad to hear everything worked out, Elizabeth.
May 21, 2012 at 11:16 am
Thanks, Scott! I've learned a lot during this incident and also learned how much I don't know...
May 21, 2012 at 11:37 am
Elizabeth.Block (5/21/2012)
Jeff, thanks for your input and ideas. My main concern was to "first do no harm", so when using the tools I didn't implement anything I didn't understand. Probably the best thing I did during this episode was to start a spreadsheet listing all the things I did so I could roll them back if performance got worse. I did refresh a test box with the 90 GB database to test the large query. It performed badly on the test box too, which was actually helpful because it showed the problem was in the data. Then I rebuilt the indexes and performance was back to normal. Right now I keep finding out what I don't know, which is why I've bought two performance tuning books and will be reading them this week.
Thanks, all, for your great suggestions! This is such a great forum.
I'm sure you already know this, but whenever I work on a performance tuning task I clear the cache and buffers before each execution so I can evaluate the changes fairly.
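For anyone trying this, a quick sketch of what that looks like on a test box (don't run these on a busy production server, since they flush cached plans and data for everyone):
CHECKPOINT;               -- write dirty pages so the buffer pool can be fully emptied
DBCC DROPCLEANBUFFERS;    -- start each test with a cold data cache
DBCC FREEPROCCACHE;       -- clear cached plans so the query compiles fresh

SET STATISTICS IO ON;     -- then measure logical/physical reads...
SET STATISTICS TIME ON;   -- ...and CPU / elapsed time for each run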
May 21, 2012 at 11:40 am
I'm sure you already know this, but whenever I work on a performance tuning task I clear the cache and buffers before each execution so I can evaluate the changes fairly.
Wow, in the heat of the moment I didn't do that. Thanks.
May 21, 2012 at 12:12 pm
Elizabeth.Block (5/21/2012)
I've fixed that issue and now that query is running fine again.
Regarding the execution plan you posted (which I am going to assume is an example of slow running):
As you probably guessed, the problematic part of the plan is this:
Reverse-engineering that from the plan properties, the logical query being executed there is:
SELECT
    dap.monetary_liab_id,
    SUM(dap.amount)
FROM Ascend.dbo.financial_trans AS dft
JOIN Ascend.dbo.acct_posting AS dap
    ON dap.finan_trans_id = dft.id
WHERE
    dft.type_cd = 638
    AND dft.ft_status = 'Posted'
    AND dap.cred_deb_ind = 'C'
GROUP BY
    dap.monetary_liab_id;
As you can see, the optimizer has decided to un-nest the correlated subquery you highlighted. So, instead of executing the subquery once for each new value of [monetary_liab_id] from the [monetary_liab] table, the optimizer has decided to pre-compute the SUM for *every* [monetary_liab_id] and then use a right outer hash match join to match on the [monetary_liab_id] values obtained from the rest of the query.
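To make the contrast concrete, the correlated form the optimizer started from would have looked something like this (the outer query and aliases here are a guess, based only on the table and column names visible in the plan):
SELECT
    ml.id,
    ( SELECT SUM(dap.amount)
      FROM Ascend.dbo.acct_posting AS dap
      JOIN Ascend.dbo.financial_trans AS dft
          ON dft.id = dap.finan_trans_id
      WHERE dap.monetary_liab_id = ml.id
          AND dft.type_cd = 638
          AND dft.ft_status = 'Posted'
          AND dap.cred_deb_ind = 'C'
    ) AS posted_credit_total
FROM Ascend.dbo.monetary_liab AS ml;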
To see why the optimizer decided to do this, look at the estimated and actual row counts on the lower input to the outer hash join:
It expected 6141 rows, though only 6 were encountered at run time, and assessed that the pre-compute strategy would be cheaper than executing the correlated subquery 6141 times. If it had known there were only 6 [monetary_liab_id] values (at most), it might well have chosen a nested loops strategy (which, from the look of the indexes, would be optimal here).
The other thing I have highlighted on the diagram above is that the Hash Match Aggregate performing the global SUM aggregate estimated 68,100 groups - but 2.7 million were encountered. This most likely resulted in the hash table recursively spilling to physical tempdb (hash warnings in Profiler). No matter how much free memory your server has, the memory granted to build the hash table is fixed based on the estimates and cannot grow dynamically at run time.
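If you want a rough way to confirm spills like that on this SQL Server 2005 box without leaving a trace running, the Access Methods counters are a reasonable proxy (they are cumulative since startup, so compare values before and after the query runs); the direct way is still the Hash Warnings event in Profiler.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Access Methods%'
    AND counter_name IN ('Workfiles Created/sec', 'Worktables Created/sec');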
Anyway, it appears that the more important cardinality estimation error (6141 vs 6) originates from a poor estimate on the [acct_posting] table's [IX_acct_posting_finan_trans_id_monetary_liab_id] index.
That said, my recommendation for plans like these is to help the optimizer by giving it less work to do, and less scope for errors. The query is relatively large (in terms of the number of joins), so I would break it up. Persist the 6-row result of the whole query (excluding the correlated SUM code) and then write the correlated query or join against that. Using a temporary table (not variable) will give the optimizer an accurate row count (and statistics) to work with and greatly reduce the complexity of the plan.
You might also like to add an OPTION (RECOMPILE) hint to the temporary-table-to-financial-trans-and-acct-posting query. You would need SQL Server 2008 or later to benefit from everything OPTION (RECOMPILE) can do for you, but the overhead of recompiling that trivial query will be very small, and may give you more plan reliability over the long term.
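A rough sketch of that shape, with a made-up temp table name and a stand-in for the rest of the original query (assuming it runs in the Ascend database):
-- Persist the small intermediate result in a temp table (not a table variable)
CREATE TABLE #liab_keys (monetary_liab_id int NOT NULL PRIMARY KEY);

INSERT #liab_keys (monetary_liab_id)
SELECT ml.id
FROM Ascend.dbo.monetary_liab AS ml;   -- stand-in for the rest of the original query

-- Then run the aggregate against the (accurately sized) temp table
SELECT
    k.monetary_liab_id,
    SUM(dap.amount) AS posted_credit_total
FROM #liab_keys AS k
JOIN Ascend.dbo.acct_posting AS dap
    ON dap.monetary_liab_id = k.monetary_liab_id
JOIN Ascend.dbo.financial_trans AS dft
    ON dft.id = dap.finan_trans_id
WHERE dft.type_cd = 638
    AND dft.ft_status = 'Posted'
    AND dap.cred_deb_ind = 'C'
GROUP BY
    k.monetary_liab_id
OPTION (RECOMPILE);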
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 21, 2012 at 4:56 pm
Paul, thanks very much for the good ideas, I'll have to look into those. Two questions - could the differences between the estimated and actual number of rows be due to bad statistics? I know the statistics at one point were very much out of date. Would the Option recompile statement work on SQL 2005? This is a vendor app on a SQL 2005 server and the database is in SQL 2000 compatibility.
May 21, 2012 at 5:39 pm
Elizabeth.Block (5/21/2012)
Two questions - could the differences between the estimated and actual number of rows be due to bad statistics? I know the statistics at one point were very much out of date.
Yes, I was guessing that [IX_acct_posting_finan_trans_id_monetary_liab_id] was an index you rebuilt to get the better plan. But yes, in general, poor estimates *can* be caused by out-of-date statistics. Not always, though: estimates are exactly that, so the bigger a plan gets, the more chance there is for small errors to grow. A simpler plan generally has a better chance of being more accurately assessed by the optimizer.
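For what it's worth, a quick way to see how stale those statistics are on a SQL Server 2005 box (run in the Ascend database; the index name is the one from the plan), and to refresh them with a full scan if they look old:
SELECT
    i.name AS index_name,
    STATS_DATE(i.object_id, i.index_id) AS stats_last_updated
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.acct_posting');

UPDATE STATISTICS dbo.acct_posting
    IX_acct_posting_finan_trans_id_monetary_liab_id
    WITH FULLSCAN;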
Would the Option recompile statement work on SQL 2005? This is a vendor app on a SQL 2005 server and the database is in SQL 2000 compatibility.
Yes, OPTION (RECOMPILE) works on a SQL Server 2005 database in 80 compatibility mode.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 22, 2012 at 6:17 am
As always you have a great breakdown of what is going on in the plan Paul! To the OP: Paul makes a great point about helping the optimizer by breaking these monster-join queries down into one or more temp-table-based intermediate sets. I will reiterate what he said too - temp table, not table variable.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 22, 2012 at 6:21 am
TheSQLGuru (5/22/2012)
As always you have a great breakdown of what is going on in the plan Paul! To the OP: Paul makes a great point about helping the optimizer by breaking these monster-join queries down into one or more temp-table-based intermediate sets. I will reiterate what he said too - temp table, not table variable.
If nothing else, I hope it shows why hiring a good consultant & trainer for a short period can deliver much greater benefit than a potentially much greater $ investment in whatever automated tool might be the flavour of the day :-)
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 22, 2012 at 8:54 am
SQL Kiwi (5/22/2012)
TheSQLGuru (5/22/2012)
As always you have a great breakdown of what is going on in the plan Paul! To the OP: Paul makes a great point about helping the optimizer by breaking these monster-join queries down into one or more temp-table-based intermediate sets. I will reiterate what he said too - temp table, not table variable.
If nothing else, I hope it shows why hiring a good consultant & trainer for a short period can deliver much greater benefit than a potentially much greater $ investment in whatever automated tool might be the flavour of the day :-)
Yes, I agree. I was really struggling with the information all my tools were giving me. I ended up not taking many of the recommendations because I didn't want to fix one query and break 5 other queries. It really pointed out how much I need to learn about performance tuning. Last week I bought 2 books - SQL Server 2008 Query Performance Tuning Distilled and Performance Tuning with SQL Server DMVs. I was able to get the 2nd book on Kindle, so I've already started that one. I'm going to be busy at nights reading!
Thanks Paul for your detailed analysis of the query, it helped a lot. Thanks Kevin for your input too!
Now off to put out more fires...
May 22, 2012 at 1:28 pm
SQL Kiwi (5/22/2012)
TheSQLGuru (5/22/2012)
As always you have a great breakdown of what is going on in the plan Paul! To the OP: Paul makes a great point about helping the optimizer by breaking these monster-join queries down into one or more temp-table-based intermediate sets. I will reiterate what he said too - temp table, not table variable.
If nothing else, I hope it shows why hiring a good consultant & trainer for a short period can deliver much greater benefit than a potentially much greater $ investment in whatever automated tool might be the flavour of the day :-)
So VERY true. I have been brought in quite a number of times to fix problems identified by tools that the client couldn't figure out how to address. I have also been brought in to clean up messes CREATED by previous consultants!! :-)
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 23, 2012 at 4:59 am
Hi Elizabeth, Kevin, Paul, Jeff and the rest :-)
IMHO, automated tools and consultants' manual work are 100% complementary.
While a good consultant will probably outperform any tool on a highly complex, very specific, or small set of processes, a good tool will outperform any consultant when a large-scale workload needs to be optimized.
How long will it take you to manually tune 2,000 stored procedures?
Yes, your current problem may not require tuning 2,000 stored procedures; perhaps tuning 3 will solve the current bottleneck and make your users happy.
But what about the fourth, fifth, and sixth ones, which you didn't get to optimize? They are tomorrow's crisis...
Another advantage of some tools is their ability to benchmark your changes across the entire workload, something that is nearly impossible to do manually.
Elizabeth said earlier, "I got a lot of ideas for indexes but didn't put many in because I wanted to make sure I didn't make performance worse."
She is right. When performing manual tuning, can you guarantee that the brilliant index you came up with won't cause performance disasters elsewhere in the workload?
See a classic case I witnessed first hand in this short article about a seemingly harmless index.
I've been in the consulting business for 15 years and I'm well aware of the limitations of automatic tools. However, I'm also aware of the limitations of human capabilities.
You will achieve top results only when combining the best of both worlds - your human wisdom with the scale and power of automated tools.
15 years ago, chess grandmasters laughed at "automatic" chess programs and beat them hands down every time. They said, "Computers will never be able to beat a highly skilled human at chess."
Let's see how Anand or Gelfand (the current world championship contenders) would reply if you asked them today to bet a million dollars on a match against one of the top chess programs. They use these programs extensively today as reference mentors and practice partners to improve their game...
Just my 2 cents.
:-)
Ami Levin
CTO, DBSophic
May 23, 2012 at 6:38 am
You should know about good automated tools, Ami, given what DBSophic's Optimizer is capable of!! Like I have said before - it is like having the vast majority of my 40,000 man-hours of SQL Server relational engine experience distilled into a software package!! :-)
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service