December 10, 2009 at 4:52 pm
Hello all.
Screenshot reference (also attached): http://picasaweb.google.com/EnrightMcC/SQLStuff#5413752936151516514
At the above link are screenshots from running a stored proc and the same code run directly, with the same parameters: the direct run is on the right, and the stored proc run is on the left. The proc took between 7 and 10 minutes to complete; the direct code run regularly takes about two minutes.
I'm trying to focus on one piece of the problem, hoping that it's indicative of the overall performance problem I'm experiencing.
Look at the execution plan ‘tooltip’ for one of the tables (you can’t see the name, but it’s tbldAdj), specifically the Actual Number of Rows compared to the Estimated Number of Rows.
The actual number of rows in the left screenshot is roughly SEVEN TIMES the estimated number of rows. Also note that the query optimizer chose to join tbldAdj with tbldEncntr first.
Now look at the tooltip on the right. This is the query that I ran outside of the proc. The Actual versus the Estimated number of rows: 3680975 versus 3877730, or about 5% off from the correct estimate, as opposed to the proc’s estimate, which was only 13% of the actual number of rows returned. Also note that the optimizer (based on that estimate) chose to join tbllAdjCd first instead of tbldEncntr.
Next, look at the Hash Join that connects the two tables: 13% in the ‘bad’ query, and only 2% in the ‘good’ query. The bad estimate clearly causes problems downstream, as it takes longer to sort through the extra rows.
So... What opinions can you guys share? What should I go to next?
Based upon what I've seen and read, everything seems to point to bad statistics. I'm currently updating the statistics on tbldAdj (the table I'm referencing above). However, it's been running for about 8 hours and hasn't finished yet. I suspect I'll need to update statistics for (at a minimum) the other two tables in the join in that part of the query plan. I'm just worried about how long it will take.
Should I just be patient with the update statistics that's currently running?
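For reference, the kind of statement I'm running is along these lines (the options shown here are only illustrative; they're not necessarily the exact ones in my current run):
-- Full scan: most accurate histogram, but slow on a 766-million-row table
UPDATE STATISTICS dbo.tbldAdj WITH FULLSCAN;

-- Sampled alternative: much faster, often good enough to correct gross misestimates
UPDATE STATISTICS dbo.tbldAdj WITH SAMPLE 10 PERCENT;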
Related info:
* The huge variance in Estimated versus Actual row counts means bad statistics, right? (except if it's the inner side of a loop join) A quick way to check how stale the stats are is sketched after this list.
* I just upgraded from SQL 2000 to SQL 2008 last month.
* I ran monthly maintenance a couple of weeks ago. Indexes rebuilt.
* This is a very large database (over 1 TB)
* tbldAdj has over 766 million records.
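Here's the quick staleness check I mentioned above; a minimal sketch that just reports when each statistics object on tbldAdj was last updated:
-- When was each statistics object on tbldAdj last updated?
SELECT  s.name,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.tbldAdj');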
Let me know what other information I can provide.
December 10, 2009 at 6:14 pm
Sounds like both bad statistics and parameter sniffing; one could potentially lead to the other. Once stats are updated, you can work on the parameter sniffing by using a query hint or by assigning the parameters to local variables in the procedure.
Next, look at the Hash Join that connects the two tables: 13% in the ‘bad’ query, and only 2% in the ‘good’ query. The bad estimate clearly causes problems downstream, as it takes longer to sort through the extra rows.
This isn't exactly accurate, because the join is to different tables. Yes, the optimizer has chosen a different join order because of bad stats, but there are no extra rows in the results.
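For example, the query-hint route might look something like this. It's only a sketch with placeholder names: the ActPer column on tbldAdj and the '200912' value are assumptions, so substitute a representative parameter from your own workload.
DECLARE @ActPer CHAR(6);
SET @ActPer = '200912';   -- placeholder value

-- Compile the plan as if @ActPer were this representative value,
-- instead of using the unknown-variable density guess
SELECT  COUNT(*)
FROM    dbo.tbldAdj
WHERE   ActPer = @ActPer
OPTION  (OPTIMIZE FOR (@ActPer = '200912'));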
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 10, 2009 at 11:29 pm
Jack,
Thanks for the quick response. The UPDATE STATISTICS run has completed on the first table. Based on row count, I estimate the next table will take about half as long.
In response to your comment about parameter sniffing, and as some more information for you (and anyone else who cares to share their expertise): the first thing we do with all of the parameters in the proc is assign them to local variables inside the proc, like so...
CREATE PROCEDURE [dbo].[procedureNameHere]
    @_ActPer     CHAR(6),
    @_ClientName VARCHAR(50)
AS
BEGIN
    -- Copy the incoming parameters into local variables
    DECLARE @ActPer     CHAR(6),
            @ClientName VARCHAR(50);

    SET @ActPer = @_ActPer;
    SET @ClientName = @_ClientName;

    -- ... the rest of the procedure uses @ActPer / @ClientName ...
END
We did this because we uncovered some parameter sniffing issues in this proc a few years ago.
December 10, 2009 at 11:31 pm
Can you please post the execution plan itself, rather than just a screenshot? There's a lot of info that a single screenshot can't show.
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 10, 2009 at 11:34 pm
Parameter sniffing - http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/
(it's a 3-part blog series)
By using variables, you've got a different form of parameter sniffing or, more accurately, a lack thereof. The optimiser can't see the value of variables, so it has to guess at the cardinality.
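A minimal illustration of the difference (the ActPer column on tbldAdj is an assumption here, not necessarily your actual schema):
-- Parameter: the optimiser sniffs the value passed on the first execution
-- and estimates rows from that value's histogram step.
CREATE PROCEDURE dbo.Demo_Parameter
    @ActPer CHAR(6)
AS
    SELECT COUNT(*) FROM dbo.tbldAdj WHERE ActPer = @ActPer;
GO

-- Local variable: the value is unknown at compile time, so the optimiser
-- falls back to an average-density guess that is the same for every call.
CREATE PROCEDURE dbo.Demo_Variable
    @_ActPer CHAR(6)
AS
    DECLARE @ActPer CHAR(6);
    SET @ActPer = @_ActPer;
    SELECT COUNT(*) FROM dbo.tbldAdj WHERE ActPer = @ActPer;
GO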
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 11, 2009 at 5:08 am
Full SQL Plan uploaded.
Thanks.
Bob McC
December 11, 2009 at 5:12 am
GilaMonster (12/10/2009)
Parameter sniffing - http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/ (it's a 3-part blog series)
By using variables, you've got a different form of parameter sniffing or, more accurately, a lack thereof. The optimiser can't see the value of variables, so it has to guess at the cardinality.
Whoa. Most resources say to assign parameters to local variables to avoid parameter sniffing. Now doing that is wrong too? This is turning into a "which came first... the chicken or the egg" problem. All I know is that this query took 2 minutes on SQL 2000, and now it's coming in at around 7-10 minutes. Not just this query, but others as well (which I'm hoping ultimately come down to the same problem). Ahhh!
Bob McC
December 11, 2009 at 5:30 am
BobMcC (12/11/2009)
Woah. Most resources say to assign parameters to local variables to avoid parameter sniffing. Now doing that is wrong too?
😀 You asked for it....
It depends.
Seriously, sometimes using variables instead of parameters is the best solution. Sometimes using parameters and perhaps a query hint is the best solution. It's not that one option is 'right' and the other is 'wrong'. It's all about the tradeoffs; there's no absolute best answer for all circumstances.
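As one concrete example of those tradeoffs: OPTION (RECOMPILE) gets you a fresh plan, compiled with the current values visible to the optimiser, on every execution, at the cost of compiling every time. A minimal sketch with placeholder names (ActPer and the value are assumptions):
DECLARE @ActPer CHAR(6);
SET @ActPer = '200912';   -- placeholder

-- Statement-level recompile: the current value of @ActPer is visible
-- to the optimiser each time this statement runs
SELECT  COUNT(*)
FROM    dbo.tbldAdj
WHERE   ActPer = @ActPer
OPTION  (RECOMPILE);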
Did you read the 3 blog posts? Did you understand the discussion?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 11, 2009 at 7:31 am
Ahh... the "it depends." One of my favorite answers with "well.... yes.... and no" being a close second. 🙂
I've read the first two blog posts and will get to the 3rd one momentarily. I understand it all just fine... so far.
December 11, 2009 at 8:06 am
Have you updated your statistics?
The actual row counts are vastly different from the estimated ones.
December 11, 2009 at 8:53 am
Estimates might differ from actuals if variables are in use rather than parameters. Because the optimiser can't sniff the value of variables, it has to guess at the cardinality. The guess is just that and can be quite inaccurate. The advantage of using variables is that the guesses are consistent, rather than varying based on the parameter value of the first compile.
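The guess itself comes from the density information in the statistics object rather than from the histogram, which is why it stays the same no matter what value is in the variable. You can see it with something like this (the statistics name is a placeholder):
-- 'All density' for the leading column, multiplied by the table's row count,
-- gives the optimiser's estimate for an equality test on an unknown value
DBCC SHOW_STATISTICS ('dbo.tbldAdj', 'IX_tbldAdj_ActPer') WITH DENSITY_VECTOR;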
Bob, would you mind zipping that plan and uploading again? I'd like to take a look, but have limited bandwidth here.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 11, 2009 at 8:59 am
Done.
December 11, 2009 at 9:07 am
Can you post all of the code too?
December 11, 2009 at 10:36 am
Here's the code.
December 11, 2009 at 11:24 am
I've left for the day now, but will try to get a chance to look at this...