December 17, 2009 at 12:36 pm
We have a client with a long-running query (on the order of 10 hours). The query joins a number of tables as well as two views, and I'm pretty sure the culprit is the view referenced here, so I tried a couple of simple queries against just the view. The results are unusual.
select *
FROM PM.vw_LastSelfPayPaymentActivity --where account_id = 107246
The query above returns roughly 100,000 rows in about six seconds. These are the IO Statistics:
(101816 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Vouchers'. Scan count 2, logical reads 4648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Payments'. Scan count 1, logical reads 778744, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Service_Payments'. Scan count 1, logical reads 34133, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If I remove the comment and query the view with the WHERE clause, it takes about 32 seconds to return one row.
Here are the IO Statistics.
(1 row(s) affected)
Table 'Payments'. Scan count 4850299, logical reads 15925208, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Vouchers'. Scan count 6, logical reads 10710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Service_Payments'. Scan count 5, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The query plan for the slow one shows a clustered index seek with a cost of 100%. The clustered index on the Payments table is actually on VoucherID, not PaymentID, although PaymentID is the primary key. The cost and duration of the query seem pretty disproportionate for a clustered index seek.
I ran the same query on my test machine and the
I'd attach the query plans but I'm not quite sure how to do it on here. Should I use the img tags?
Anyone have any thoughts about what could cause this poor performance?
"Beliefs" get in the way of learning.
December 17, 2009 at 12:50 pm
Please post query, view definition, table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 17, 2009 at 12:52 pm
Robert Frasca (12/17/2009)
select *
FROM PM.vw_LastSelfPayPaymentActivity --where account_id = 107246
...
The query plan for the slow one shows a clustered index seek with a cost of 100%. The clustered index on the Payments table is actually on VoucherID, not PaymentID, although PaymentID is the primary key. The cost and duration of the query seem pretty disproportionate for a clustered index seek.
I ran the same query on my test machine and the
I'd attach the query plans but I'm not quite sure how to do it on here. Should I use the img tags?
Anyone have any thoughts about what could cause this poor performance?
We will need to see the execution plans and the view and table definitions. Please also include the index fragmentation level.
edit...Just posted and saw that Gail beat me to the punch...
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
December 17, 2009 at 12:56 pm
One thing first. Run this and see if there's any change in the exec plan or IO stats.
UPDATE STATISTICS Payments WITH FULLSCAN
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 17, 2009 at 12:58 pm
From your description, it looks like there is no index on the account_id column. Can you make sure that an index exists on that column? If not, create one and try again.
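Something along these lines, as a rough sketch (the index name is just a placeholder, and I'm assuming Account_ID is a column on PM.Vouchers based on your description):
CREATE NONCLUSTERED INDEX IX_Vouchers_Account_ID
ON PM.Vouchers (Account_ID);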
The execution plan would be more useful, though.
Of course, I might be missing something...
Blog -- LearnSQLWithBru
Join on Facebook Page Facebook.comLearnSQLWithBru
Twitter -- BruMedishetty
December 17, 2009 at 2:19 pm
Thanks for the link to the guidelines. That was helpful.
I've already posted the query. I've attached a zip file with the three table definitions and the view definition as well as the query plans.
Indexes are rebuilt nightly, and the clustered index referred to in the query plans is 4% fragmented.
There are 6.2 million service payments, 3.8 million payments and 1.05 million vouchers. The data file is on direct-attached storage, but it is implemented with just one file and one filegroup. TempDB is 10 GB but has apparently been growing in 10% increments from an initial 8 MB.
There are a LOT of foreign key constraints, indexes etc. Obviously, you'll need to get rid of any FK constraints that refer to tables that aren't included.
I'm pretty new here, and this is a mature OLTP product with thousands of implementations in the field, so "it is what it is". I'm trying to fix the most glaring issues, but once in a while an obscure problem like this pops up. The long-term solution is probably to use something like replication to get copies of the hard-hit tables out to an ODS and then run queries off that more heavily indexed ODS, but that's going to take a while to implement. The biggest problem is that this model just isn't scalable, so I'm going to need to get creative.
"Beliefs" get in the way of learning.
December 18, 2009 at 8:36 am
The statistics on the Voucher table look like they might be out of date. Estimated row count = 149 and actual = 4850298. Same thing with the loop join. It's pulling more than 3 million rows with an estimated 71. That's not always a statistics problem, but it's a leading indicator.
I'm curious what the selectivity of the index on the Voucher table is, because that scan, combined with the 4850298 executions of the clustered index seek are pretty much destroying performance.
In the view query itself, I'd suggest changing this:
...IN (SELECT MAX(Payment_ID)...
to
...= (SELECT MAX(Payment_ID)...
The IN statement is a poor choice since the select will only ever return a single value. But, you might want to try changing that whole sub-select. Instead of an aggregate function, see if you can do a SELECT TOP 1 Payment_ID with an ORDER BY clause on v.Account_ID. I've seen that methodology work a lot better than aggregations.
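As a rough sketch, using the account from your commented-out WHERE clause (this assumes "last" means the highest Payment_ID and leaves out whatever extra filters the view applies):
SELECT TOP (1) p.Payment_ID
FROM PM.Payments p
INNER JOIN PM.Vouchers v ON v.Voucher_ID = p.Voucher_ID
WHERE v.Account_ID = 107246 -- in the view this would be correlated to the outer row instead
ORDER BY p.Payment_ID DESC;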
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 18, 2009 at 1:39 pm
Grant,
Thanks for stimulating the brain cells. In short, the subquery that you pointed out was the problem and the view just wasn't going to get it done. I put that subquery into a function. In the query that used the view I replaced the view reference with the following:
LEFT JOIN (SELECT VOUCH.Account_ID,
PMT.Payment_ID,
PMT.Date_Paid AS Last_Payment_Date,
SUM(SVCPMT.Amount) AS Last_Payment_Amount
FROM PM.Vouchers VOUCH WITH(NOLOCK)
INNER JOIN PM.Payments PMT WITH(NOLOCK) ON PMT.Voucher_ID = VOUCH.Voucher_ID
INNER JOIN PM.Service_Payments SVCPMT WITH(NOLOCK) ON SVCPMT.Payment_ID = PMT.Payment_ID
GROUP BY VOUCH.Account_ID,
PMT.Payment_ID,
PMT.Date_Paid
) LSTSELFPMT ON LSTSELFPMT.Payment_ID = PM.fnLastSelfPaymentIDByAccountID(ACCT.Account_ID)
This allowed us to pass in the Account ID for the sub-query to use. (I hope this makes sense.) Here's the function definition:
CREATE FUNCTION PM.fnLastSelfPaymentIDByAccountID(@Account_ID INT) RETURNS INT
AS
BEGIN
DECLARE @PaymentID AS INT
SELECT @PaymentID = MAX(Payment_ID)
FROM PM.Payments p WITH(NOLOCK) INNER JOIN PM.Vouchers v WITH(NOLOCK) ON v.Voucher_ID = p.Voucher_ID
WHERE v.Account_ID = @Account_ID
AND Remitting_Carrier_ID IS NULL
AND p.Update_Status BETWEEN 1 AND 2
AND Transaction_Type = 'P'
GROUP BY v.Account_ID
RETURN @PaymentID
END
The original query went from 10 hours and 40-odd minutes to under 10 seconds.
Smokin...
It's ironic that you stepped up to give me a hand. I was just singing the praises of "Dissecting SQL Server Execution Plans" and "SQL Server 2008 Query Performance Tuning Distilled" on the thread associated with Jeff Moden's review when this issue popped up.
We've actually met before, at January's code camp in Waltham, MA for the New England SQL Server users group. I think you know a few of my former colleagues, including, of course, Adam Machanic, and perhaps Michael Ruminer as well?
Anyway, thanks for your help and I hope our paths cross again.
"Beliefs" get in the way of learning.
December 19, 2009 at 1:38 am
Can you post the revised exec plan? I want to see if there are indexing opportunities.
Just a word of warning: scalar functions like the one you have there are not inlined by SQL Server and will be executed once for each row in the query. In other words, a cursor in disguise.
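If you want to keep the logic in a function, one option to test is an inline table-valued function used with OUTER APPLY. A rough sketch based on the scalar function you posted (the function name is a placeholder, and the p. prefixes on Remitting_Carrier_ID and Transaction_Type are my assumption about which table owns those columns):
CREATE FUNCTION PM.fnLastSelfPaymentIDByAccountID_itvf (@Account_ID INT)
RETURNS TABLE
AS
RETURN
(
SELECT MAX(p.Payment_ID) AS Payment_ID
FROM PM.Payments p
INNER JOIN PM.Vouchers v ON v.Voucher_ID = p.Voucher_ID
WHERE v.Account_ID = @Account_ID
AND p.Remitting_Carrier_ID IS NULL
AND p.Update_Status BETWEEN 1 AND 2
AND p.Transaction_Type = 'P'
);
-- usage: ... OUTER APPLY PM.fnLastSelfPaymentIDByAccountID_itvf(ACCT.Account_ID) AS LASTPMT ...
The optimizer can expand an inline function into the outer query instead of calling it row by row.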
Second aside: watch the NOLOCK usage unless you're happy with the possibility of incorrect data. See http://sqlblog.com/blogs/andrew_kelly/archive/2009/04/10/how-dirty-are-your-reads.aspx
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 20, 2009 at 4:43 am
I'm glad it worked. Thanks for the praise on the books. I'd follow Gail's suggestions if I were you.
See you at the next Data Camp on January 30th?
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 21, 2009 at 9:16 am
In cases where you can get widely disparate numbers of rows back, which leads to ugly index seek/nested loop plans on kajillions of rows, I have found that the best solutions are usually either dynamic SQL (which gives every plan hard-coded values so it can get precise estimates for each input) or disassembling the complex query into one or more intermediate temporary tables (NOT table variables!), which gets you the same thing.
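For example, something along these lines, purely as a sketch (the table and column names are borrowed from earlier in this thread, and the p. prefixes are my guess at which table owns those columns):
SELECT v.Account_ID, MAX(p.Payment_ID) AS Payment_ID
INTO #LastSelfPay
FROM PM.Payments p
INNER JOIN PM.Vouchers v ON v.Voucher_ID = p.Voucher_ID
WHERE p.Remitting_Carrier_ID IS NULL
AND p.Update_Status BETWEEN 1 AND 2
AND p.Transaction_Type = 'P'
GROUP BY v.Account_ID;
CREATE CLUSTERED INDEX CIX_LastSelfPay ON #LastSelfPay (Account_ID);
-- then join #LastSelfPay into the big query in place of the view, so the optimizer sees real row counts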
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 21, 2009 at 9:52 am
Yes, I'm planning on attending the data camp and I hope to see you there.
I am familiar with the issue that Gail raises about "cursors in disguise". It's a problem here. My experience is that this can occur for functions referenced in the SELECT column list and in the WHERE clause, but that they can be effective in JOINs in the FROM clause. That said, I will generate a new query plan, and perhaps a trace, to see how many times the function fires. As always, I'm happy to be proven wrong if I can learn something from it.
"Beliefs" get in the way of learning.
December 21, 2009 at 10:02 am
Robert Frasca (12/21/2009)
Yes, I'm planning on attending the data camp and I hope to see you there. I am familiar with the issue that Gail raises about "cursors in disguise". It's a problem here. My experience is that this can occur for functions referenced in the SELECT column list and in the WHERE clause, but that they can be effective in JOINs in the FROM clause. That said, I will generate a new query plan, and perhaps a trace, to see how many times the function fires. As always, I'm happy to be proven wrong if I can learn something from it.
Sometimes something that is 'bad' can be helpful, as seems to be the case here. 🙂 The function is likely 'helping' the optimizer choose the 'optimal' plan in some way.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 21, 2009 at 10:27 am
Can you post the revised execution plan (after Grant's improvements)? I'm interested in seeing if there are any opportunities for improving the indexing.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 21, 2009 at 3:00 pm
When I posted the original problem, I said that the problem I identified was part of a much larger query but that I had isolated it to a specific view. I have attached the original query plan and the new query plan. I'm sure there are other opportunities for tuning, but I should point out that this SQL is generated dynamically.
"Beliefs" get in the way of learning.