October 13, 2016 at 2:17 am
Hey guys
Just looking to see if anyone can advise on any improvements that can be made to a query I'm running.
Background: The original query (which took between 20-25 seconds) was using a scalar function in the main query that was slowing things down. I created a temp table as a lookup, populated with the return values the function would have produced for each record. I now join this temp table to the main select.
Although this has improved performance (the whole query is now down to 7-10 seconds), I still think it can be improved.
The main slowness occurs in the initial population of this lookup table.
One of the selects I'm using to populate the lookup table was using inner joins to two tables. The latter inner join was causing a huge difference between the actual and estimated number of rows (the estimate was 3k, the actual nearly 700k). The query didn't need any fields from this join in its return values, so I replaced the inner join with an EXISTS in the WHERE criteria.
This has improved performance, but now the exec plan shows 8k rows for actual and 1 row for estimated. I know this is mainly down to the use of the <= in the WHERE criteria of the subquery, but I'm wondering if anyone has any other ideas on how to get the estimate closer to the actual.
Query:
SELECT a.col1
     , a.col2
     , a.col3
     , a.col4
     , MAX(d.col6) AS col5
FROM #Temp1 a
INNER JOIN table2 d WITH (NOLOCK)
    ON a.col2 = d.col2
    AND a.col4 = d.col4
WHERE a.col3 = -1
    AND a.col5 IS NULL
    AND d.col3 IN ('A', 'P')
    AND d.col6 < @now
    AND EXISTS
    (
        SELECT 1 FROM table3 p
        WHERE p.col1 = d.col3
            AND p.col2 <= d.col2 -- this is where I think the issue is
            AND p.col3 = a.col1
    )
GROUP BY a.col1, a.col2, a.col3, a.col4
October 13, 2016 at 2:23 am
Perhaps the statistics on your tables are out of date. Try updating them with FULLSCAN.
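Something along these lines, with table2 and table3 standing in for your actual tables:

UPDATE STATISTICS table2 WITH FULLSCAN;
UPDATE STATISTICS table3 WITH FULLSCAN;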
John
October 13, 2016 at 2:33 am
Yeah, tried that along with rebuilding and reorganizing the indexes.
The index used does cover the query, and it performs a seek, not a scan.
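Roughly, per the anonymized names above:

ALTER INDEX ALL ON table2 REBUILD;
ALTER INDEX ALL ON table3 REORGANIZE;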
October 13, 2016 at 3:15 am
Can you post the actual execution plan for the query?
October 13, 2016 at 3:43 am
Exec plan attached
I'm wondering if this might be a combination of using the <= operator in the inner join along with using a temp table, which might be throwing the stats a bit.
This query is the last of four update statements that run against the temp table.
If I run the query without this particular update, it takes 2-3 seconds.
October 13, 2016 at 4:00 am
mitzyturbo (10/13/2016)
Exec plan attached. I'm wondering if this might be a combination of using the <= operator in the inner join along with using a temp table, which might be throwing the stats a bit.
This query is the last of four update statements that run against the temp table.
If I run the query without this particular update, it takes 2-3 seconds.
First things to fix
<Warnings>
<PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT_IMPLICIT(date,[d].[NAVdate],0)" />
<PlanAffectingConvert ConvertIssue="Seek Plan" Expression="[a].[NAVDate]=CONVERT_IMPLICIT(date,[d].[NAVdate],0)" />
<PlanAffectingConvert ConvertIssue="Seek Plan" Expression="CONVERT_IMPLICIT(date,[d].[NAVdate],0)=[t].[NAVDate]" />
</Warnings>
October 13, 2016 at 4:59 am
So the original DB owners, in their infinite wisdom, are using varchar(10) columns to store DATE values.
For testing purposes, I just changed the date field on my temp table to a varchar(10) as well. The warnings have gone but the performance is still the same, and the actual and estimated number of rows for the seek haven't changed.
However, the cost of the index seek relative to the batch has now reduced to 40%.
I spotted a Hash Match with a cost of 10% relative to the batch, and its actual vs estimated number of rows is way off: actual 412497, estimated 2034.45.
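For the record, the type change was just along these lines (NAVDate assumed from the plan warnings):

ALTER TABLE #Temp1 ALTER COLUMN NAVDate varchar(10);

Matching the temp table's column type to the base table removes the CONVERT_IMPLICIT, though storing the values as date on both sides would be the real fix.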
October 13, 2016 at 5:24 am
-- Your query matches portfolio.clientID to deadline.clientID
-- where does #Temp1.clientID fit into this?
-- It might be possible to do something like this
UPDATE a
SET Ret = x.Ret
FROM #Temp1 a
CROSS APPLY (
    SELECT MAX(d.deadline) AS Ret
    FROM deadline d
    WHERE a.NAVDate = d.NavDate
        AND a.qid = d.qid
        AND d.statusCode IN ('A', 'P')
        AND d.deadline < @now
        AND EXISTS
        (
            SELECT 1
            FROM portfolio p
            WHERE p.clientID = d.clientID
                AND p.portfolioDate <= a.navDate
                AND p.investmentID = a.InvestmentID
        )
) x
WHERE a.ClientID = -1
AND a.Ret IS NULL
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2016 at 7:16 am
The temp table is a lookup holding, for each record, the value I'd otherwise get from a function call (which returns the Ret value) within the main select. In the main select I join back to the temp table to retrieve this value.
The cross apply won't make any difference, as it's using the same join that's causing the slow performance.
October 13, 2016 at 7:36 am
You've got two sort operations (including one with a spill to tempdb) in support of a merge join. Despite the estimated costs, I think this is where more of the slow behavior of the query comes from. The optimizer is timing out on this query; I suspect there just isn't enough to go on for it to arrive at a good plan. You might try putting an index on the temp table after you build it, preferably a unique index if the data supports it. NavDate and QID seem like good candidates. Also, just to help the optimizer do its work while you're tuning this, lose the NOLOCK hint.
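A minimal sketch, assuming the #Temp1 column names from earlier in the thread, and unique only if the data allows:

CREATE UNIQUE INDEX UX_Temp1_NavDate_QID ON #Temp1 (NAVDate, qid);
-- or, if the pair isn't unique:
-- CREATE INDEX IX_Temp1_NavDate_QID ON #Temp1 (NAVDate, qid);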
Since you're only performing a MAX in the aggregation, you might try a TOP (1) with an ORDER BY instead. For most queries that's a much better approach than aggregating.
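The shape I mean, sketched against the anonymized columns from your first post (the correlated select would sit inside a CROSS APPLY, so the TOP (1) applies per outer row):

SELECT TOP (1) d.col6
FROM table2 d
WHERE d.col2 = a.col2
    AND d.col4 = a.col4
    AND d.col3 IN ('A', 'P')
    AND d.col6 < @now
ORDER BY d.col6 DESC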
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 13, 2016 at 8:15 am
That sounds about right to me, Grant. The joins in this subquery are on some very non-specific values, NAVDate and QID. This subquery is basically a catch-all for the records on the table that haven't already been updated and don't have a clientid. The other subqueries run mostly the same syntax but use the clientid in the join, which speeds things up.
At this stage I've spent over a day reducing a query from 20-25 seconds to 10. Not perfect, but I think it will do.
On a separate note, pardon my ignorance here and maybe I've just been looking at the screen too long :w00t: but how would you join on a subquery with a TOP 1 for each record? The other subqueries use MINs, so I would just swap the ORDER BY around to get those, but won't TOP 1 always return only one record? Are you using ROW_NUMBER and filtering where the row number = 1?
October 13, 2016 at 8:26 am
Big assumption on my part, but aren't you going for something like the situation in this article? There are examples there of exactly what I mean.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 13, 2016 at 8:37 am
I just looked at your very first query in that article, saw the first inner join, and shouted a four-letter word in my head. Sometimes you can just drink too much coffee.
On a much more successful note, I finally copped that the WHERE clause on the subquery was using the clientid on the wrong side (on the #Temp table). A simple change of the a. to a d. removed that heavy load on the Hash Match. The query is running in under 2 seconds now.
I had assumed the -1 that the original function code assigned to NULL clientids wasn't their actual value on the table, but it is!
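In terms of the anonymized query from my first post, the fix looks roughly like this (column names per that post):

AND EXISTS
(
    SELECT 1 FROM table3 p
    WHERE p.col1 = d.col3
        AND p.col2 <= d.col2
        AND p.col3 = d.col1 -- was a.col1, the #Temp side
)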
Thanks for your help on this one
October 13, 2016 at 9:07 am
-- Most of the cost of the query is reading the
-- portfolio table - with the aggregate performed AFTER this expensive bunch of seeks.
-- This version of the query might well aggregate BEFORE the join to portfolio.
-- If it doesn't, then I'd explore other means of achieving this.
-- Note that reading the portfolio table is estimated at 40%,
-- is estimated to be 2034 executions (seeks) but is actually 412419 executions.
-- That's the number of rows coming off the product of #temp and deadline, hence the urge to
-- aggregate first.
UPDATE a
SET Ret = x.Ret
FROM #Temp1 a
CROSS APPLY (
    SELECT TOP (1)
        d.deadline AS Ret,
        d.clientID
    FROM deadline d
    WHERE a.NAVDate = d.NavDate
        AND a.qid = d.qid
        AND d.statusCode IN ('A', 'P')
        AND d.deadline < @now
    ORDER BY d.deadline DESC
) x
WHERE a.ClientID = -1
AND a.Ret IS NULL
AND EXISTS
(
    SELECT 1
    FROM portfolio p
    WHERE p.clientID = x.clientID -- x, not d: d is out of scope out here
        AND p.portfolioDate <= a.NAVDate
        AND p.investmentID = a.InvestmentID
)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2016 at 10:18 am
Thanks for that Chris, CROSS APPLY is a lot neater in this case alright