October 13, 2016 at 2:17 am
Hey guys
Just looking to see if anyone can advise on any improvements that can be made to a query I'm running.
Background: The original query (which took between 20-25 seconds) was using a scalar function in the main query that was slowing things down. I created a temp table as a lookup, populated with the return values the function would have produced for each record. I now join this temp table to the main select.
Although this has improved performance (the whole query is now down to 7-10 seconds), I still think it can be improved.
The main slowness occurs in the initial population of this lookup table.
One of the selects I'm using to populate the lookup table was using inner joins to two tables. The latter inner join was causing a huge difference between the actual and estimated number of rows (the estimate was 3k, the actual nearly 700k). The query didn't need any fields from this join in its return values, so I replaced the inner join with an EXISTS in the WHERE criteria.
This has improved performance, but now the exec plan shows 8k rows for actual and 1 row for estimated. I know this is mainly down to the use of the <= in the WHERE criteria of the subquery, but I'm wondering if anyone has any other ideas on how to get the estimate closer to the actual.
Query:
SELECT a.col1
     , a.col2
     , a.col3
     , a.col4
     , MAX(d.col6) AS col5
FROM #Temp1 a
INNER JOIN table2 d WITH (NOLOCK)
    ON a.col2 = d.col2
    AND a.col4 = d.col4
WHERE a.col3 = -1
    AND a.col5 IS NULL
    AND d.col3 IN ('A', 'P')
    AND d.col6 < @now
    AND EXISTS
    (
        SELECT 1 FROM table3 p
        WHERE p.col1 = d.col3
            AND p.col2 <= d.col2 -- this is where I think the issue is
            AND p.col3 = a.col1
    )
GROUP BY a.col1, a.col2, a.col3, a.col4
October 13, 2016 at 2:23 am
Perhaps the statistics on your tables are out of date. Try updating them with FULLSCAN.
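Something along these lines, with table2 and table3 standing in for your actual tables:

UPDATE STATISTICS table2 WITH FULLSCAN;
UPDATE STATISTICS table3 WITH FULLSCAN;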
John
October 13, 2016 at 2:33 am
Yeah, tried that along with rebuilding and reorganizing the indexes.
The index used does cover the query, and it performs a seek, not a scan.
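Roughly, per the anonymized names above:

ALTER INDEX ALL ON table2 REBUILD;
ALTER INDEX ALL ON table3 REORGANIZE;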
October 13, 2016 at 3:15 am
Can you post the actual execution plan for the query?
October 13, 2016 at 3:43 am
Exec plan attached
I'm wondering if this might be a combination of using the <= operator in the inner join along with using a temp table, which might be throwing the stats a bit.
This query is the last of four update statements that run against the temp table.
If I run the query without this particular update, it takes 2-3 seconds.
October 13, 2016 at 4:00 am
mitzyturbo (10/13/2016)
Exec plan attached. I'm wondering if this might be a combination of using the <= operator in the inner join along with using a temp table, which might be throwing the stats a bit.
This query is the last of four update statements that run against the temp table.
If I run the query without this particular update, it takes 2-3 seconds.
First things to fix
<Warnings>
<PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT_IMPLICIT(date,[d].[NAVdate],0)" />
<PlanAffectingConvert ConvertIssue="Seek Plan" Expression="[a].[NAVDate]=CONVERT_IMPLICIT(date,[d].[NAVdate],0)" />
<PlanAffectingConvert ConvertIssue="Seek Plan" Expression="CONVERT_IMPLICIT(date,[d].[NAVdate],0)=[t].[NAVDate]" />
</Warnings>
October 13, 2016 at 4:59 am
So the original DB owners, in their infinite wisdom, are using varchar(10) columns to store DATE values.
For testing purposes, I just changed the date field on my temp table to a varchar(10) as well. The warnings have gone but the performance is still the same, and the actual and estimated number of rows for the seek haven't changed.
However, the cost of the index seek relative to the batch has now reduced to 40%.
I spotted a Hash Match with a cost of 10% relative to the batch, and its actual vs estimated number of rows is way off: actual 412497, estimated 2034.45.
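For the record, the type change was just along these lines (NAVDate assumed from the plan warnings):

ALTER TABLE #Temp1 ALTER COLUMN NAVDate varchar(10);

Matching the temp table's column type to the base table removes the CONVERT_IMPLICIT, though storing the values as date on both sides would be the real fix.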
October 13, 2016 at 5:24 am
-- Your query matches portfolio.clientID to deadline.clientID
-- where does #Temp1.clientID fit into this?
-- It might be possible to do something like this
UPDATE a
SET Ret = x.Ret
FROM #Temp1 a
CROSS APPLY (
    SELECT MAX(d.deadline) AS Ret
    FROM deadline d
    WHERE a.NAVDate = d.NavDate
        AND a.qid = d.qid
        AND d.statusCode IN ('A', 'P')
        AND d.deadline < @now
        AND EXISTS
        (
            SELECT 1
            FROM portfolio p
            WHERE p.clientID = d.clientID
                AND p.portfolioDate <= a.navDate
                AND p.investmentID = a.InvestmentID
        )
) x
WHERE a.ClientID = -1
AND a.Ret IS NULL
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2016 at 7:16 am
The temp table is a lookup holding, for each record, the value I'd otherwise get from a function call (which returns the Ret value) within the main select. In the main select I join back to the temp table to retrieve this value.
The cross apply won't make any difference, as it's using the same join that's causing the slow performance.
October 13, 2016 at 7:36 am
You've got two sort operations (including one with a spill to tempdb) in support of a merge join. Despite the estimated costs, I think this is where more of the slow behavior of the query comes from. The optimizer is timing out on this query; I suspect there just isn't enough to go on for it to arrive at a good plan. You might try putting an index on the temp table after you build it, preferably a unique index if the data supports it. NavDate and QID seem like good candidates. Also, just to help the optimizer do its work while you're tuning this, lose the NOLOCK hint.
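A minimal sketch, assuming the #Temp1 column names from earlier in the thread, and unique only if the data allows:

CREATE UNIQUE INDEX UX_Temp1_NavDate_QID ON #Temp1 (NAVDate, qid);
-- or, if the pair isn't unique:
-- CREATE INDEX IX_Temp1_NavDate_QID ON #Temp1 (NAVDate, qid);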
Since you're only performing a MAX in the aggregation, you might try a TOP (1) with an ORDER BY instead. For most queries that's a much better approach than aggregating.
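The shape I mean, sketched against the anonymized columns from your first post (the correlated select would sit inside a CROSS APPLY, so the TOP (1) applies per outer row):

SELECT TOP (1) d.col6
FROM table2 d
WHERE d.col2 = a.col2
    AND d.col4 = a.col4
    AND d.col3 IN ('A', 'P')
    AND d.col6 < @now
ORDER BY d.col6 DESC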
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 13, 2016 at 8:15 am
That sounds about right to me, Grant. The joins in this subquery are on some very non-specific values, NAVDate and QID. This subquery is basically a catch-all for the records on the table that haven't already been updated and don't have a clientid. The other subqueries run mostly the same syntax but use the clientid in the join, which speeds things up.
At this stage I've spent over a day reducing a query from 20-25 seconds to 10. Not perfect, but I think it will do.
On a separate note, pardon my ignorance here and maybe I've just been looking at the screen too long :w00t: but how would you join on a subquery with a TOP 1 for each record? The other subqueries use MINs, so I would just swap the ORDER BY around to get those, but won't TOP 1 always return only one record? Are you using ROW_NUMBER and filtering where the row number = 1?
October 13, 2016 at 8:26 am
Big assumption on my part, but aren't you going for something like the situation in this article? There are examples there of exactly what I mean.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 13, 2016 at 8:37 am
I just looked at your very first query in that article, saw the first inner join, and shouted a four-letter word in my head. Sometimes you can just drink too much coffee.
On a much more successful note, I finally copped that the WHERE clause on the subquery was using the clientid on the wrong side (on the #Temp table). A simple change of the a. to a d. removed that heavy load on the Hash Match. The query is running in under 2 seconds now.
I had assumed the -1 that the original function code assigned to NULL clientids wasn't their actual value on the table, but it is!
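In terms of the anonymized query from my first post, the fix looks roughly like this (column names per that post):

AND EXISTS
(
    SELECT 1 FROM table3 p
    WHERE p.col1 = d.col3
        AND p.col2 <= d.col2
        AND p.col3 = d.col1 -- was a.col1, the #Temp side
)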
Thanks for your help on this one
October 13, 2016 at 9:07 am
-- Most of the cost of the query is reading the
-- portfolio table - with the aggregate performed AFTER this expensive bunch of seeks.
-- This version of the query might well aggregate BEFORE the join to portfolio.
-- If it doesn't, then I'd explore other means of achieving this.
-- Note that reading the portfolio table is estimated at 40%,
-- is estimated to be 2034 executions (seeks) but is actually 412419 executions.
-- That's the number of rows coming off the product of #temp and deadline, hence the urge to
-- aggregate first.
UPDATE a
SET Ret = x.Ret
FROM #Temp1 a
CROSS APPLY (
    SELECT TOP (1)
        d.deadline AS Ret,
        d.clientID
    FROM deadline d
    WHERE a.NAVDate = d.NavDate
        AND a.qid = d.qid
        AND d.statusCode IN ('A', 'P')
        AND d.deadline < @now
    ORDER BY d.deadline DESC
) x
WHERE a.ClientID = -1
AND a.Ret IS NULL
AND EXISTS
(
    SELECT 1
    FROM portfolio p
    WHERE p.clientID = x.clientID -- x, not d: d is out of scope out here
        AND p.portfolioDate <= a.NAVDate
        AND p.investmentID = a.InvestmentID
)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2016 at 10:18 am
Thanks for that Chris, CROSS APPLY is a lot neater in this case alright