Top N sub query workaround - Does it exist?

Question

Top N sub query workaround - Does it exist?

krypto69

SSChampion

Points: 13604
More actions
January 15, 2009 at 9:48 pm

#200122

I'm getting a major slowdown on a Top N sort.
Here's the part that so slow:
UPDATE #events SET userID =
(SELECT TOP 1 userID FROM #events e2
WHERE e2.coCRN = e.coCRN AND e2.deptCRN = e.deptCRN AND e2.county = e.county AND e2.orderNo = e.orderNo
ORDER BY e2.logID)
FROM #events e
I've been searching and searching looking for a different methoed to avoid this slow sort. I can't figure out a way to go within the subquery. This temp table has about 433k records, and I've tried all kinds of indexes...but still slow.
Any ideas?

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply

RBarryYoung SSC Guru Points: 143329 More actions · Answer 1

Can you show us the execution plan.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Ramesh Saive SSC-Insane Points: 24275 More actions · Answer 2

You are using correlated sub-queries, which is another form of RBAR..., to know more about RBAR and its performance implications see http://www.sqlservercentral.com/links/276514/84234.

Based on the query, what I understood is that you want to update the userID based on some criteria on the same table and taking only the first matching record.

Here is what I came up with....

UPDATEE1

SETE1.userID = E2.userID

FROM#events E1

INNER JOIN

(

SELECTROW_NUMBER() OVER( PARTITION BY coCRN, deptCRN, county, orderNo ORDER BY logID ) AS RowNumber,

coCRN, deptCRN, county, orderNo, userID

FROM#events

) E2 ON E1.coCRN = E2.coCRN AND E1.deptCRN = E2.deptCRN AND E1.county = E2.county AND E1.orderNo = E2.orderNo

WHEREE2.RowNumber = 1

Further, you can add index on the columns specified in where clause and specify the userID column in the INCLUDE clause.

--Ramesh

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

Ramesh (1/16/2009)
You are using correlated sub-queries, which is another form of RBAR...,

The optimiser can handle most correlated subqueries just fine. If the comparison with the outer query is an equality, the subquery won't be processed row by row.

If it's an inequality comparison, it's a whole 'nother story.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

This may be doable with a preaggregated derived table instead of a correlated subquery:

-- current query

SELECT e.coCRN, e.deptCRN, e.county, e.orderNo, MAXuserID = (SELECT TOP 1 userID FROM #events e2

WHERE e2.coCRN = e.coCRN AND e2.deptCRN = e.deptCRN AND e2.county = e.county AND e2.orderNo = e.orderNo

ORDER BY e2.logID)

FROM #events e

-- alternative query

SELECT e.coCRN, e.deptCRN, e.county, e.orderNo, MAXuserID

FROM #events e

INNER JOIN (SELECT MAX(userID) AS MAXuserID, coCRN, deptCRN, county, orderNo

FROM #events

GROUP BY coCRN, deptCRN, county, orderNo

) e2 ON e2.coCRN = e.coCRN

AND e2.deptCRN = e.deptCRN

AND e2.county = e.county

AND e2.orderNo = e.orderNo

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 5

I wish it was available online, but if you can track down the final copy of the SQL Standard, I wrote an article comparing TOP, MAX and ROW_NUMBER as a mechanism for getting versioned data. Depending on the data sets involved, TOP works best on larger data sets and ROW_NUMBER works best on smaller data sets. But you need to test in your environment to be sure.

Generally, something along the lines of the following works best for larger data sets:

SELECT ...

FROM X

WHERE X.Val = (SELECT TOP(1) x2.Val

FROM X AS x2

WHERE X.id = x2.Id

ORDER BY x.Val DESC)

You have to have the right indexes in place of course because table/index scans will kill performance. You can also express it like a JOIN:

SELECT...

FROM X AS x

JOIN X as x2

ON x.ID = x2.ID

AND x2.Val = (SELECT TOP(1).... -- you get the picture

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Ramesh Saive SSC-Insane Points: 24275 More actions · Answer 6

GilaMonster (1/16/2009)
Ramesh (1/16/2009)
You are using correlated sub-queries, which is another form of RBAR...,
The optimiser can handle most correlated subqueries just fine. If the comparison with the outer query is an equality, the subquery won't be processed row by row.
If it's an inequality comparison, it's a whole 'nother story.

Thanks gail, for letting me know about the optimizer and correcting me. One thing, is it do the same in all environments? Or is it dependent on other factors?

--Ramesh

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

Ramesh (1/16/2009)
Thanks gail, for letting me know about the optimizer and correcting me. One thing, is it do the same in all environments? Or is it dependent on other factors?

What environments or factors are you thinking of?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Ramesh Saive SSC-Insane Points: 24275 More actions · Answer 8

I meant to ask you that does the optimizer always process the two tables/queries separately (i.e. not a row-by-row process as you've mentioned) and do a equality join, similar to an INNER JOIN, irrespective of the amount of data, sql 2005 edition, hardware etc.?

--Ramesh

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

They're not processed separately, that's the point. SQL can and does 'de-correlate' the subquery and create an exec plan that processes the inner and outer query together.

Take this example I did in a presentation a couple months back

USE AdventureWorks

GO

SET NOCOUNT ON

DECLARE @StartTime DATETIME

SET @StartTime = GETDATE()

SELECT productId, productnumber,

(SELECT SUM(LineTotal) SumTotal FROM Sales.SalesOrderDetail sd WHERE sd.productID = p.productid )

FROM Production.Product p

WHERE ProductNumber like 'bk%'

print 'Duration with correlated subqueries: ' + CAST(DATEDIFF(ms, @StartTime, getdate()) as varchar(10)) + ' ms'

SET @StartTime = GETDATE()

SELECT p.ProductID, ProductNumber, SumTotal

FROM Production.Product p

INNER JOIN (

SELECT productid, SUM(LineTotal) SumTotal

FROM Sales.SalesOrderDetail

GROUP BY ProductID

) SalesTotals on p.ProductID = SalesTotals.ProductID

WHERE ProductNumber like 'bk%'

print 'Duration with derived table: ' + CAST(DATEDIFF(ms, @StartTime, getdate()) as varchar(10)) + ' ms'

Near-identical execution plans and durations

(run it twice and take the second time, the first time the data and plan caches are cold)

As far as I know, it doesn't depend on edition, data, day of the year, etc

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Ramesh Saive SSC-Insane Points: 24275 More actions · Answer 10

Hey Gail, firstly thank you very much for taking time in explaining me the concept and providing me with a working example.

Secondly, you were right, it does provide the same plans and complete in approximately same durations.

--Ramesh

krypto69 SSChampion Points: 13604 More actions · Answer 11

THANK YOU ALL SOOOOO MUCH!!

I really appreciate and am in awe of your skills.

You all may have saved my job!!!

Good karma to you all, and if any of you are ever in Sunny West Palm Beach - DRINKS ARE ON ME!!!

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

Hi Krypto

Which solution did you decide to use? Would it be possible for you to post it, for the potential benefit of others who may experience the same or a similar problem? Thanks!

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 13

GilaMonster (1/16/2009)
If the comparison with the outer query is an equality, the subquery won't be processed row by row.

I pretty sure it will with an ORDER BY...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 14

Jeff Moden (1/16/2009)
GilaMonster (1/16/2009)
If the comparison with the outer query is an equality, the subquery won't be processed row by row.
I pretty sure it will with an ORDER BY...

Order by where? In the subquery? In the outer query?

I'll test, I have code set up. I'm just not sure what to add where.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass