Top N sub query workaround - Does it exist?

  • I'm getting a major slowdown on a Top N sort.

    Here's the part that so slow:

    UPDATE #events SET userID =

    (SELECT TOP 1 userID FROM #events e2

    WHERE e2.coCRN = e.coCRN AND e2.deptCRN = e.deptCRN AND e2.county = e.county AND e2.orderNo = e.orderNo

    ORDER BY e2.logID)

    FROM #events e

    I've been searching and searching looking for a different methoed to avoid this slow sort. I can't figure out a way to go within the subquery. This temp table has about 433k records, and I've tried all kinds of indexes...but still slow.

    Any ideas?

  • Can you show us the execution plan.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • You are using correlated sub-queries, which is another form of RBAR..., to know more about RBAR and its performance implications see http://www.sqlservercentral.com/links/276514/84234.

    Based on the query, what I understood is that you want to update the userID based on some criteria on the same table and taking only the first matching record.

    Here is what I came up with....

    UPDATEE1

    SETE1.userID = E2.userID

    FROM#events E1

    INNER JOIN

    (

    SELECTROW_NUMBER() OVER( PARTITION BY coCRN, deptCRN, county, orderNo ORDER BY logID ) AS RowNumber,

    coCRN, deptCRN, county, orderNo, userID

    FROM#events

    ) E2 ON E1.coCRN = E2.coCRN AND E1.deptCRN = E2.deptCRN AND E1.county = E2.county AND E1.orderNo = E2.orderNo

    WHEREE2.RowNumber = 1

    Further, you can add index on the columns specified in where clause and specify the userID column in the INCLUDE clause.

    --Ramesh


  • Ramesh (1/16/2009)


    You are using correlated sub-queries, which is another form of RBAR...,

    The optimiser can handle most correlated subqueries just fine. If the comparison with the outer query is an equality, the subquery won't be processed row by row.

    If it's an inequality comparison, it's a whole 'nother story.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • This may be doable with a preaggregated derived table instead of a correlated subquery:

    -- current query

    SELECT e.coCRN, e.deptCRN, e.county, e.orderNo, MAXuserID = (SELECT TOP 1 userID FROM #events e2

    WHERE e2.coCRN = e.coCRN AND e2.deptCRN = e.deptCRN AND e2.county = e.county AND e2.orderNo = e.orderNo

    ORDER BY e2.logID)

    FROM #events e

    -- alternative query

    SELECT e.coCRN, e.deptCRN, e.county, e.orderNo, MAXuserID

    FROM #events e

    INNER JOIN (SELECT MAX(userID) AS MAXuserID, coCRN, deptCRN, county, orderNo

    FROM #events

    GROUP BY coCRN, deptCRN, county, orderNo

    ) e2 ON e2.coCRN = e.coCRN

    AND e2.deptCRN = e.deptCRN

    AND e2.county = e.county

    AND e2.orderNo = e.orderNo

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I wish it was available online, but if you can track down the final copy of the SQL Standard, I wrote an article comparing TOP, MAX and ROW_NUMBER as a mechanism for getting versioned data. Depending on the data sets involved, TOP works best on larger data sets and ROW_NUMBER works best on smaller data sets. But you need to test in your environment to be sure.

    Generally, something along the lines of the following works best for larger data sets:

    SELECT ...

    FROM X

    WHERE X.Val = (SELECT TOP(1) x2.Val

    FROM X AS x2

    WHERE X.id = x2.Id

    ORDER BY x.Val DESC)

    You have to have the right indexes in place of course because table/index scans will kill performance. You can also express it like a JOIN:

    SELECT...

    FROM X AS x

    JOIN X as x2

    ON x.ID = x2.ID

    AND x2.Val = (SELECT TOP(1).... -- you get the picture

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • GilaMonster (1/16/2009)


    Ramesh (1/16/2009)


    You are using correlated sub-queries, which is another form of RBAR...,

    The optimiser can handle most correlated subqueries just fine. If the comparison with the outer query is an equality, the subquery won't be processed row by row.

    If it's an inequality comparison, it's a whole 'nother story.

    Thanks gail, for letting me know about the optimizer and correcting me. One thing, is it do the same in all environments? Or is it dependent on other factors?

    --Ramesh


  • Ramesh (1/16/2009)


    Thanks gail, for letting me know about the optimizer and correcting me. One thing, is it do the same in all environments? Or is it dependent on other factors?

    What environments or factors are you thinking of?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I meant to ask you that does the optimizer always process the two tables/queries separately (i.e. not a row-by-row process as you've mentioned) and do a equality join, similar to an INNER JOIN, irrespective of the amount of data, sql 2005 edition, hardware etc.?

    --Ramesh


  • They're not processed separately, that's the point. SQL can and does 'de-correlate' the subquery and create an exec plan that processes the inner and outer query together.

    Take this example I did in a presentation a couple months back

    USE AdventureWorks

    GO

    SET NOCOUNT ON

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    SELECT productId, productnumber,

    (SELECT SUM(LineTotal) SumTotal FROM Sales.SalesOrderDetail sd WHERE sd.productID = p.productid )

    FROM Production.Product p

    WHERE ProductNumber like 'bk%'

    print 'Duration with correlated subqueries: ' + CAST(DATEDIFF(ms, @StartTime, getdate()) as varchar(10)) + ' ms'

    SET @StartTime = GETDATE()

    SELECT p.ProductID, ProductNumber, SumTotal

    FROM Production.Product p

    INNER JOIN (

    SELECT productid, SUM(LineTotal) SumTotal

    FROM Sales.SalesOrderDetail

    GROUP BY ProductID

    ) SalesTotals on p.ProductID = SalesTotals.ProductID

    WHERE ProductNumber like 'bk%'

    print 'Duration with derived table: ' + CAST(DATEDIFF(ms, @StartTime, getdate()) as varchar(10)) + ' ms'

    Near-identical execution plans and durations

    (run it twice and take the second time, the first time the data and plan caches are cold)

    As far as I know, it doesn't depend on edition, data, day of the year, etc

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hey Gail, firstly thank you very much for taking time in explaining me the concept and providing me with a working example.

    Secondly, you were right, it does provide the same plans and complete in approximately same durations.

    --Ramesh


  • THANK YOU ALL SOOOOO MUCH!!

    I really appreciate and am in awe of your skills.

    You all may have saved my job!!!

    Good karma to you all, and if any of you are ever in Sunny West Palm Beach - DRINKS ARE ON ME!!!

  • Hi Krypto

    Which solution did you decide to use? Would it be possible for you to post it, for the potential benefit of others who may experience the same or a similar problem? Thanks!

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • GilaMonster (1/16/2009)


    If the comparison with the outer query is an equality, the subquery won't be processed row by row.

    I pretty sure it will with an ORDER BY...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/16/2009)


    GilaMonster (1/16/2009)


    If the comparison with the outer query is an equality, the subquery won't be processed row by row.

    I pretty sure it will with an ORDER BY...

    Order by where? In the subquery? In the outer query?

    I'll test, I have code set up. I'm just not sure what to add where.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply