Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • SQLkiwi (7/14/2011)


    Wayne, you can avoid the join:

    UPDATE target

    SET AccountRunningTotal = art,

    AccountRunningCount = arc

    FROM

    (

    SELECT

    td.AccountRunningTotal,

    td.AccountRunningCount,

    art = SUM(td.Amount) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID),

    arc = COUNT_BIG(*) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID)

    FROM dbo.TransactionDetail AS td

    ) AS target

    What exactly limited the running total in that query to all the previous plus current "actvive" row? Is this new behavior of the existing aggregate functions? Either way, I find the notation very non-explicit and in that it propbalbly kills over a ton of existing code if it has changed.

  • There is an additional clause for the order by part, which defaults to "first row" to "current row" in the partition window.

    These three options are avaiable with the RANGE predicate:

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (equals without order by)

    Things are different with ROWS predicate:

    ROWS BETWEEN @x PRECEDING AND @y FOLLOWING

    ROWS BETWEEN @x PRECEDING AND CURRENT ROW

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (7/15/2011)


    There is an additional clause for the order by part, which defaults to "first row" to "current row" in the partition window.

    These three options are avaiable with the RANGE predicate:

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (equals without order by)

    Things are different with ROWS predicate:

    ROWS BETWEEN @x PRECEDING AND @y FOLLOWING

    ROWS BETWEEN @x PRECEDING AND CURRENT ROW

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

    Ineresting amount of control, but am i right to read in this that the default for this new functionality alters the meaning of existing code?

  • Only if you add the ORDER BY clause, then the default framing will be "between unbounded preceding and current row".

    If the ORDER BY clause is not there, the default framing is "between unbounded preceding and unbounded following", which means the entire partition key as it is today.


    N 56°04'39.16"
    E 12°55'05.25"

  • peter-757102 (7/15/2011)


    Ineresting amount of control, but am i right to read in this that the default for this new functionality alters the meaning of existing code?

    No, of course not. Neither SUM nor COUNT can use OVER ... ORDER BY before Denali. It would be extraordinary if Microsoft were to change the semantic of existing (roughly standard) SQL in a new release.

    For full details of the new OVER clause syntax see http://msdn.microsoft.com/en-us/library/ms189461(v=SQL.110).aspx

  • SQLkiwi (7/15/2011)


    No, of course not. Neither SUM nor COUNT can use OVER ... ORDER BY before Denali.

    I think Peter meant if existing code was ported to Denali and the functionality was changed because of different defaults.


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (7/15/2011)


    SQLkiwi (7/15/2011)


    No, of course not. Neither SUM nor COUNT can use OVER ... ORDER BY before Denali.

    I think Peter meant if existing code was ported to Denali and the functionality was changed because of different defaults.

    Who knows? I think we covered most of the possible interpretations between us though.

  • SQLkiwi (7/14/2011)


    Wayne, you can avoid the join:

    :blush: I should have seen that. (Coding when tired...)

    Okay, I've tried these three different methods:

    -- DENALI CTP3 Running Totals with enhanced over clause (cte).

    ;

    WITH cte AS

    (

    SELECT AccountRunningTotal,

    AccountRunningCount,

    ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),

    ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID)

    FROM dbo.TransactionDetail

    )

    UPDATE cte

    SET AccountRunningTotal = ART,

    AccountRunningCount = ARC

    GO

    -- DENALI CTP3 Running Totals with enhanced over clause (subquery).

    ;

    UPDATE target

    SET AccountRunningTotal = ART,

    AccountRunningCount = ARC

    FROM (SELECT AccountRunningTotal,

    AccountRunningCount,

    ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),

    ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID)

    FROM dbo.TransactionDetail

    ) target

    GO

    -- DENALI CTP3 Running Totals with enhanced over clause (Joined cte).

    ;

    WITH cte AS

    (

    SELECT TransactionDetailID,

    [Date],

    AccountID,

    Amount,

    AccountRunningTotal = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),

    AccountRunningCount = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),

    NCID

    FROM dbo.TransactionDetail

    )

    UPDATE td

    SET AccountRunningTotal = cte.AccountRunningTotal,

    AccountRunningCount = cte.AccountRunningCount

    FROM dbo.TransactionDetail td

    JOIN cte

    ON cte.TransactionDetailID = td.TransactionDetailID

    GO

    All are way worse than the QU:

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Jeff Moden (7/14/2011)


    It's not going to matter... even though I'm seriously disappointed in the performance (as you've measured), the new SUM() is faster than a While Loop, easier to write, does not require a table update, and it's "officially" supported.

    I normally don't go near the bleeding edge but I guess I'm going to have to load up Denalli CTP3 and do some testing on my own.

    Thanks for the early feedback, Wayne. I really appreciate the time you put into this one.

    No problem Jeff - I have a pretty strong interest in this, since I'm sitting on the proverbial fence on the use of the QU. (The only thing I don't like about it is the non-MS documented rules on how to make it work.)

    Plus... since you're rewriting this article again, it would probably be good to have this in there to be complete. (nudge nudge)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi Wayne,

    It always confuses me when a CTE and subquery are compared like that - they are equivalent.

    Beware drawing performance conclusions from CTP (beta) software. From what I have seen so far, it would be quite possible to optimize the running total scenario further. I have no inside information on whether that is planned or not.

  • SQLkiwi (7/15/2011)


    Hi Wayne,

    It always confuses me when a CTE and subquery are compared like that - they are equivalent.

    Just trying to confuse you Paul. Hey! - it worked! 🙂

    I know it's the same(pre-defined subquery); I tried it just to ensure.

    Beware drawing performance conclusions from CTP (beta) software. From what I have seen so far, it would be quite possible to optimize the running total scenario further. I have no inside information on whether that is planned or not.

    Definitely understand that. And hopefully that is exactly what is going on. What is nice is that this is now available (now = soon...)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • There's something missing in the code Wayne... and I think it's the reason why it's so comparatively slow. Peter Larsson sent me some code for Running Totals. I'll dig it out when I get home.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLkiwi (7/15/2011)[hr...Beware drawing performance conclusions from CTP (beta) software.

    Oh how I wish everyone would underastand that!

    Paul, you just aquired my vote for God's chief deputy for sensible DB design.

    Tom

  • Is it possible to include a subquery on the table being updated in the quirky update? I'm experimenting with the following code but the subqueries are not returning updated data that I expect from a pseudo-cursor. The first row updates fine, but when later rows need to get the data from the first row, the subquery comes back with 0 instead of the correct value. Am I expecting too much?

    UPDATEA

    SET@Anchor = VntPodKey,

    @Pct = A.Pct2008 =

    CASE

    WHEN A.VntTranQty < 0 OR (A.VntTranQty > 0 AND A.VntPodCount = 1) THEN -- REDUCING OR GAINING GALLONS DOES NOT CHANGE PERCENTS - GET THE LAST PERCENT FOR THIS LOT

    (SELECT TOP 1 B.Pct2008

    FROM #tmpVintage B

    WHERE B.VntLotKey = A.VntLotKey AND B.VntTranDate < A.VntTranDate

    ORDER BY B.VntTranDate DESC)

    ELSE -- INCREASING GALLONS ON A 2-SIDED TRANSACTION

    CASE

    WHEN A.VntPodCount = 2 THEN -- GET THE PCT FROM THE 'FROM' SIDE OF THE TRANSACTION

    (((SELECT B.Pct2008

    FROM #tmpVintage B

    WHERE B.VntPodKey = A.VntPodKey AND

    B.VntIssueRcpt = 'I') * A.VntTranQty) -- Qty2008

    +

    (SELECT TOP 1 C.Run2008

    FROM #tmpVintage C

    WHERE C.VntLotKey = A.VntLotKey AND C.VntTranDate < A.VntTranDate

    ORDER BY C.VntTranDate DESC))-- ADD THE LAST RUNNING TOTAL FOR THE LOT TO THE QTY2008 FOR THE RUN2008 QTY

    /

    A.VntRunTranQty -- DIVIDE THE RUN2008 BY THE LOT'S RUNNING TOTAL TO GET THE PERCENT

    END

    END,

    @Qty = A.Qty2008 =

    CASE

    WHEN A.VntTranQty < 0 OR (A.VntTranQty > 0 AND A.VntPodCount = 1) THEN

    @Pct * A.VntTranQty / 100

    ELSE

    ((SELECT B.Pct2008

    FROM #tmpVintage B

    WHERE B.VntPodKey = A.VntPodKey AND

    B.VntIssueRcpt = 'I') * A.VntTranQty) -- Qty2008

    END,

    @Run = A.Run2008 =

    CASE

    WHEN A.VntTranQty < 0 OR (A.VntTranQty > 0 AND A.VntPodCount = 1) THEN

    @Pct * A.VntRunTranQty / 100

    ELSE

    (((SELECT B.Pct2008

    FROM #tmpVintage B

    WHERE B.VntPodKey = A.VntPodKey AND

    B.VntIssueRcpt = 'I') * A.VntTranQty) -- Qty2008

    +

    (SELECT TOP 1 C.Run2008

    FROM #tmpVintage C

    WHERE C.VntLotKey = A.VntLotKey AND C.VntTranDate < A.VntTranDate

    ORDER BY C.VntTranDate DESC)) -- ADD THE LAST RUNNING TOTAL FOR THE LOT TO THE QTY2008 FOR THE RUN2008 QTY

    END

    FROM#tmpVintage A WITH (TABLOCKX)

    WHEREA.VntTagKey = 0

    OPTION (MAXDOP 1)

  • gcresse (10/31/2011)


    Am I expecting too much?

    I believe so. What you're trying to do is actually breaking Rule 5 which stipulates "No Joins". The Sub-queries you're running are correlated subqueries which are joins.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 256 through 270 (of 307 total)

You must be logged in to reply to this topic. Login to reply