Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • Jeff Moden (11/12/2009)


    Ok, Martin... I'm confused... I see that all of your code examples update the "RunningBalance" column... none of the code updates the "RunningTotal" column yet your complaint is that the "RunningTotal" column goes astray according to your thoughts. What is it that updates the "RunningTotal" column?

    And, by the way, it's the "RunningTotal" column that went astray... not the "RunningBalance" column that (according to your code), that the "Quirky Update" correctly calculated. 😉

    Jeff I am not sure what you mean here?

    This is the code I posted and am complaining about. No where does it mention runningbalance.

    The running balance column already existed in my table and if its wrong there are going to be some very unhappy customers out there.

    SET NOCOUNT ON

    --===== Declare the working variables

    DECLARE @PrevAccountID INT

    DECLARE @AccountRunningTotal DECIMAL(14, 3)

    --

    --===== Update the running total for this row using the "Quirky Update" -- and a "Pseudo-cursor"

    UPDATE

    dbo.Ledger

    SET

    @AccountRunningTotal = RunningTotal =

    CASE WHEN AccountID = @PrevAccountID THEN CASE WHEN @AccountRunningTotal > 0

    THEN @AccountRunningTotal + (Credit - Debit)

    ELSE @AccountRunningTotal - (Debit - Credit) END

    ELSE (Credit - Debit)

    END,

    @PrevAccountID = AccountID

    FROM

    dbo.Ledger WITH (TABLOCKX)

    WHERE

    LedgerType = 'Account'

    OPTION

    (MAXDOP 1)

  • Dang it... you're right, Martin. Dyslexia due to lack of coffee must have kicked kicked in. My apologies. I'll check it out.

    Or maybe it's the edit you did yesterday? I clearly remember it being as I stated. Either way, though, I'll check it out.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/13/2009)


    Dang it... you're right, Martin. Dyslexia due to lack of coffee must have kicked kicked in. My apologies. I'll check it out.

    Or maybe it's the edit you did yesterday? I clearly remember it being as I stated. Either way, though, I'll check it out.

    Sorry Jeff the edit was just spelling! To many late nights me thinks 🙂

  • Jeff,

    Terrific article. I haven't used Quirky Update for a few years, and that's a good thing because I didn't know all of the rules.

    I don't want to stir up a controversy so I'll word this carefully. I have never noticed a temp table or table variable that didn't return its data in the same order that it was added. Do I just need to pay more attention, or is this another Undocumented Feature;-).

    Thanks for 22 pages of useful information.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Tom Garth (11/13/2009)


    Jeff,

    I don't want to stir up a controversy so I'll word this carefully. I have never noticed a temp table or table variable that didn't return its data in the same order that it was added.

    You can modify the following example to use a temporary table and see for yourself:

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/05/20/without-order-by-there-is-no-default-sort-order.aspx

  • Tom Garth (11/13/2009)


    Jeff,

    Terrific article. I haven't used Quirky Update for a few years, and that's a good thing because I didn't know all of the rules.

    I don't want to stir up a controversy so I'll word this carefully. I have never noticed a temp table or table variable that didn't return its data in the same order that it was added. Do I just need to pay more attention, or is this another Undocumented Feature;-).

    Thanks for 22 pages of useful information.

    Thanks Tom. And Alexander is correct. When it comes to SELECT's, you really need to use ORDER BY to guarantee the order.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Make your vioce heard

    Vote on Connect here

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=254388


    N 56°04'39.16"
    E 12°55'05.25"

  • Make your voice heard

    Vote on Connect here

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=254388


    N 56°04'39.16"
    E 12°55'05.25"

  • Joe Celko (11/13/2009)


    I have been on the road speaking at PASS and SQL Connections for two weeks. One of the topics in both places was getting a lot of play is finishing the OVER() clause to include the RANGE sub-clause. I am offering to help MS with this as an outside consultant as part of my follow-up. The syntax is a bit verbose, but readable:

    [ROWS | RANGE <range expression> says how many rows before and after the current row to apply the aggregate function. The keywords are:

    BETWEEN

    PRECEDING

    FOLLOWING

    CURRENT

    UNBOUNDED | number of rows

    so the classic running totals is

    SUM (something_amt)

    OVER (PARTITION BY floob_grps

    ORDER BY something_date

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    Since it is one optimized statement, it will be faster than cursors for sure, and probably beat even "diseased mutant SQL" proprietary extentions.

    It looks a bit complex, but the idea me likes 🙂

    Similarly I would like to see something like this possible into constraints as well (vertical constraints if you will). Setting check constraints relative to a preceding row is something quite often wanted and the knowledge encapsulated in it can be used by the optimizer to do neat optimizations during selects. It also would prevent trigger logic containing multiple statements in quite a few situations and make the modeling more declarative (the rules are more transparent then with triggers for sure).

  • Really interesting discussion guys. I do have one complaint about Hugo's alternative method, though.

    Here in the relevant bit from Hugo's code.

    INSERT INTO @Results(AccountID, Date, TransactionDetailID, Amount,

    RunningTotal, Rnk)

    SELECT AccountID, Date, TransactionDetailID, Amount, Amount,

    RANK() OVER (PARTITION BY AccountID

    ORDER BY Date,

    TransactionDetailID)

    FROM dbo.TransactionDetail;

    [/code]

    The insert into the temp table uses the Rank() function, which the while loop compares against an incrementing counter. Assuming that we can guarantee a unique combination of Date-TransactionDetailID within each AccountID partition, this works as expected. Indeed, the data is such in this proof-of-concept example that there are no problems.

    When this is translated to a real world situation, though, I would strongly encourage ROW_NUMBER() over RANK() for the following reason:

    Remarks

    If two or more rows tie for a rank, each tied rows receives the same rank. For example, if the two top salespeople have the same SalesYTD value, they are both ranked one. The salesperson with the next highest SalesYTD is ranked number three, because there are two rows that are ranked higher. Therefore, the RANK function does not always return consecutive integers.

    http://msdn.microsoft.com/en-us/library/ms176102.aspx

    Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.

  • sqlservercentralitgeekry (11/14/2009)


    Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.

    The order in which numbers are assigned to rows by ROW_NUMBER is not deterministic if there are ties, so it's hard to see how that helps at all.

    Paul

  • Joe Celko (11/14/2009)


    I have never noticed a temp table or table variable that didn't return its data in the same order that it was added. Do I just need to pay more attention, or is this another Undocumented Feature;-).

    Actually, you need to read that standards. Were you around when GROUP BY stopped doing a sort and ORDER BY became mandatory? Or when BIT became a numeric data type and was suddenly NULL-able? Suddenly semi-colons have become "best practices" and not "Celko's way" for SQL Server, etc .

    A professional always writes to Standards so the code will port to other SQLs and to the future releases of the same SQL product. I make a lot of my money from "Cowboy coders" who boxed in their clients with proprietary stuff. When I got home, I re-connected with Dave Winters, an old friend from Georgia Tech decades ago. He wrote a column on SQL Server after we left GT and consults. Same story -- Cowboy Coders and dialect = regular employment fixing their messes afterward.

    Please, get off the "Code only to SQL Standards Soap Box."

    I work in a MS SQL Server shop and I will write the code to make the best and most scalable and "performant" code I can. If that means using proprietary extensions, then that is what I'll do and is what is expected of me by my employer.

    If we should move to another DBMS, we'll cross that bridge when it occurs and rewrite code as is needed. I don't see that happening, however, any time soon as we are experiencing budgetary issues and ORACLE is just not going to be in our future. I also don't see us going to an open source DBMS either.

  • Joe Celko (11/13/2009)


    I have been on the road speaking at PASS and SQL Connections for two weeks. One of the topics in both places was getting a lot of play is finishing the OVER() clause to include the RANGE sub-clause. I am offering to help MS with this as an outside consultant as part of my follow-up. The syntax is a bit verbose, but readable:

    [ROWS | RANGE <range expression> says how many rows before and after the current row to apply the aggregate function. The keywords are:

    BETWEEN

    PRECEDING

    FOLLOWING

    CURRENT

    UNBOUNDED | number of rows

    so the classic running totals is

    SUM (something_amt)

    OVER (PARTITION BY floob_grps

    ORDER BY something_date

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    Since it is one optimized statement, it will be faster than cursors for sure, and probably beat even "diseased mutant SQL" proprietary extentions.

    Interesting. On which RDBMS did you benchmark and concluded that "one optimized statement, it will be faster than cursors"?

    In general this is simply not true - cursors may be and sometimes are implemented faster than OLAP functions. Some time ago I spent considerable time optimizing such problems on Oracle 9, and on Oracle 9 cursors were several times faster than OLAP functions. We need to be very careful making blanket statements...

  • Hi Joe

    Joe Celko (11/14/2009)


    A professional always writes to Standards so the code will port to other SQLs and to the future releases of the same SQL product.

    In my opinion, that's not correct. I'd say: "A professional always get things work."

    I know many people who tried only to use the standard SQL features which are (hopefully) supported in every DBMS. They hit the wall as soon as the load increases. A wise man once told me "Buying a DBMS like SQL Server (or Oracle or PostgreSQL or ...) to use only the standard features is like buying a scientific calculator and just use plus and minus" ;-).

    Greets

    Flo

  • Similarly I would like to see something like this possible into constraints as well (vertical constraints if you will). Setting check constraints relative to a preceding row is something quite often wanted and the knowledge encapsulated in it can be used by the optimizer to do neat optimizations during selects. It also would prevent trigger logic containing multiple statements in quite a few situations and make the modeling more declarative (the rules are more transparent then with triggers for sure).

    Same here - sometimes there are business rules such as "do not change running totals more than 20% at a time". For that, I use denormalization and CHECK constraints:

Viewing 15 posts - 61 through 75 (of 307 total)

You must be logged in to reply to this topic. Login to reply