Solving the Running Total and Ordinal Rank Problems (Rewritten)

Question

Solving the Running Total and Ordinal Rank Problems (Rewritten)

Viewing 15 posts - 61 through 75 (of 307 total)

You must be logged in to reply to this topic. Login to reply

Martin Butcher-415798 SSC Enthusiast Points: 157 More actions · Answer 1

Jeff Moden (11/12/2009)
Ok, Martin... I'm confused... I see that all of your code examples update the "RunningBalance" column... none of the code updates the "RunningTotal" column yet your complaint is that the "RunningTotal" column goes astray according to your thoughts. What is it that updates the "RunningTotal" column?
And, by the way, it's the "RunningTotal" column that went astray... not the "RunningBalance" column that (according to your code), that the "Quirky Update" correctly calculated. 😉

Jeff I am not sure what you mean here?

This is the code I posted and am complaining about. No where does it mention runningbalance.

The running balance column already existed in my table and if its wrong there are going to be some very unhappy customers out there.

SET NOCOUNT ON

--===== Declare the working variables

DECLARE @PrevAccountID INT

DECLARE @AccountRunningTotal DECIMAL(14, 3)

--

--===== Update the running total for this row using the "Quirky Update" -- and a "Pseudo-cursor"

UPDATE

dbo.Ledger

SET

@AccountRunningTotal = RunningTotal =

CASE WHEN AccountID = @PrevAccountID THEN CASE WHEN @AccountRunningTotal > 0

THEN @AccountRunningTotal + (Credit - Debit)

ELSE @AccountRunningTotal - (Debit - Credit) END

ELSE (Credit - Debit)

END,

@PrevAccountID = AccountID

FROM

dbo.Ledger WITH (TABLOCKX)

WHERE

LedgerType = 'Account'

OPTION

(MAXDOP 1)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 2

Dang it... you're right, Martin. Dyslexia due to lack of coffee must have kicked kicked in. My apologies. I'll check it out.

Or maybe it's the edit you did yesterday? I clearly remember it being as I stated. Either way, though, I'll check it out.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Martin Butcher-415798 SSC Enthusiast Points: 157 More actions · Answer 3

Jeff Moden (11/13/2009)
Dang it... you're right, Martin. Dyslexia due to lack of coffee must have kicked kicked in. My apologies. I'll check it out.
Or maybe it's the edit you did yesterday? I clearly remember it being as I stated. Either way, though, I'll check it out.

Sorry Jeff the edit was just spelling! To many late nights me thinks 🙂

Tom Garth SSCertifiable Points: 6193 More actions · Answer 4

Jeff,

Terrific article. I haven't used Quirky Update for a few years, and that's a good thing because I didn't know all of the rules.

I don't want to stir up a controversy so I'll word this carefully. I have never noticed a temp table or table variable that didn't return its data in the same order that it was added. Do I just need to pay more attention, or is this another Undocumented Feature;-).

Thanks for 22 pages of useful information.

Tom Garth
Vertical Solutions[/url]

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

Alexander Kuznetsov SSCrazy Points: 2217 More actions · Answer 5

Tom Garth (11/13/2009)
Jeff,
I don't want to stir up a controversy so I'll word this carefully. I have never noticed a temp table or table variable that didn't return its data in the same order that it was added.

You can modify the following example to use a temporary table and see for yourself:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/05/20/without-order-by-there-is-no-default-sort-order.aspx

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

Tom Garth (11/13/2009)
Jeff,
Terrific article. I haven't used Quirky Update for a few years, and that's a good thing because I didn't know all of the rules.
I don't want to stir up a controversy so I'll word this carefully. I have never noticed a temp table or table variable that didn't return its data in the same order that it was added. Do I just need to pay more attention, or is this another Undocumented Feature;-).
Thanks for 22 pages of useful information.

Thanks Tom. And Alexander is correct. When it comes to SELECT's, you really need to use ORDER BY to guarantee the order.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 7

Make your vioce heard

Vote on Connect here

http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=254388

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 8

Make your voice heard

Vote on Connect here

http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=254388

N 56°04'39.16"
E 12°55'05.25"

peter-757102 SSCertifiable Points: 6877 More actions · Answer 9

Joe Celko (11/13/2009)
I have been on the road speaking at PASS and SQL Connections for two weeks. One of the topics in both places was getting a lot of play is finishing the OVER() clause to include the RANGE sub-clause. I am offering to help MS with this as an outside consultant as part of my follow-up. The syntax is a bit verbose, but readable:
[ROWS | RANGE <range expression> says how many rows before and after the current row to apply the aggregate function. The keywords are:
BETWEEN
PRECEDING
FOLLOWING
CURRENT
UNBOUNDED | number of rows
so the classic running totals is
SUM (something_amt)
OVER (PARTITION BY floob_grps
ORDER BY something_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Since it is one optimized statement, it will be faster than cursors for sure, and probably beat even "diseased mutant SQL" proprietary extentions.

It looks a bit complex, but the idea me likes 🙂

Similarly I would like to see something like this possible into constraints as well (vertical constraints if you will). Setting check constraints relative to a preceding row is something quite often wanted and the knowledge encapsulated in it can be used by the optimizer to do neat optimizations during selects. It also would prevent trigger logic containing multiple statements in quite a few situations and make the modeling more declarative (the rules are more transparent then with triggers for sure).

Condorman SSC Enthusiast Points: 173 More actions · Answer 10

Really interesting discussion guys. I do have one complaint about Hugo's alternative method, though.

Here in the relevant bit from Hugo's code.

INSERT INTO @Results(AccountID, Date, TransactionDetailID, Amount,
RunningTotal, Rnk)
SELECT AccountID, Date, TransactionDetailID, Amount, Amount,
RANK() OVER (PARTITION BY AccountID
ORDER BY Date,
TransactionDetailID)
FROM dbo.TransactionDetail;
[/code]

The insert into the temp table uses the Rank() function, which the while loop compares against an incrementing counter. Assuming that we can guarantee a unique combination of Date-TransactionDetailID within each AccountID partition, this works as expected. Indeed, the data is such in this proof-of-concept example that there are no problems.

When this is translated to a real world situation, though, I would strongly encourage ROW_NUMBER() over RANK() for the following reason:

Remarks
If two or more rows tie for a rank, each tied rows receives the same rank. For example, if the two top salespeople have the same SalesYTD value, they are both ranked one. The salesperson with the next highest SalesYTD is ranked number three, because there are two rows that are ranked higher. Therefore, the RANK function does not always return consecutive integers.
http://msdn.microsoft.com/en-us/library/ms176102.aspx

Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.

Paul White SSC Guru Points: 150468 More actions · Answer 11

sqlservercentralitgeekry (11/14/2009)
Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.

The order in which numbers are assigned to rows by ROW_NUMBER is not deterministic if there are ties, so it's hard to see how that helps at all.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

Joe Celko (11/14/2009)
I have never noticed a temp table or table variable that didn't return its data in the same order that it was added. Do I just need to pay more attention, or is this another Undocumented Feature;-).
Actually, you need to read that standards. Were you around when GROUP BY stopped doing a sort and ORDER BY became mandatory? Or when BIT became a numeric data type and was suddenly NULL-able? Suddenly semi-colons have become "best practices" and not "Celko's way" for SQL Server, etc .
A professional always writes to Standards so the code will port to other SQLs and to the future releases of the same SQL product. I make a lot of my money from "Cowboy coders" who boxed in their clients with proprietary stuff. When I got home, I re-connected with Dave Winters, an old friend from Georgia Tech decades ago. He wrote a column on SQL Server after we left GT and consults. Same story -- Cowboy Coders and dialect = regular employment fixing their messes afterward.

Please, get off the "Code only to SQL Standards Soap Box."

I work in a MS SQL Server shop and I will write the code to make the best and most scalable and "performant" code I can. If that means using proprietary extensions, then that is what I'll do and is what is expected of me by my employer.

If we should move to another DBMS, we'll cross that bridge when it occurs and rewrite code as is needed. I don't see that happening, however, any time soon as we are experiencing budgetary issues and ORACLE is just not going to be in our future. I also don't see us going to an open source DBMS either.

Alexander Kuznetsov SSCrazy Points: 2217 More actions · Answer 13

Joe Celko (11/13/2009)
I have been on the road speaking at PASS and SQL Connections for two weeks. One of the topics in both places was getting a lot of play is finishing the OVER() clause to include the RANGE sub-clause. I am offering to help MS with this as an outside consultant as part of my follow-up. The syntax is a bit verbose, but readable:
[ROWS | RANGE <range expression> says how many rows before and after the current row to apply the aggregate function. The keywords are:
BETWEEN
PRECEDING
FOLLOWING
CURRENT
UNBOUNDED | number of rows
so the classic running totals is
SUM (something_amt)
OVER (PARTITION BY floob_grps
ORDER BY something_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Since it is one optimized statement, it will be faster than cursors for sure, and probably beat even "diseased mutant SQL" proprietary extentions.

Interesting. On which RDBMS did you benchmark and concluded that "one optimized statement, it will be faster than cursors"?

In general this is simply not true - cursors may be and sometimes are implemented faster than OLAP functions. Some time ago I spent considerable time optimizing such problems on Oracle 9, and on Oracle 9 cursors were several times faster than OLAP functions. We need to be very careful making blanket statements...

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 14

Hi Joe

Joe Celko (11/14/2009)
A professional always writes to Standards so the code will port to other SQLs and to the future releases of the same SQL product.

In my opinion, that's not correct. I'd say: "A professional always get things work."

I know many people who tried only to use the standard SQL features which are (hopefully) supported in every DBMS. They hit the wall as soon as the load increases. A wise man once told me "Buying a DBMS like SQL Server (or Oracle or PostgreSQL or ...) to use only the standard features is like buying a scientific calculator and just use plus and minus" ;-).

Greets

Flo

Alexander Kuznetsov SSCrazy Points: 2217 More actions · Answer 15

Similarly I would like to see something like this possible into constraints as well (vertical constraints if you will). Setting check constraints relative to a preceding row is something quite often wanted and the knowledge encapsulated in it can be used by the optimizer to do neat optimizations during selects. It also would prevent trigger logic containing multiple statements in quite a few situations and make the modeling more declarative (the rules are more transparent then with triggers for sure).

Same here - sometimes there are business rules such as "do not change running totals more than 20% at a time". For that, I use denormalization and CHECK constraints: