Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

Question

Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

Viewing 15 posts - 106 through 120 (of 250 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 1

Radovan Jablonovsky (2/8/2008)
Hi,
If it is possible in your production environment to dynamically change number of processors, then it is OK. In our environment it is more conservative, therefore I am looking for solution which is independent from server setup, type of storage, number of files in tablespace, etc ... RDBMS database/server is more about consistency.
Sincerely,
Radovan Jablonovsky

Well, kind of... OPTION (MAXDOP 1) forces parallelism to go away...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Christopher Ford-327546 SSCommitted Points: 1622 More actions · Answer 2

Woooow....Noone mentioned guaranteed order via ROW_NUMBER()??

I see that you guys really focus'd hard on matching current and previous rows.

Nice thread though. I found Jeff's article extremely useful for what I was trying to do, but I read the thread and...okay, here's my thoughts.

If you're so concerned about ORDER, then use a CTE with Row_Number(), you're guaranteed order of your results. Upside, you don't have to construct an index.

Downside, many millions of rows might take a bit to "load" up. If the table is being used, then no worries, most of the rows should already be in the buffer and it would only be logical reads, bonus points if that's the case.

I ran this on my system after doing a DBCC DROPCLEANBUFFERS. The code ran for 21 seconds. Here's the stats.

I also ran this on my system after running a select on the table, no fancy code. The update statement ran in 1 second. So, big speed differences if the table is already somewhat in use. =)

Table 'JBMTest'. Scan count 1, logical reads 10363, physical reads 3, read-ahead reads 5541, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1000000 row(s) affected)

Here's the query:

DECLARE @PrevGrpBal MONEY

SET @PrevGrpBal = 0

DECLARE @PrevAcctID INT

SET @PrevAcctID = 0;

With Running_Total AS

(

SELECT ROW_NUMBER() OVER(PARTITION BY AccountID ORDER BY AccountID, Date) as rn,

AccountID, Date, Amount, GrpBal

FROM dbo.JBMTest

)

UPDATE r

SET @PrevGrpBal = r.GrpBal = CASE

WHEN r.AccountID = @PrevAcctID

THEN @PrevGrpBal + r.Amount

ELSE r.Amount -- Restarts total at "0 + current amount"

END,

@PrevAcctID = AccountID

FROM Running_Total r

Is there another example we are trying to solve here that I missed? I use the heck out of the ranking functions in SQL.

Christopher Ford

Christopher Ford-327546 SSCommitted Points: 1622 More actions · Answer 3

The previous solution also works with parallel IO on 64 bit boxes quite well, because you're updating the results of your CTE which get pushed back to the underlying table.

Things get tricky if you do table joins or sub-queries in the CTE.

From some of the code postings it looks like I might have missed a critical piece of information on the problem that's trying to be solved.

But the code above solves the running total for each account over a specified period of time.

Christopher Ford

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 4

Well stated in both posts, Christopher... But, you've been spoiled by SS 2k5... none of that works in 2k and, I don't know about other folks, but the companies I've had the fortune/misfortune to work with since the year 2000, have all used the 2k version... finally found 1 that actually did use 2k5 and had to give them up because they were slaughtering the database, the data, and the server.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Christopher Ford-327546 SSCommitted Points: 1622 More actions · Answer 5

Earlier Jeff posted a question on using SUM(SAL) OVER()

Since OVER() doesn't support ORDER BY when used with an aggregation.

How about this one?

SELECT J1.AccountID,

J1.Date,

J1.Amount AS AmountThisDate,

SUM(J2.Amount) AS RunningBalance,

CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance

FROM dbo.JBMTest AS J1

JOIN dbo.JBMTest AS J2

ON J2.AccountID = J1.AccountID

AND (J2.Date >= '01/01/2000' --Anchor Date from example

AND J2.Date <= J1.Date)

GROUP BY J1.AccountID, J1.Date, J1.Amount

ORDER BY J1.AccountID, J1.Date;

Not quite as elegant as SUM(SAL) OVER()

But I'm willing to bet that the query plans generated for both would be similiar to this query.

Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.

And this one works on SQL 2000. =)

Christopher Ford

Christopher Ford-327546 SSCommitted Points: 1622 More actions · Answer 6

Jeff Moden (2/13/2008)
Well stated in both posts, Christopher... But, you've been spoiled by SS 2k5... none of that works in 2k and, I don't know about other folks, but the companies I've had the fortune/misfortune to work with since the year 2000, have all used the 2k version... finally found 1 that actually did use 2k5 and had to give them up because they were slaughtering the database, the data, and the server.

I can't comment on that. 😉 I'm sure someone will see this post and who I work for and fire me if I did. :w00t:

Christopher Ford

RichB SSCrazy Eights Points: 9651 More actions · Answer 7

Christopher Ford (2/13/2008)
Earlier Jeff posted a question on using SUM(SAL) OVER()
Since OVER() doesn't support ORDER BY when used with an aggregation.
How about this one?
SELECT J1.AccountID,
J1.Date,
J1.Amount AS AmountThisDate,
SUM(J2.Amount) AS RunningBalance,
CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance
FROM dbo.JBMTest AS J1
JOIN dbo.JBMTest AS J2
ON J2.AccountID = J1.AccountID
AND (J2.Date >= '01/01/2000' --Anchor Date from example
AND J2.Date <= J1.Date)
GROUP BY J1.AccountID, J1.Date, J1.Amount
ORDER BY J1.AccountID, J1.Date;
Not quite as elegant as SUM(SAL) OVER()
But I'm willing to bet that the query plans generated for both would be similiar to this query.
Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.
And this one works on SQL 2000. =)

Sadly that set based solution in my system over several million rows has a duration of about a half hour. The update variable nips through it in 10 seconds.

Christopher Ford-327546 SSCommitted Points: 1622 More actions · Answer 8

Sadly that set based solution in my system over several million rows has a duration of about a half hour. The update variable nips through it in 10 seconds.

Yeah, I can understand that, after I ran it, I ran it with Statistics IO on.

because of the way the JOIN works, it basically rescans the table quite a few times on the right side of the JOIN, J2 gets scanned like 10000 times.

BUT, it works on SQL 2000. 🙂

Christopher Ford

James Goodwin Hall of Fame Points: 3691 More actions · Answer 9

Christopher Ford (2/13/2008)
Earlier Jeff posted a question on using SUM(SAL) OVER()
Since OVER() doesn't support ORDER BY when used with an aggregation.
How about this one?
SELECT J1.AccountID,
J1.Date,
J1.Amount AS AmountThisDate,
SUM(J2.Amount) AS RunningBalance,
CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance
FROM dbo.JBMTest AS J1
JOIN dbo.JBMTest AS J2
ON J2.AccountID = J1.AccountID
AND (J2.Date >= '01/01/2000' --Anchor Date from example
AND J2.Date <= J1.Date)
GROUP BY J1.AccountID, J1.Date, J1.Amount
ORDER BY J1.AccountID, J1.Date;
Not quite as elegant as SUM(SAL) OVER()

This is precisely the Triangular Join mentioned in the first paragraph of the article. This is what Jeff is trying to get rid of.

--

JimFive

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 10

Christopher -

First off - nice CTE solution. However - the order by inside of the ROW_NUMBER() only guarantees the order in which the numbers are assigned, and not the physical order in which things are returned. So - unless you use the same parlor room tricks Jeff is advocating, you're likely to run into the same issues everyone else is talking about.

Without guaranteeing the order, you have no guarantees that the "previous row" is the row you should be pulling the running total from.

I'll refer you to this: http://blogs.msdn.com/queryoptteam/archive/2006/05/02/588731.aspx

So again - without the clustered index in there to help you, this solution STILL can fall apart....

If you want to see - drop the clustered index, or change it to a non-clustered index.

Of course - if you DO take into account those caveats previously mentioned (clustered index, WITH(INDex(), TabLock) OPTION (MAXDOP 1), no partitioned tables), it's very elegant.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

VERY nice find on that blog post! It sums up the ordered-output issue(s) quite explicitly.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 12

Conor had a lot of interesting stuff in the few blog posts he has made - like rules of thumb when queries will recompile, why Top 100 PERCENT is evil, etc... I'm disappointed that he seems to have given up on it (nothing new in a few years).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 13

You're talking about this Conor? 🙂

http://www.sqlskills.com/blogs/conor/

Best Regards,

Chris Büttner

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 14

Cool - something to add to my reading list....I'll have to send him a stern talking to about orphaning blog readers like that! (I'm so sure he'll care about that.....:P)

Thanks!

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Christopher Ford-327546 SSCommitted Points: 1622 More actions · Answer 15

Fine...

So...solutions which guaratee order in which rows are processed is it? Man, you guys are hard to please.

Here's 2 solutions I can think of. One for SQL 2005 and one for SQL 2000. I'm quite sure there's a much simpler solution that also guarantee's the row output. These weren't quite as speedy...in fact, the sql 2000 one really chewed the I/O a bit. Can't imagine why. =)

This is what happens when you think too hard as well. =) Or get nit picky on something like order... bah...who cares about order?? 😀 It's too hard to write queries that require order and updates and in order and accumulate. 😉

I will say that I haven't had quite as much fun coding as I have trying to come up with alternative ways to do Running Totals since finding this article.

SET STATISTICS IO ON;

---- Guarantee Order Returned ----

WITH RunningTotal AS

(

SELECT AccountID, Date, Amount, GrpBal,

ROW_NUMBER() OVER(PARTITION BY AccountID

ORDER BY AccountID, Date) AS rn

FROM dbo.JBMTest

)

SELECT Cur.AccountID,

Cur.Date AS Date,

Cur.Amount AS Amount,

ISNULL(Cur.Amount + Prv.Amount,cur.Amount) AS grpBal

FROM RunningTotal AS Cur

LEFT OUTER JOIN RunningTotal AS Prv

ON Cur.AccountID = Prv.AccountID

AND Cur.rn - 1 = Prv.rn

ORDER BY Cur.AccountID, Cur.Date;

---- Really thinking too hard and order guarantee -----

SELECT a.AccountID,

a.RowNum,

a.Date AS Date,

a.Amount AS Amount,

ISNULL(a.Amount + b.Amount,a.Amount) AS grpBal

FROM (SELECT AccountID, RowNum, Date, Amount,

(SELECT TOP(1) RowNum

FROM dbo.JBMTest AS A2

WHERE A2.AccountID = A1.AccountID

AND (A2.Date < A1.Date

OR A2.Date = A1.Date

AND A2.RowNum < A1.RowNum)

ORDER BY Date DESC, RowNum DESC) AS PrvRowNum

FROM dbo.JBMTest AS A1) AS a

LEFT OUTER JOIN dbo.JBMTest AS b

ON a.PrvRowNum = b.RowNum

ORDER BY a.AccountID, a.date, a.rowNum;

Christopher Ford