Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Radovan Jablonovsky (2/8/2008)


    Hi,

    If it is possible in your production environment to dynamically change number of processors, then it is OK. In our environment it is more conservative, therefore I am looking for solution which is independent from server setup, type of storage, number of files in tablespace, etc ... RDBMS database/server is more about consistency.

    Sincerely,

    Radovan Jablonovsky

    Well, kind of... OPTION (MAXDOP 1) forces parallelism to go away...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Woooow....Noone mentioned guaranteed order via ROW_NUMBER()??

    I see that you guys really focus'd hard on matching current and previous rows.

    Nice thread though. I found Jeff's article extremely useful for what I was trying to do, but I read the thread and...okay, here's my thoughts.

    If you're so concerned about ORDER, then use a CTE with Row_Number(), you're guaranteed order of your results. Upside, you don't have to construct an index.

    Downside, many millions of rows might take a bit to "load" up. If the table is being used, then no worries, most of the rows should already be in the buffer and it would only be logical reads, bonus points if that's the case.

    I ran this on my system after doing a DBCC DROPCLEANBUFFERS. The code ran for 21 seconds. Here's the stats.

    I also ran this on my system after running a select on the table, no fancy code. The update statement ran in 1 second. So, big speed differences if the table is already somewhat in use. =)

    Table 'JBMTest'. Scan count 1, logical reads 10363, physical reads 3, read-ahead reads 5541, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1000000 row(s) affected)

    Here's the query:

    DECLARE @PrevGrpBal MONEY

    SET @PrevGrpBal = 0

    DECLARE @PrevAcctID INT

    SET @PrevAcctID = 0;

    With Running_Total AS

    (

    SELECT ROW_NUMBER() OVER(PARTITION BY AccountID ORDER BY AccountID, Date) as rn,

    AccountID, Date, Amount, GrpBal

    FROM dbo.JBMTest

    )

    UPDATE r

    SET @PrevGrpBal = r.GrpBal = CASE

    WHEN r.AccountID = @PrevAcctID

    THEN @PrevGrpBal + r.Amount

    ELSE r.Amount -- Restarts total at "0 + current amount"

    END,

    @PrevAcctID = AccountID

    FROM Running_Total r

    Is there another example we are trying to solve here that I missed? I use the heck out of the ranking functions in SQL.

    Christopher Ford

  • The previous solution also works with parallel IO on 64 bit boxes quite well, because you're updating the results of your CTE which get pushed back to the underlying table.

    Things get tricky if you do table joins or sub-queries in the CTE.

    From some of the code postings it looks like I might have missed a critical piece of information on the problem that's trying to be solved.

    But the code above solves the running total for each account over a specified period of time.

    Christopher Ford

  • Well stated in both posts, Christopher... But, you've been spoiled by SS 2k5... none of that works in 2k and, I don't know about other folks, but the companies I've had the fortune/misfortune to work with since the year 2000, have all used the 2k version... finally found 1 that actually did use 2k5 and had to give them up because they were slaughtering the database, the data, and the server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Earlier Jeff posted a question on using SUM(SAL) OVER()

    Since OVER() doesn't support ORDER BY when used with an aggregation.

    How about this one?

    SELECT J1.AccountID,

    J1.Date,

    J1.Amount AS AmountThisDate,

    SUM(J2.Amount) AS RunningBalance,

    CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance

    FROM dbo.JBMTest AS J1

    JOIN dbo.JBMTest AS J2

    ON J2.AccountID = J1.AccountID

    AND (J2.Date >= '01/01/2000' --Anchor Date from example

    AND J2.Date <= J1.Date)

    GROUP BY J1.AccountID, J1.Date, J1.Amount

    ORDER BY J1.AccountID, J1.Date;

    Not quite as elegant as SUM(SAL) OVER()

    But I'm willing to bet that the query plans generated for both would be similiar to this query.

    Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.

    And this one works on SQL 2000. =)

    Christopher Ford

  • Jeff Moden (2/13/2008)


    Well stated in both posts, Christopher... But, you've been spoiled by SS 2k5... none of that works in 2k and, I don't know about other folks, but the companies I've had the fortune/misfortune to work with since the year 2000, have all used the 2k version... finally found 1 that actually did use 2k5 and had to give them up because they were slaughtering the database, the data, and the server.

    I can't comment on that. πŸ˜‰ I'm sure someone will see this post and who I work for and fire me if I did. :w00t:

    Christopher Ford

  • Christopher Ford (2/13/2008)


    Earlier Jeff posted a question on using SUM(SAL) OVER()

    Since OVER() doesn't support ORDER BY when used with an aggregation.

    How about this one?

    SELECT J1.AccountID,

    J1.Date,

    J1.Amount AS AmountThisDate,

    SUM(J2.Amount) AS RunningBalance,

    CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance

    FROM dbo.JBMTest AS J1

    JOIN dbo.JBMTest AS J2

    ON J2.AccountID = J1.AccountID

    AND (J2.Date >= '01/01/2000' --Anchor Date from example

    AND J2.Date <= J1.Date)

    GROUP BY J1.AccountID, J1.Date, J1.Amount

    ORDER BY J1.AccountID, J1.Date;

    Not quite as elegant as SUM(SAL) OVER()

    But I'm willing to bet that the query plans generated for both would be similiar to this query.

    Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.

    And this one works on SQL 2000. =)

    Sadly that set based solution in my system over several million rows has a duration of about a half hour. The update variable nips through it in 10 seconds.

  • Sadly that set based solution in my system over several million rows has a duration of about a half hour. The update variable nips through it in 10 seconds.

    Yeah, I can understand that, after I ran it, I ran it with Statistics IO on.

    because of the way the JOIN works, it basically rescans the table quite a few times on the right side of the JOIN, J2 gets scanned like 10000 times.

    BUT, it works on SQL 2000. πŸ™‚

    Christopher Ford

  • Christopher Ford (2/13/2008)


    Earlier Jeff posted a question on using SUM(SAL) OVER()

    Since OVER() doesn't support ORDER BY when used with an aggregation.

    How about this one?

    SELECT J1.AccountID,

    J1.Date,

    J1.Amount AS AmountThisDate,

    SUM(J2.Amount) AS RunningBalance,

    CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance

    FROM dbo.JBMTest AS J1

    JOIN dbo.JBMTest AS J2

    ON J2.AccountID = J1.AccountID

    AND (J2.Date >= '01/01/2000' --Anchor Date from example

    AND J2.Date <= J1.Date)

    GROUP BY J1.AccountID, J1.Date, J1.Amount

    ORDER BY J1.AccountID, J1.Date;

    Not quite as elegant as SUM(SAL) OVER()

    This is precisely the Triangular Join mentioned in the first paragraph of the article. This is what Jeff is trying to get rid of.

    --

    JimFive

  • Christopher -

    First off - nice CTE solution. However - the order by inside of the ROW_NUMBER() only guarantees the order in which the numbers are assigned, and not the physical order in which things are returned. So - unless you use the same parlor room tricks Jeff is advocating, you're likely to run into the same issues everyone else is talking about.

    Without guaranteeing the order, you have no guarantees that the "previous row" is the row you should be pulling the running total from.

    I'll refer you to this: http://blogs.msdn.com/queryoptteam/archive/2006/05/02/588731.aspx

    So again - without the clustered index in there to help you, this solution STILL can fall apart....

    If you want to see - drop the clustered index, or change it to a non-clustered index.

    Of course - if you DO take into account those caveats previously mentioned (clustered index, WITH(INDex(), TabLock) OPTION (MAXDOP 1), no partitioned tables), it's very elegant.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • VERY nice find on that blog post! It sums up the ordered-output issue(s) quite explicitly.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Conor had a lot of interesting stuff in the few blog posts he has made - like rules of thumb when queries will recompile, why Top 100 PERCENT is evil, etc... I'm disappointed that he seems to have given up on it (nothing new in a few years).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • You're talking about this Conor? πŸ™‚

    http://www.sqlskills.com/blogs/conor/

    Best Regards,

    Chris BΓΌttner

  • Cool - something to add to my reading list....I'll have to send him a stern talking to about orphaning blog readers like that! (I'm so sure he'll care about that.....:P)

    Thanks!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Fine...

    So...solutions which guaratee order in which rows are processed is it? Man, you guys are hard to please.

    Here's 2 solutions I can think of. One for SQL 2005 and one for SQL 2000. I'm quite sure there's a much simpler solution that also guarantee's the row output. These weren't quite as speedy...in fact, the sql 2000 one really chewed the I/O a bit. Can't imagine why. =)

    This is what happens when you think too hard as well. =) Or get nit picky on something like order... bah...who cares about order?? πŸ˜€ It's too hard to write queries that require order and updates and in order and accumulate. πŸ˜‰

    I will say that I haven't had quite as much fun coding as I have trying to come up with alternative ways to do Running Totals since finding this article.

    SET STATISTICS IO ON;

    ---- Guarantee Order Returned ----

    WITH RunningTotal AS

    (

    SELECT AccountID, Date, Amount, GrpBal,

    ROW_NUMBER() OVER(PARTITION BY AccountID

    ORDER BY AccountID, Date) AS rn

    FROM dbo.JBMTest

    )

    SELECT Cur.AccountID,

    Cur.Date AS Date,

    Cur.Amount AS Amount,

    ISNULL(Cur.Amount + Prv.Amount,cur.Amount) AS grpBal

    FROM RunningTotal AS Cur

    LEFT OUTER JOIN RunningTotal AS Prv

    ON Cur.AccountID = Prv.AccountID

    AND Cur.rn - 1 = Prv.rn

    ORDER BY Cur.AccountID, Cur.Date;

    ---- Really thinking too hard and order guarantee -----

    SELECT a.AccountID,

    a.RowNum,

    a.Date AS Date,

    a.Amount AS Amount,

    ISNULL(a.Amount + b.Amount,a.Amount) AS grpBal

    FROM (SELECT AccountID, RowNum, Date, Amount,

    (SELECT TOP(1) RowNum

    FROM dbo.JBMTest AS A2

    WHERE A2.AccountID = A1.AccountID

    AND (A2.Date < A1.Date

    OR A2.Date = A1.Date

    AND A2.RowNum < A1.RowNum)

    ORDER BY Date DESC, RowNum DESC) AS PrvRowNum

    FROM dbo.JBMTest AS A1) AS a

    LEFT OUTER JOIN dbo.JBMTest AS b

    ON a.PrvRowNum = b.RowNum

    ORDER BY a.AccountID, a.date, a.rowNum;

    Christopher Ford

Viewing 15 posts - 106 through 120 (of 250 total)

You must be logged in to reply to this topic. Login to reply