Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • AnzioBake (12/22/2008)


    What happens for for tables that the rows are not in a clustered index with the right order.

    Like I said in the article... copy to a temp table and index that. It's still faster than using a cursor.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Agreed not RBAR, but RBR LOL:D

  • As like the solution using the Anchor value, I have not seen that "fix" before. I also wondered if the clustered index was generated with a fillfactor=100 is the anchor still needed?

    Also, do you think the will ever the ability to do computations in a SELECT statement like those shown on the UPDATE statement. If they did then the solution could be as simple as:

    use [NorthWind]

    go

    DECLARE @SumFreight AS MONEY

    DECLARE @CntFreight AS INT

    SELECT

    @SumFreight = 0.0,

    @CntFreight = 0

    SELECT

    [OrderID],

    [Freight],

    @SumFreight = [RunningTotal] = @SumFreight + ISNULL([Freight], 0.0),

    @CntFreight = [RunningCount] = @CntFreight + 1

    FROM [dbo].[Orders]

    ORDER BY

    [OrderId]

  • You can do the computations in a SELECT statement, you just cannot assign to both variables and columns in a select statement.

    Shame too.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • LeeBear35 (1/16/2009)


    As like the solution using the Anchor value, I have not seen that "fix" before. I also wondered if the clustered index was generated with a fillfactor=100 is the anchor still needed?

    Also, do you think the will ever the ability to do computations in a SELECT statement like those shown on the UPDATE statement. If they did then the solution could be as simple as:

    use [NorthWind]

    go

    DECLARE @SumFreight AS MONEY

    DECLARE @CntFreight AS INT

    SELECT

    @SumFreight = 0.0,

    @CntFreight = 0

    SELECT

    [OrderID],

    [Freight],

    @SumFreight = [RunningTotal] = @SumFreight + ISNULL([Freight], 0.0),

    @CntFreight = [RunningCount] = @CntFreight + 1

    FROM [dbo].[Orders]

    ORDER BY

    [OrderId]

    I absolutely agree... Wouldn't THAT be wonderful?! That would solve so very many problems!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great article, I 've been able to apply it to a pain in the neck I've been working on, but at the risk of looking like a thicko --:D I cant get the order by update to work with a non clustered index

    If I run the following code

    --===== Add the primary key

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum) --clustered for "Merry-go-Round" test

    --===== Add the "sorting index" to the table

    CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Date --not clustered for "Merry-go-Round" test

    ON dbo.JBMTest (AccountID, Date)

    --===== Build the table 100 rows at a time to "mix things up"

    DECLARE @Counter INT

    SET @Counter = 0

    WHILE @Counter < 1000000

    BEGIN --===== Add 1000 rows to the test table

    INSERT INTO dbo.JBMTest

    (AccountID, Amount, Date)

    SELECT TOP 100

    AccountID = ABS(CHECKSUM(NEWID()))%50000+1,

    Amount = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),

    Date = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

    --===== Increment the counter

    SET @Counter = @Counter + 100

    END

    --===== Declare the required variables

    DECLARE @PrevBal MONEY

    SET @PrevBal = 0

    DECLARE @AccountID INT

    SET @AccountID = 0

    --===== Calculate the running total using SQL Server's proprietary update

    -- and force the order of the update with and ORDER BY

    UPDATE t1

    SET @PrevBal = RunBal = t1.Amount + @PrevBal, @AccountID = t1.AccountID --This does nothing but provide an "anchor"

    FROM dbo.JBMTest t1 INNER JOIN (--==== Derived table "d" provides the sort order for the update

    SELECT TOP 100 PERCENT rownum,AccountID, Date

    FROM dbo.JBMTest

    ORDER BY AccountID, Date )d ON t1.rownum = d.rownum

    I get a very quick update based on rownum order, I 'va attached the execution plan

    I looked at this and thought it was giving me the wrong results because it is scanning the clustered PK on both tables before the join and therfore is getting the rownum order so I changed the join to

    'ON t1.accountID = d.accountID and t1.date = d.date'

    and the query plan now shows a scan on the index of the subquery

    and gives me the right results,

    apart from one mention on the first page of the discussion 200 or so posts ago no one else seems to have had this problem, or am I missing the point of the merry go round index

    :unsure:

  • Thanks for the detailed explanation... I'll take a look tonight after I get home.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have encountered problems with Merry Go Round issues too. On small samples it would always work but as data sets grew it was more and more of an issue then I realized that even with a clustered index if you have an 80% fill factor the page is only filled with the first 6K of data then goes to the next page. This happens until all the currently allocated pages in the table are are filled to 80%, then the remaining 20% is filled then a new set of pages are allocated.

    I have often found a clustered index with a WITH Fillfactor=100 prevents this issue, but does not always resolve the problem.

    Happy hunting...

  • Martin Stephenson (1/30/2009)


    Great article, I 've been able to apply it to a pain in the neck I've been working on, but at the risk of looking like a thicko --:D I cant get the order by update to work with a non clustered index...

    ... I looked at this and thought it was giving me the wrong results because it is scanning the clustered PK on both tables before the join and therfore is getting the rownum order so I changed the join to

    'ON t1.accountID = d.accountID and t1.date = d.date'

    and the query plan now shows a scan on the index of the subquery

    and gives me the right results,

    apart from one mention on the first page of the discussion 200 or so posts ago no one else seems to have had this problem, or am I missing the point of the merry go round index

    :unsure:

    Oh my my my [font="Arial Black"]MY![/font] The irony of it all! :D:P:hehe: In the words of a very wise man, "The Lord giveth, and Microsoft taketh away! 😉

    Everyone kept busting my hump about how the "quirky" update was undocumented and could "change at the next service pack" and insisted that I use an ORDER BY, which I did. Here's the really funny part... the ORDER BY update used to work just fine when I wrote this article. I hadn't yet installed SQL Server 2005. I was still working on just SQL Server 2000 sp3a (I didn't trust sp4 so never installed it) and it worked just fine... then, I never used the ORDER BY method ever again because, frankly, it was too slow for me. Since that time, I've installed SQL Server 2005 and sp2... now, NONE of the original ORDER BY code works on my box either in 2k or 2k5 with or without the additional index! Even the mods you put on the code don't work on my box even though you say it does on your box. (I wouldn't use the ORDER BY method if I were you :))

    The irony is that such a well documented feature as ORDER BY changed without warning and the "undocumented" feature keeps right on working as advertised.

    I'll make a note of this at the start of the thread so people are made aware. It also gives me incentive to expedite the rewrite I'm currently working on for the article.

    Thanks for bringing this to my attention!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • LeeBear35 (1/30/2009)


    I have encountered problems with Merry Go Round issues too. On small samples it would always work but as data sets grew it was more and more of an issue then I realized that even with a clustered index if you have an 80% fill factor the page is only filled with the first 6K of data then goes to the next page. This happens until all the currently allocated pages in the table are are filled to 80%, then the remaining 20% is filled then a new set of pages are allocated.

    I have often found a clustered index with a WITH Fillfactor=100 prevents this issue, but does not always resolve the problem.

    Happy hunting...

    Please see the post just above this one for an "explanation" of what happened.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • YES YES, it is working, had to make some minor alterations due to an extra dimension (dataareaid)

    Thanks a lot for the quick reply and the correct answer.

    I went from 4.5 hours to 7 !!!! seconds, talk about an improvement. I added my alteration in case any can use it again

    Thanks again, and specially thanks to Jeff Moden witch is the originator of the code

    Kind regards

    ABB

    -- Original code from Jeff Moden alterated by ABB

    DECLARE @PrevCompany varchar(3)

    SET @PrevCompany = ''

    DECLARE @PrevCompanyBalance MONEY --running total for each company

    SET @PrevCompanyBalance = 0

    DECLARE @PrevGrpBal MONEY --Running total resets when account changes

    SET @PrevGrpBal = 0

    DECLARE @PrevRunCnt INT --Overall running count (ordinal rank)

    SET @PrevRunCnt = 0

    DECLARE @PrevGrpCnt INT --Running count resets when account changes

    SET @PrevGrpCnt = 0

    DECLARE @PrevAcctID varchar(10) --The "account change detector"

    SET @PrevAcctID = ''

    UPDATE dbo.DW_CustTrans

    SET --===== Running Total Company

    @PrevCompanyBalance = CompanyBalance = CASE

    WHEN dataareaid = @Prevcompany

    THEN @PrevCompanyBalance + Amount

    ELSE Amount

    END,

    --===== Account Running Total (Reset when account changes)

    @PrevGrpBal = Balance_MST = CASE

    WHEN accountnumber = @PrevAcctID

    THEN @PrevGrpBal + Amount

    ELSE Amount -- Restarts total at "0 + current amount"

    END,

    --===== Running Count (Ordinal Rank)

    @PrevRunCnt = RunCnt = @PrevRunCnt + 1,

    --===== Account Running Total (Ordinal Rank, Reset when account changes)

    @PrevGrpCnt = CompanyCount = CASE

    WHEN accountnumber = @PrevAcctID

    THEN @PrevGrpCnt + 1

    ELSE 1 -- Restarts count at "1"

    END,

    --===== "Anchor" and provides for "account and Company change detection"

    @PrevAcctID = accountnumber,

    @prevCompany = dataareaid

    FROM dbo.DW_CustTrans WITH (INDEX(IX_DW_CustTrans),TABLOCKX)

  • Ummm.... no.... to be as safe as possible with this method, you MUST include a FROM clause in the update and it must use a TabLockX and index hint that forces the use of the clustered index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, Do you have access to SQL Server 2008? At the SSWUGvConference I heard the the ORDER BY "may be" honored again in the UPDATE statement. I don't have SQL Server 2008 yet so I can't test it.

  • Lynn Pettis (4/24/2009)


    Jeff, Do you have access to SQL Server 2008? At the SSWUGvConference I heard the the ORDER BY "may be" honored again in the UPDATE statement. I don't have SQL Server 2008 yet so I can't test it.

    That's good news, but, no, I don't have 2k8, yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/24/2009)


    Lynn Pettis (4/24/2009)


    Jeff, Do you have access to SQL Server 2008? At the SSWUGvConference I heard the the ORDER BY "may be" honored again in the UPDATE statement. I don't have SQL Server 2008 yet so I can't test it.

    That's good news, but, no, I don't have 2k8, yet.

    Well, it isn't good news until it is proven true. I'd call it hopeful news.

Viewing 15 posts - 226 through 240 (of 250 total)

You must be logged in to reply to this topic. Login to reply