Set based approach to get previous rows value for same column which is a calculation

  • Hi,

    I'm trying to find a set based approach to this scenario. Lets say I have the following data

    declare @data table(

    thedate datetime,

    val1 integer)

    insert into @data values ('2006/12/21',3500)

    insert into @data values ('2006/12/22',3500)

    insert into @data values ('2006/12/23',3500)

    insert into @data values ('2006/12/24',3500)

    insert into @data values ('2006/12/25',3500)

    insert into @data values ('2006/12/26',3500)

    insert into @data values ('2006/12/27',3500)

    [font="Courier New"]

    I need to return this result where calc is:

    1. 0 for the first row

    2. Then calculated as the current rows val1 + the previous rows calc value

    thedateval1calc

    2006/12/2135000

    2006/12/2235003500

    2006/12/2335007000

    2006/12/24350010500

    2006/12/25350014000

    2006/12/26350017500

    2006/12/27350021000

    [/font]

    Any ideas welcome.

    #update forgot to state this is not a running total issue, this is contrived example of much more complex financial calculation.

    Cheers

  • This sort of problems are called as "Running Totals". All you need is to go thro one of the finest articles that describes what Running Totals are, how to tackle them without cursor/while-loops; this was written by MVP Jeff Moden;

    Link : Solving the "Running Total" & "Ordinal Rank" Problems (Rewritten)[/url]

    Hope that helps you!

  • forgot to state this is not a running total issue, this is contrived example of much more complex financial calculation.

  • You can do this using CTE

    ; WITH cte_data AS

    (

    SELECTROW_NUMBER() OVER ( ORDER BY thedate ) Row, *

    FROM@data

    ), cte_final_data AS

    (

    SELECTRow, thedate, val1, 0 calc

    FROMcte_data

    WHERERow = 1

    UNION ALL

    SELECTdata.Row, data.thedate, data.val1, final.calc + data.val1

    FROMcte_data data

    INNER JOIN cte_final_data final ON data.Row = final.Row + 1

    )

    SELECTthedate, val1, calc

    FROMcte_final_data


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Thanks "SSC Eights!" but this wont for larger datasets due the heavy recursion.

  • Add the MAXRECURSION option

    SELECT thedate, val1, calc

    FROM cte_final_data

    OPTION ( MAXRECURSION 0 )

    This will work for larger datasets( 0 stands for infinite here ). But yes, no idea how this will affect performance.


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Unfortunately performance is not great. If I bump up the test case to 3K rows its 1 second for cursor based approach and 15 seconds for recursive cte.

  • With the best will in the world, it's impossible to suggest an appropriate set-based solution without knowing more about what the complex financial calculation does. I realise you had to simplify things for a forum question, but those simplifications have removed the core of the problem on this occasion.

    This seems to be a variant of the 'running totals' problem since the value for row n depends partly on the value in row (n-1). A more efficient set-based solution can usually be found in this class of problem, but it does depend on the details. Also, any efficient set-based solution will likely require a rewrite of whatever function or stored procedure currently performs the required magic.

    Whether it is worth pursuing a set-based alternative, depends heavily on the characteristics of the existing method. Is it running unacceptably slowly at the moment? Is this slowness definitely caused by an existing cursor or loop?

    Often, an improvement to the algorithm, indexing, or data structures can provide more of a benefit that simply replacing a loop with set-based code. So...it depends 🙂

  • The real world calculation is Cumulative Performance which I can post the formula on Monday. The only complexity for me lies on the fact that I need the previous rows value for the same column which is a calculation. I don't think you can really re-engineer this, you basically loop the through an ordered result set, the first row is always zero, thereafter the value is a calculation based on other values in the same row and the previous result of the calculation. The tricky part is the fact you need the previous value of a calculation for the same column. If you required another columns previous value or the previous value of the same column that was a static this would be a trivial task.

  • Well that all sounds very much like a problem that can be solved and optimised with tweaked 'running totals' code or a custom aggregate.

    You don't need to post the exact formula or data, just something representative of the problem you face, in way that is easy to work on.

    Please also explain any non-obvious terms, for example "Cumulative Performance" means nothing special to me 🙂

  • ColdCoffee was on the money the "quirky update" method worked a treat. Thanks ColdCoffee!

    Here is the complete solution

    declare @calc integer

    declare @data table(

    thedate datetime,

    val1 integer,

    calc integer)

    insert into @data (thedate, val1) values ('2006/12/21',3500)

    insert into @data (thedate, val1) values ('2006/12/22',3500)

    insert into @data (thedate, val1) values ('2006/12/23',3500)

    insert into @data (thedate, val1) values ('2006/12/24',3500)

    insert into @data (thedate, val1) values ('2006/12/25',3500)

    insert into @data (thedate, val1) values ('2006/12/26',3500)

    insert into @data (thedate, val1) values ('2006/12/27',3500)

    ;with results

    as (select top 2147483647 thedate,

    val1,

    calc

    from @data

    order by thedate)

    update results

    set @calc = calc = case

    when @calc is null then 0

    else val1 + @calc

    end

    output inserted.thedate,

    inserted.val1,

    inserted.calc

    NB# I'm not sure if the top and order by are enough to guarantee the correct order if someone can comment on that will be great

    Cheers

  • all4miller (5/31/2010)


    NB# I'm not sure if the top and order by are enough to guarantee the correct order if someone can comment on that will be great

    Heh... I'm thinking that you didn't actually read the article ColdCoffee provided a link to. Read it and then come back and tell us what essential part that code is missing.

    The reason I'm being such a "jerk" about this is that this technique is very controversial and rightly so. If you do it wrong you could seriously damage some data. The only way to learn the right way to do it is to read the article to achieve a deep understanding of the rules and then be able to recite the rules from memory. OSHA has declared that if one more person rides my hiney about the technique that I need to install hand rails on my butt. 😛 Either study the article so you can always do it right or don't use the technique... I don't need someone getting it wrong. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for commenting Jeff. I actually got my result from the The ‘Subscription List’ SQL Problem but since the part I used is essentially the same as your post I thought ColdCoffee deserved the credit.

    I re-read your article specially the rules section and I guess the stuff I'm missing is stuff that would be implicetly done by sql server behind the scenes... or not..... I have only been sql server for 5 months 🙂

    Please let me know what I missed

    The RULES

    1. CLUSTERED INDEX MUST BE PRESENT IN THE CORRECT ORDER: you cannot create a clustered index on a table variable

    3. DON'T WORK AGAINST PARTITIONED STRUCTURES: not an issue in this case

    4. USE THE TABLOCKX HINT: If you're updating a local Temp Table, only the current session has access to that Temp Table which means that the TABLOCKX hint really isn't required. also not an issue in this case

    5. DO NOT USE JOINS: not an issue in this case

    6. YOU MUST HAVE AN "ANCHOR" COLUMN: I think this could be the missing factor yes?

    7. DO NOT USE ORDER BY: you cannot have a clustered index on a table variable so I had to use an order by

    8. DO NOT USE INDEX HINTS TO TRY TO FORCE ORDER: not an issue in this case

    So I think it's either the missing anchor column or bascially table variables just wont work?

    Cheers

  • all4miller (5/31/2010)


    Thanks for commenting Jeff. I actually got my result from the The ‘Subscription List’ SQL Problem but since the part I used is essentially the same as your post I thought ColdCoffee deserved the credit.

    I re-read your article specially the rules section and I guess the stuff I'm missing is stuff that would be implicetly done by sql server behind the scenes... or not..... I have only been sql server for 5 months 🙂

    Please let me know what I missed

    The RULES

    1. CLUSTERED INDEX MUST BE PRESENT IN THE CORRECT ORDER: you cannot create a clustered index on a table variable

    3. DON'T WORK AGAINST PARTITIONED STRUCTURES: not an issue in this case

    4. USE THE TABLOCKX HINT: If you're updating a local Temp Table, only the current session has access to that Temp Table which means that the TABLOCKX hint really isn't required. also not an issue in this case

    5. DO NOT USE JOINS: not an issue in this case

    6. YOU MUST HAVE AN "ANCHOR" COLUMN: I think this could be the missing factor yes?

    7. DO NOT USE ORDER BY: you cannot have a clustered index on a table variable so I had to use an order by

    8. DO NOT USE INDEX HINTS TO TRY TO FORCE ORDER: not an issue in this case

    So I think it's either the missing anchor column or bascially table variables just wont work?

    Cheers

    Heh... oddly enough, the post above left out the rule of concern here...

    [font="Arial Black"]2. PARALLELISM MUST BE PREVENTED:[/font] You MUST prevent parallelism from occurring. Therefore, you MUST include OPTION (MAXDOP 1) in any such code.

    So, if you make that small tweek to your original code, you'll be golden... well, except for the fact that I don't yet trust the "order by" implied by the ROW_NUMBER() because I've had it break on a slightly different problem. It's probably OK here, though, but only because the table variable was created as a heap...

    declare @calc integer

    declare @data table(

    thedate datetime,

    val1 integer,

    calc integer)

    insert into @data (thedate, val1) values ('2006/12/21',3500)

    insert into @data (thedate, val1) values ('2006/12/22',3500)

    insert into @data (thedate, val1) values ('2006/12/23',3500)

    insert into @data (thedate, val1) values ('2006/12/24',3500)

    insert into @data (thedate, val1) values ('2006/12/25',3500)

    insert into @data (thedate, val1) values ('2006/12/26',3500)

    insert into @data (thedate, val1) values ('2006/12/27',3500)

    ;with results

    as (select top 2147483647 thedate,

    val1,

    calc

    from @data

    order by thedate)

    update results

    set @calc = calc = case

    when @calc is null then 0

    else val1 + @calc

    end

    output inserted.thedate, --this is a resonable substitute for an "anchor" column but I'd still take the time to include one

    inserted.val1,

    inserted.calc

    OPTION (MAXDOP 1) --absolutely required

    Also, you state that you can't put a clustered index on a table variable... any bets there? 😉

    declare @calc integer,

    @TheDate DATETIME

    declare @data table(

    thedate datetime PRIMARY KEY CLUSTERED, --Look! Clustered index gets made by this

    val1 integer,

    calc integer)

    insert into @data (thedate, val1) values ('2006/12/21',3500)

    insert into @data (thedate, val1) values ('2006/12/22',3500)

    insert into @data (thedate, val1) values ('2006/12/23',3500)

    insert into @data (thedate, val1) values ('2006/12/24',3500)

    insert into @data (thedate, val1) values ('2006/12/25',3500)

    insert into @data (thedate, val1) values ('2006/12/26',3500)

    insert into @data (thedate, val1) values ('2006/12/27',3500)

    update @data

    set @calc = calc = case

    when @calc is null then 0

    else val1 + @calc

    end,

    @TheDate = TheDate --Anchor, a much better guarantee than an OUTPUT statement.

    output inserted.thedate,

    inserted.val1,

    inserted.calc

    FROM @data --Can't put a WITH(TABLOCKX) on a table variable but would do if this was a temp table

    OPTION (MAXDOP 1) --absolutely required

    Either way is fine but both absolutely must have the OPTION (MAXDOP 1) code to prevent parallelism. Yeah, I know... lot's of folks are going to come back and say the sorted ROW_NUMBER() makes it so it doesn't matter if parallelism occurs... I have some code at work on another method (ie, not the quirky update) where ROW_NUMBER() didn't guarantee the order until I added MAXDOP 1. I'll try to remember to post it here. In the meantime, don't take the chance because I've seen parallelism destroy the order of ROW_NUMBER() before. In fact, I'll trust the update order based on a clustered index long before I trust the order of ROW_NUMBER() during an update.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sweet thanks Jeff! 🙂

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply