Recursive CTE to replace a cursor problem

  • I am trying to use recursive CTEs to replace cursors and I am having trouble figuring out how to "reset" certain fields when a "key" field changes. In the example provided I am calculating performance for stocks and I have a recursive CTE that works for a single stock. (see results of the first CTE for 'IBM' in the example). The problem comes when I have more than one stock I can't figure out how to "reset" performance when the "loop" changes from one Asset/Stock to the next. So when I add the rows for the second stock (T), the way I have it coded it won't work because I am joining simply on sequence where I need to have the AssetID in there as sequences are repeated for each stock and need to reset running performance to 0 when the asset changes. I'm just not sure how to add the assetID in the joins and have it repeat for each stock. I'm thinking an "outer" recursive CTE that loops the AssetIDs for the "inner" CTE that calculates the performance but I am having trouble figuring how that would work. Hoping a recusive CTE expert can help out.

    Thanks in advance,

    Robb

    Here's the code and example:

    create table #DMVRunningPerformance (

    Seq int,

    LongPosition_bt bit null,

    AssetID_in int null,

    Symbol_vc varchar(255),

    DayID_in int,

    Status_vc varchar(255) null,

    MV float,

    DailyPerf float null,

    RunningPerf float null)

    CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance

    (

    [Seq] ASC,

    [LongPosition_bt] ASC,

    [AssetID_in] ASC,

    [DayID_in] DESC

    )

    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    -- insert first Asset

    insert into #DMVRunningPerformance

    values (1,1,100,'IBM',75006,NULL,201048.8987,NULL,NULL)

    insert into #DMVRunningPerformance

    values (2,1,100,'IBM',75005,NULL,200841.5658,NULL,NULL)

    insert into #DMVRunningPerformance

    values (3,1,100,'IBM',75004,NULL,200321.7043,NULL,NULL)

    insert into #DMVRunningPerformance

    values (4,1,100,'IBM',75003,NULL,201120.0467,NULL,NULL)

    insert into #DMVRunningPerformance

    values (5,1,100,'IBM',75002,NULL,201779.8805,NULL,NULL)

    insert into #DMVRunningPerformance

    values (6,1,100,'IBM',75001,NULL,201651.3917,NULL,NULL)

    insert into #DMVRunningPerformance

    values (7,1,100,'IBM',75000,NULL,201651.3917,NULL,NULL)

    declare @AssetID_in int=-1,

    @NDAssetID_in int=-1,

    @LongPosition_bt bit=0,

    @Status_vc varchar(255)=NULL,

    @NewGroup_bt bit=1,

    @DailyPerformance float,

    @RunningPerformance float,

    @NDMarketValue float

    ;WITH Calculator AS (

    -- Anchor row, tr.Seq = 1

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)

    FROM #DMVRunningPerformance thisrow

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1

    WHERE thisrow.Seq = 1

    UNION ALL

    -- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1

    FROM Calculator lastrow

    INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1

    )

    SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf

    FROM Calculator

    -- insert second Asset

    insert into #DMVRunningPerformance

    values (1,1,200,'T',75006,NULL,6484,NULL,NULL)

    insert into #DMVRunningPerformance

    values (2,1,200,'T',75005,NULL,6450,NULL,NULL)

    insert into #DMVRunningPerformance

    values (3,1,200,'T',75004,NULL,6398,NULL,NULL)

    insert into #DMVRunningPerformance

    values (4,1,200,'T',75003,NULL,6397,NULL,NULL)

    insert into #DMVRunningPerformance

    values (5,1,200,'T',75002,NULL,6446,NULL,NULL)

    insert into #DMVRunningPerformance

    values (6,1,200,'T',75001,NULL,6240,NULL,NULL)

    insert into #DMVRunningPerformance

    values (7,1,200,'T',75000,NULL,6124,NULL,NULL)

    -- this obviously will not work as it needs to work on one assetid at a time

    -- so how do I "reset" the sequence back to 1 and the running performance back to 0 for each assetid?

    ;WITH Calculator AS (

    -- Anchor row, tr.Seq = 1

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)

    FROM #DMVRunningPerformance thisrow

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1

    WHERE thisrow.Seq = 1

    UNION ALL

    -- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1

    FROM Calculator lastrow

    INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1

    )

    SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf

    FROM Calculator

    drop table #dmvrunningperformance

  • Here is an example of a solution using an outer cursor with an inner recursive CTE. I'm trying to eliminate the outer cursor (and return it all as a single resultset)

    create table #DMVRunningPerformance (

    Seq int,

    LongPosition_bt bit null,

    AssetID_in int null,

    Symbol_vc varchar(255),

    DayID_in int,

    Status_vc varchar(255) null,

    MV float,

    DailyPerf float null,

    RunningPerf float null)

    CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance

    (

    [Seq] ASC,

    [LongPosition_bt] ASC,

    [AssetID_in] ASC,

    [DayID_in] DESC

    )

    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    -- insert first Asset

    insert into #DMVRunningPerformance

    values (1,1,100,'IBM',75006,NULL,201048.8987,NULL,NULL)

    insert into #DMVRunningPerformance

    values (2,1,100,'IBM',75005,NULL,200841.5658,NULL,NULL)

    insert into #DMVRunningPerformance

    values (3,1,100,'IBM',75004,NULL,200321.7043,NULL,NULL)

    insert into #DMVRunningPerformance

    values (4,1,100,'IBM',75003,NULL,201120.0467,NULL,NULL)

    insert into #DMVRunningPerformance

    values (5,1,100,'IBM',75002,NULL,201779.8805,NULL,NULL)

    insert into #DMVRunningPerformance

    values (6,1,100,'IBM',75001,NULL,201651.3917,NULL,NULL)

    insert into #DMVRunningPerformance

    values (7,1,100,'IBM',75000,NULL,201651.3917,NULL,NULL)

    -- insert second Asset

    insert into #DMVRunningPerformance

    values (1,1,200,'T',75006,NULL,6484,NULL,NULL)

    insert into #DMVRunningPerformance

    values (2,1,200,'T',75005,NULL,6450,NULL,NULL)

    insert into #DMVRunningPerformance

    values (3,1,200,'T',75004,NULL,6398,NULL,NULL)

    insert into #DMVRunningPerformance

    values (4,1,200,'T',75003,NULL,6397,NULL,NULL)

    insert into #DMVRunningPerformance

    values (5,1,200,'T',75002,NULL,6446,NULL,NULL)

    insert into #DMVRunningPerformance

    values (6,1,200,'T',75001,NULL,6240,NULL,NULL)

    insert into #DMVRunningPerformance

    values (7,1,200,'T',75000,NULL,6124,NULL,NULL)

    declare @AssetID_in int=-1,

    @NDAssetID_in int=-1,

    @LongPosition_bt bit=0,

    @Status_vc varchar(255)=NULL,

    @NewGroup_bt bit=1,

    @DailyPerformance float,

    @RunningPerformance float,

    @NDMarketValue float

    declare AssetCursor cursor for

    select distinct AssetID_in from #DMVRunningPerformance

    open AssetCursor

    fetch next from AssetCursor into @AssetID_in

    while @@FETCH_STATUS=0

    begin

    WITH Calculator AS (

    -- Anchor row, tr.Seq = 1

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)

    FROM #DMVRunningPerformance thisrow

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1 and nextrow.AssetID_in=@AssetID_in

    WHERE thisrow.Seq = 1 and thisrow.AssetID_in=@AssetID_in

    UNION ALL

    -- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1

    FROM Calculator lastrow

    INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1 and thisrow.AssetID_in=@AssetID_in

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1 and nextrow.AssetID_in=@AssetID_in

    )

    SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf

    FROM Calculator

    fetch next from AssetCursor into @AssetID_in

    end

    close AssetCursor

    deallocate AssetCursor

    drop table #dmvrunningperformance

  • I'll tell you that replacing a cursor with a recursive CTE is counter productive and won't buy you much for performance and may actually cost you about 3 times as many reads. Recursion is nothing more than "hidden RBAR" and I wouldn't waste my time changing any cursor or while loop to recursive CTE's.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff,

    I have read through the discussion pages related to your article titled "Solving the Running Total & Ordianl Rank Problems". There is also another thread with plenty of arguments in there about why one should or shouldn't use the "quirky update" and why one should or shouldn't use a recursive CTE approach or cursors for that matter. Unfortunately I can't find the thread or posts now but I believe someone named "Hugo" maybe had provided a recusive CTE approach that was almost as fast as the Quirky Update method. Perhaps I read things or am remembering wrong, not sure. I would love to use the quirky update method but after reading all the pros and cons I am still hesitant to use it in a production system. That coupled with the fact that I could never get it to work exactly as I needed (different problem altogether). Anyway I was hoping to find an approach that didn't use cursors or while loops and barring using the quirky update a recursive CTE is what is left (or CLR which is out for other reasons).

    I guess I could go back and rewrite my original post using the Quirky Update and try to get a solution for my last "outstanding" issue. I just remember reading all the discussions and based this recursive CTE on the code that was posted offering an alternative to the quirky update. Maybe you remeber the thread I am referring to.

    Thanks,

    Robb

  • Jeff's article on the 'Quirky Update' can be found here[/url].

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Robb Melancon (7/20/2010)


    Hey Jeff,

    I have read through the discussion pages related to your article titled "Solving the Running Total & Ordianl Rank Problems". There is also another thread with plenty of arguments in there about why one should or shouldn't use the "quirky update" and why one should or shouldn't use a recursive CTE approach or cursors for that matter. Unfortunately I can't find the thread or posts now but I believe someone named "Hugo" maybe had provided a recusive CTE approach that was almost as fast as the Quirky Update method. Perhaps I read things or am remembering wrong, not sure. I would love to use the quirky update method but after reading all the pros and cons I am still hesitant to use it in a production system. That coupled with the fact that I could never get it to work exactly as I needed (different problem altogether). Anyway I was hoping to find an approach that didn't use cursors or while loops and barring using the quirky update a recursive CTE is what is left (or CLR which is out for other reasons).

    I guess I could go back and rewrite my original post using the Quirky Update and try to get a solution for my last "outstanding" issue. I just remember reading all the discussions and based this recursive CTE on the code that was posted offering an alternative to the quirky update. Maybe you remeber the thread I am referring to.

    Thanks,

    Robb

    Hugo's method is in the discussions (somewhere) that follow the article. In fact, look for where I made an improvement to his code. 😉

    It's a shame you don't trust the quirky update. The rules are simple and even if you take the time to run the verification code, it's still faster than most other methods.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My problem with using the Quirky Update is this: The procedures in our app that would be rewritten to use the Quirky Update in place of Cursors are very pervasive (in number and in scope) in our application. In the event that it changed/broke because of a Microsoft update/release/patch, there would be no backup plan to fall back on unless I maintained two methods (one using cursors one using quirky update) for every performance related procedure. It's one thing if Microsoft breaks something with a release/patch and I can escalate a "production halt" type issue/bug for them to fix but quite another if they break something that they don't support and now I have to go rewrite my procedures back to using cursors.

    I love the Quirky Update method. I aggree it is simple, consise, consistent and fast as heck as long as you follow the simple rules. I have no problem with a verification procedure that is run after every patch but what do I do when MS breaks it and there is no fix? I have to rewrite everything back to using cursors and fast because the patch we just installed is critical for some other portion of the app. That's the only problem I have with using the Quirky Update.

    I also don't understand why Microsoft doesn't come out in support of it. Clearly it is faster than cursors, cursors are not scaleable and are frowned upon almost universally for large tasks that need to scale. The fact that they don't simply say "yes this is a valid use of our update syntax" is concerning.

  • I can certainly understand that bit of paranoia but consider this... the quirky update has survived all hot fixes, all CU's, all SP's, and all revisions including the major engine changes they made between SQL Server 6.5 and 7 and then again between 2k and 2k5 not to mention all of the updates it went through way back before then (SQL Server has its roots in Sybase).

    Still, I can understand why some folks might not take a chance. In that case, my recommendation is to use the old trigger method of keeping the running total absolutely up to date using triggers on new inserts. If that's just not possible then a well written forward only read only cursor (or While Loop) with the appropriate update will only be 8 or 10 times slower. And, by the way, since cursors are RBAR, they are inherently linear (unless a triangular join is present) and are, therefor, inherently scalable to infinity. Things that are not scalable typically have other than a linear performance where they get worse in a nonlinear fashion as the row count increases.

    As a side bar, it would be interesting to see if someone could come up with a decent highspeed CLR for running totals.

    Other than that, I agree... MS should spend a little time studying the Quirky Update and figure out that it actually does work. That or coming out with proper windowing functions to do running totals would be a big help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The way they are written currently I use "LOCAL FAST_FORWARD READ_ONLY" and they are relatively quick though Quirky Update is about twice as fast. The problem I have and the reason I say they (cursors) don't "scale" is when I use these same procedures in an SSIS job that uses a lot of parallelism, etc, the job fails or hangs. I replaced the cursors with slow while loops and the job now runs fine and never hangs, but is slow. It could be something else causing the hang but the only difference between running perfectly and hanging is the use of cursors.

    In general the results that I am processing are millions of rows and the footprint of the cursors is pretty large as there are a lot of calculations and data being tossed around. Anyway, I still may wind up using the Quirky Update for the SSIS job and just maintain the two versions, one with the loops and one with the quirky update as these procs will not change much if at all.

  • Joe Celko (7/22/2010)


    Standard SQL has the syntax and you can get it in DB2 and Oracle, but SQL Server is still behind. The Window clause has a [RANGE | ROWS] subclause that acts like a frame within the partition defined off the current row in terms of preceding and following rows. ...

    ...

    Thanks Joe, I had not seen that clause before and will see if I can incorporate that into my performance routine. If it performs well I'll post some comparisons.

  • Robb Melancon (7/22/2010)


    Joe Celko (7/22/2010)


    Standard SQL has the syntax and you can get it in DB2 and Oracle, but SQL Server is still behind. The Window clause has a [RANGE | ROWS] subclause that acts like a frame within the partition defined off the current row in terms of preceding and following rows. ...

    ...

    Thanks Joe, I had not seen that clause before and will see if I can incorporate that into my performance routine. If it performs well I'll post some comparisons.

    Like Joe said, "SQL Server is still behind" meaning that particular clause isn't yet available in SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CELKO (7/22/2010)


    When I worked at a Mormon owned technical school, I was referred to as "ethical diversity in the faculty" and loved it.

    Heh... oddly enough, I can see that actually happening and in a very non-ISO fashion, to boot. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, hopefully you see this... I have decided to implement the Quirky update in a specific part of our app, but in order to do so I have to solve my last issue with it. Hopefully this is a simple thing, basically all I need to do is shift one of the columns up by one row. I tried to put together the simplest example I could think of to demonstrate. My expected results are simply that the number 1 matches 'a', 2 matches 'b', etc. I can ignore the last row and in my example 'x' would be the match for 'g'. The example hopefully is easy to understand.

    create table #RunningTotal (

    AChar varchar(1),

    ANum varchar(1))

    CREATE CLUSTERED INDEX [indexdown] ON #RunningTotal

    (

    AChar desc

    )

    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    insert into #RunningTotal

    values ('a','0')

    insert into #RunningTotal

    values ('b','1')

    insert into #RunningTotal

    values ('c','2')

    insert into #RunningTotal

    values ('d','3')

    insert into #RunningTotal

    values ('e','4')

    insert into #RunningTotal

    values ('f','5')

    insert into #RunningTotal

    values ('g','6')

    declare @Char varchar(1)='0',

    @Num varchar(1)='x'

    update #RunningTotal

    set

    @Char=AChar,

    ANum=@Num,

    @Num=ANum

    FROM #RunningTotal OPTION (MAXDOP 1)

    select * from #RunningTotal

    drop table #RunningTotal

  • Damn. I'm sorry. I lost track of this one. Are you all set, Robb?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply