July 20, 2010 at 3:27 pm
I am trying to use recursive CTEs to replace cursors and I am having trouble figuring out how to "reset" certain fields when a "key" field changes. In the example provided I am calculating performance for stocks and I have a recursive CTE that works for a single stock. (see results of the first CTE for 'IBM' in the example). The problem comes when I have more than one stock I can't figure out how to "reset" performance when the "loop" changes from one Asset/Stock to the next. So when I add the rows for the second stock (T), the way I have it coded it won't work because I am joining simply on sequence where I need to have the AssetID in there as sequences are repeated for each stock and need to reset running performance to 0 when the asset changes. I'm just not sure how to add the assetID in the joins and have it repeat for each stock. I'm thinking an "outer" recursive CTE that loops the AssetIDs for the "inner" CTE that calculates the performance but I am having trouble figuring how that would work. Hoping a recusive CTE expert can help out.
Thanks in advance,
Robb
Here's the code and example:
create table #DMVRunningPerformance (
Seq int,
LongPosition_bt bit null,
AssetID_in int null,
Symbol_vc varchar(255),
DayID_in int,
Status_vc varchar(255) null,
MV float,
DailyPerf float null,
RunningPerf float null)
CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance
(
[Seq] ASC,
[LongPosition_bt] ASC,
[AssetID_in] ASC,
[DayID_in] DESC
)
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
-- insert first Asset
insert into #DMVRunningPerformance
values (1,1,100,'IBM',75006,NULL,201048.8987,NULL,NULL)
insert into #DMVRunningPerformance
values (2,1,100,'IBM',75005,NULL,200841.5658,NULL,NULL)
insert into #DMVRunningPerformance
values (3,1,100,'IBM',75004,NULL,200321.7043,NULL,NULL)
insert into #DMVRunningPerformance
values (4,1,100,'IBM',75003,NULL,201120.0467,NULL,NULL)
insert into #DMVRunningPerformance
values (5,1,100,'IBM',75002,NULL,201779.8805,NULL,NULL)
insert into #DMVRunningPerformance
values (6,1,100,'IBM',75001,NULL,201651.3917,NULL,NULL)
insert into #DMVRunningPerformance
values (7,1,100,'IBM',75000,NULL,201651.3917,NULL,NULL)
declare @AssetID_in int=-1,
@NDAssetID_in int=-1,
@LongPosition_bt bit=0,
@Status_vc varchar(255)=NULL,
@NewGroup_bt bit=1,
@DailyPerformance float,
@RunningPerformance float,
@NDMarketValue float
;WITH Calculator AS (
-- Anchor row, tr.Seq = 1
SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,
DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),
RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)
FROM #DMVRunningPerformance thisrow
INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1
WHERE thisrow.Seq = 1
UNION ALL
-- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)
SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,
DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),
RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1
FROM Calculator lastrow
INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1
INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1
)
SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf
FROM Calculator
-- insert second Asset
insert into #DMVRunningPerformance
values (1,1,200,'T',75006,NULL,6484,NULL,NULL)
insert into #DMVRunningPerformance
values (2,1,200,'T',75005,NULL,6450,NULL,NULL)
insert into #DMVRunningPerformance
values (3,1,200,'T',75004,NULL,6398,NULL,NULL)
insert into #DMVRunningPerformance
values (4,1,200,'T',75003,NULL,6397,NULL,NULL)
insert into #DMVRunningPerformance
values (5,1,200,'T',75002,NULL,6446,NULL,NULL)
insert into #DMVRunningPerformance
values (6,1,200,'T',75001,NULL,6240,NULL,NULL)
insert into #DMVRunningPerformance
values (7,1,200,'T',75000,NULL,6124,NULL,NULL)
-- this obviously will not work as it needs to work on one assetid at a time
-- so how do I "reset" the sequence back to 1 and the running performance back to 0 for each assetid?
;WITH Calculator AS (
-- Anchor row, tr.Seq = 1
SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,
DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),
RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)
FROM #DMVRunningPerformance thisrow
INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1
WHERE thisrow.Seq = 1
UNION ALL
-- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)
SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,
DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),
RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1
FROM Calculator lastrow
INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1
INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1
)
SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf
FROM Calculator
drop table #dmvrunningperformance
July 20, 2010 at 3:40 pm
Here is an example of a solution using an outer cursor with an inner recursive CTE. I'm trying to eliminate the outer cursor (and return it all as a single resultset)
create table #DMVRunningPerformance (
Seq int,
LongPosition_bt bit null,
AssetID_in int null,
Symbol_vc varchar(255),
DayID_in int,
Status_vc varchar(255) null,
MV float,
DailyPerf float null,
RunningPerf float null)
CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance
(
[Seq] ASC,
[LongPosition_bt] ASC,
[AssetID_in] ASC,
[DayID_in] DESC
)
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
-- insert first Asset
insert into #DMVRunningPerformance
values (1,1,100,'IBM',75006,NULL,201048.8987,NULL,NULL)
insert into #DMVRunningPerformance
values (2,1,100,'IBM',75005,NULL,200841.5658,NULL,NULL)
insert into #DMVRunningPerformance
values (3,1,100,'IBM',75004,NULL,200321.7043,NULL,NULL)
insert into #DMVRunningPerformance
values (4,1,100,'IBM',75003,NULL,201120.0467,NULL,NULL)
insert into #DMVRunningPerformance
values (5,1,100,'IBM',75002,NULL,201779.8805,NULL,NULL)
insert into #DMVRunningPerformance
values (6,1,100,'IBM',75001,NULL,201651.3917,NULL,NULL)
insert into #DMVRunningPerformance
values (7,1,100,'IBM',75000,NULL,201651.3917,NULL,NULL)
-- insert second Asset
insert into #DMVRunningPerformance
values (1,1,200,'T',75006,NULL,6484,NULL,NULL)
insert into #DMVRunningPerformance
values (2,1,200,'T',75005,NULL,6450,NULL,NULL)
insert into #DMVRunningPerformance
values (3,1,200,'T',75004,NULL,6398,NULL,NULL)
insert into #DMVRunningPerformance
values (4,1,200,'T',75003,NULL,6397,NULL,NULL)
insert into #DMVRunningPerformance
values (5,1,200,'T',75002,NULL,6446,NULL,NULL)
insert into #DMVRunningPerformance
values (6,1,200,'T',75001,NULL,6240,NULL,NULL)
insert into #DMVRunningPerformance
values (7,1,200,'T',75000,NULL,6124,NULL,NULL)
declare @AssetID_in int=-1,
@NDAssetID_in int=-1,
@LongPosition_bt bit=0,
@Status_vc varchar(255)=NULL,
@NewGroup_bt bit=1,
@DailyPerformance float,
@RunningPerformance float,
@NDMarketValue float
declare AssetCursor cursor for
select distinct AssetID_in from #DMVRunningPerformance
open AssetCursor
fetch next from AssetCursor into @AssetID_in
while @@FETCH_STATUS=0
begin
WITH Calculator AS (
-- Anchor row, tr.Seq = 1
SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,
DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),
RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)
FROM #DMVRunningPerformance thisrow
INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1 and nextrow.AssetID_in=@AssetID_in
WHERE thisrow.Seq = 1 and thisrow.AssetID_in=@AssetID_in
UNION ALL
-- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)
SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,
DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),
RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1
FROM Calculator lastrow
INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1 and thisrow.AssetID_in=@AssetID_in
INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1 and nextrow.AssetID_in=@AssetID_in
)
SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf
FROM Calculator
fetch next from AssetCursor into @AssetID_in
end
close AssetCursor
deallocate AssetCursor
drop table #dmvrunningperformance
July 20, 2010 at 4:20 pm
I'll tell you that replacing a cursor with a recursive CTE is counter productive and won't buy you much for performance and may actually cost you about 3 times as many reads. Recursion is nothing more than "hidden RBAR" and I wouldn't waste my time changing any cursor or while loop to recursive CTE's.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 20, 2010 at 5:15 pm
Hey Jeff,
I have read through the discussion pages related to your article titled "Solving the Running Total & Ordianl Rank Problems". There is also another thread with plenty of arguments in there about why one should or shouldn't use the "quirky update" and why one should or shouldn't use a recursive CTE approach or cursors for that matter. Unfortunately I can't find the thread or posts now but I believe someone named "Hugo" maybe had provided a recusive CTE approach that was almost as fast as the Quirky Update method. Perhaps I read things or am remembering wrong, not sure. I would love to use the quirky update method but after reading all the pros and cons I am still hesitant to use it in a production system. That coupled with the fact that I could never get it to work exactly as I needed (different problem altogether). Anyway I was hoping to find an approach that didn't use cursors or while loops and barring using the quirky update a recursive CTE is what is left (or CLR which is out for other reasons).
I guess I could go back and rewrite my original post using the Quirky Update and try to get a solution for my last "outstanding" issue. I just remember reading all the discussions and based this recursive CTE on the code that was posted offering an alternative to the quirky update. Maybe you remeber the thread I am referring to.
Thanks,
Robb
July 20, 2010 at 6:07 pm
July 21, 2010 at 12:13 pm
Robb Melancon (7/20/2010)
Hey Jeff,I have read through the discussion pages related to your article titled "Solving the Running Total & Ordianl Rank Problems". There is also another thread with plenty of arguments in there about why one should or shouldn't use the "quirky update" and why one should or shouldn't use a recursive CTE approach or cursors for that matter. Unfortunately I can't find the thread or posts now but I believe someone named "Hugo" maybe had provided a recusive CTE approach that was almost as fast as the Quirky Update method. Perhaps I read things or am remembering wrong, not sure. I would love to use the quirky update method but after reading all the pros and cons I am still hesitant to use it in a production system. That coupled with the fact that I could never get it to work exactly as I needed (different problem altogether). Anyway I was hoping to find an approach that didn't use cursors or while loops and barring using the quirky update a recursive CTE is what is left (or CLR which is out for other reasons).
I guess I could go back and rewrite my original post using the Quirky Update and try to get a solution for my last "outstanding" issue. I just remember reading all the discussions and based this recursive CTE on the code that was posted offering an alternative to the quirky update. Maybe you remeber the thread I am referring to.
Thanks,
Robb
Hugo's method is in the discussions (somewhere) that follow the article. In fact, look for where I made an improvement to his code. 😉
It's a shame you don't trust the quirky update. The rules are simple and even if you take the time to run the verification code, it's still faster than most other methods.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2010 at 11:36 am
My problem with using the Quirky Update is this: The procedures in our app that would be rewritten to use the Quirky Update in place of Cursors are very pervasive (in number and in scope) in our application. In the event that it changed/broke because of a Microsoft update/release/patch, there would be no backup plan to fall back on unless I maintained two methods (one using cursors one using quirky update) for every performance related procedure. It's one thing if Microsoft breaks something with a release/patch and I can escalate a "production halt" type issue/bug for them to fix but quite another if they break something that they don't support and now I have to go rewrite my procedures back to using cursors.
I love the Quirky Update method. I aggree it is simple, consise, consistent and fast as heck as long as you follow the simple rules. I have no problem with a verification procedure that is run after every patch but what do I do when MS breaks it and there is no fix? I have to rewrite everything back to using cursors and fast because the patch we just installed is critical for some other portion of the app. That's the only problem I have with using the Quirky Update.
I also don't understand why Microsoft doesn't come out in support of it. Clearly it is faster than cursors, cursors are not scaleable and are frowned upon almost universally for large tasks that need to scale. The fact that they don't simply say "yes this is a valid use of our update syntax" is concerning.
July 22, 2010 at 11:53 am
I can certainly understand that bit of paranoia but consider this... the quirky update has survived all hot fixes, all CU's, all SP's, and all revisions including the major engine changes they made between SQL Server 6.5 and 7 and then again between 2k and 2k5 not to mention all of the updates it went through way back before then (SQL Server has its roots in Sybase).
Still, I can understand why some folks might not take a chance. In that case, my recommendation is to use the old trigger method of keeping the running total absolutely up to date using triggers on new inserts. If that's just not possible then a well written forward only read only cursor (or While Loop) with the appropriate update will only be 8 or 10 times slower. And, by the way, since cursors are RBAR, they are inherently linear (unless a triangular join is present) and are, therefor, inherently scalable to infinity. Things that are not scalable typically have other than a linear performance where they get worse in a nonlinear fashion as the row count increases.
As a side bar, it would be interesting to see if someone could come up with a decent highspeed CLR for running totals.
Other than that, I agree... MS should spend a little time studying the Quirky Update and figure out that it actually does work. That or coming out with proper windowing functions to do running totals would be a big help.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2010 at 2:44 pm
The way they are written currently I use "LOCAL FAST_FORWARD READ_ONLY" and they are relatively quick though Quirky Update is about twice as fast. The problem I have and the reason I say they (cursors) don't "scale" is when I use these same procedures in an SSIS job that uses a lot of parallelism, etc, the job fails or hangs. I replaced the cursors with slow while loops and the job now runs fine and never hangs, but is slow. It could be something else causing the hang but the only difference between running perfectly and hanging is the use of cursors.
In general the results that I am processing are millions of rows and the footprint of the cursors is pretty large as there are a lot of calculations and data being tossed around. Anyway, I still may wind up using the Quirky Update for the SSIS job and just maintain the two versions, one with the loops and one with the quirky update as these procs will not change much if at all.
July 22, 2010 at 2:47 pm
Joe Celko (7/22/2010)
Standard SQL has the syntax and you can get it in DB2 and Oracle, but SQL Server is still behind. The Window clause has a [RANGE | ROWS] subclause that acts like a frame within the partition defined off the current row in terms of preceding and following rows. ......
Thanks Joe, I had not seen that clause before and will see if I can incorporate that into my performance routine. If it performs well I'll post some comparisons.
July 22, 2010 at 5:47 pm
Robb Melancon (7/22/2010)
Joe Celko (7/22/2010)
Standard SQL has the syntax and you can get it in DB2 and Oracle, but SQL Server is still behind. The Window clause has a [RANGE | ROWS] subclause that acts like a frame within the partition defined off the current row in terms of preceding and following rows. ......
Thanks Joe, I had not seen that clause before and will see if I can incorporate that into my performance routine. If it performs well I'll post some comparisons.
Like Joe said, "SQL Server is still behind" meaning that particular clause isn't yet available in SQL Server.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 25, 2010 at 1:34 pm
CELKO (7/22/2010)
When I worked at a Mormon owned technical school, I was referred to as "ethical diversity in the faculty" and loved it.
Heh... oddly enough, I can see that actually happening and in a very non-ISO fashion, to boot. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
July 29, 2010 at 9:33 am
Jeff, hopefully you see this... I have decided to implement the Quirky update in a specific part of our app, but in order to do so I have to solve my last issue with it. Hopefully this is a simple thing, basically all I need to do is shift one of the columns up by one row. I tried to put together the simplest example I could think of to demonstrate. My expected results are simply that the number 1 matches 'a', 2 matches 'b', etc. I can ignore the last row and in my example 'x' would be the match for 'g'. The example hopefully is easy to understand.
create table #RunningTotal (
AChar varchar(1),
ANum varchar(1))
CREATE CLUSTERED INDEX [indexdown] ON #RunningTotal
(
AChar desc
)
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
insert into #RunningTotal
values ('a','0')
insert into #RunningTotal
values ('b','1')
insert into #RunningTotal
values ('c','2')
insert into #RunningTotal
values ('d','3')
insert into #RunningTotal
values ('e','4')
insert into #RunningTotal
values ('f','5')
insert into #RunningTotal
values ('g','6')
declare @Char varchar(1)='0',
@Num varchar(1)='x'
update #RunningTotal
set
@Char=AChar,
ANum=@Num,
@Num=ANum
FROM #RunningTotal OPTION (MAXDOP 1)
select * from #RunningTotal
drop table #RunningTotal
October 5, 2010 at 5:06 am
Damn. I'm sorry. I lost track of this one. Are you all set, Robb?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply