March 7, 2009 at 8:15 am
Hi folks
I need to derive a measure using a value from the current record, and another value from a (related) record n-rows away. The data set is non-contiguous and contains nearly 30,000 records.
My attempts to resolve this have gravitated towards cursors - which I understand should be a last resort. I know this is possible using sets, but my T-SQL skills are falling short.
The formula, with source data, and expected results follow.
Formula: Current close minus close n (trading) days ago. For this example n = 5
Example (with reference to sample data below)
For COMM_SYMB = CL
CLS on day 717 = 62,79. CLS (five trading days earlier, day 710) = 61.22
Therefore momentum on day 717 is: 62.79 - 61.22 = 1.57
Thanks in advance,
Wayne
-- Source data
DROP TABLE dbo.MOM_TEST
CREATE TABLE dbo.MOM_TEST
(
TIME_KEY INT NOT NULL,
COMM_SYMB NVARCHAR(3) NOT NULL,
CLS FLOAT NOT NULL,
MOM FLOAT
)
INSERT INTO dbo.MOM_TEST(TIME_KEY, COMM_SYMB, CLS, MOM)
SELECT 703, 'BO', 29.34, NULL UNION ALL
SELECT 703, 'CL', 62.44, NULL UNION ALL
SELECT 703, 'HO', 1.8089, NULL UNION ALL
SELECT 704, 'BO', 29.34, NULL UNION ALL
SELECT 704, 'CL', 62.43, NULL UNION ALL
SELECT 704, 'HO', 1.7983, NULL UNION ALL
SELECT 705, 'BO', 28.78, NULL UNION ALL
SELECT 705, 'CL', 62.19, NULL UNION ALL
SELECT 705, 'HO', 1.794, NULL UNION ALL
SELECT 706, 'BO', 28.73, NULL UNION ALL
SELECT 706, 'CL', 62.49, NULL UNION ALL
SELECT 706, 'HO', 1.7788, NULL UNION ALL
SELECT 707, 'BO', 28.63, NULL UNION ALL
SELECT 707, 'CL', 62.03, NULL UNION ALL
SELECT 707, 'HO', 1.7573, NULL UNION ALL
SELECT 710, 'BO', 29.03, NULL UNION ALL
SELECT 710, 'CL', 61.22, NULL UNION ALL
SELECT 710, 'HO', 1.7243, NULL UNION ALL
SELECT 711, 'BO', 29.14, NULL UNION ALL
SELECT 711, 'CL', 61.02, NULL UNION ALL
SELECT 711, 'HO', 1.7224, NULL UNION ALL
SELECT 712, 'BO', 29.02, NULL UNION ALL
SELECT 712, 'CL', 62.17, NULL UNION ALL
SELECT 712, 'HO', 1.732, NULL UNION ALL
SELECT 713, 'BO', 28.79, NULL UNION ALL
SELECT 713, 'CL', 63.33, NULL UNION ALL
SELECT 713, 'HO', 1.7765, NULL UNION ALL
SELECT 714, 'BO', 28.59, NULL UNION ALL
SELECT 714, 'CL', 64.09, NULL UNION ALL
SELECT 714, 'HO', 1.7817, NULL UNION ALL
SELECT 717, 'BO', 28.15, NULL UNION ALL
SELECT 717, 'CL', 62.79, NULL UNION ALL
SELECT 717, 'HO', 1.7552, NULL UNION ALL
SELECT 718, 'BO', 28.5, NULL UNION ALL
SELECT 718, 'CL', 63.46, NULL UNION ALL
SELECT 718, 'HO', 1.7592, NULL UNION ALL
SELECT 719, 'BO', 28.53, NULL UNION ALL
SELECT 719, 'CL', 63.72, NULL UNION ALL
SELECT 719, 'HO', 1.7693, NULL UNION ALL
SELECT 720, 'BO', 28.58, NULL UNION ALL
SELECT 720, 'CL', 62.66, NULL UNION ALL
SELECT 720, 'HO', 1.74, NULL UNION ALL
SELECT 721, 'BO', 28.84, NULL UNION ALL
SELECT 721, 'CL', 62.41, NULL UNION ALL
SELECT 721, 'HO', 1.7213, NULL
-- end source data
-- Expected Results
DROP TABLE dbo.MOM_ER
CREATE TABLE dbo.MOM_ER
(
TIME_KEY INT NOT NULL,
COMM_SYMB NVARCHAR(3) NOT NULL,
CLS FLOAT NOT NULL,
MOM FLOAT
)
INSERT INTO dbo.MOM_ER(TIME_KEY, COMM_SYMB, CLS, MOM)
SELECT 703, 'BO', 29.34, 0 UNION ALL
SELECT 703, 'CL', 62.44, 0 UNION ALL
SELECT 703, 'HO', 1.8089, 0 UNION ALL
SELECT 704, 'BO', 29.34, 0 UNION ALL
SELECT 704, 'CL', 62.43, 0 UNION ALL
SELECT 704, 'HO', 1.7983, 0 UNION ALL
SELECT 705, 'BO', 28.78, 0 UNION ALL
SELECT 705, 'CL', 62.19, 0 UNION ALL
SELECT 705, 'HO', 1.794, 0 UNION ALL
SELECT 706, 'BO', 28.73, 0 UNION ALL
SELECT 706, 'CL', 62.49, 0 UNION ALL
SELECT 706, 'HO', 1.7788, 0 UNION ALL
SELECT 707, 'BO', 28.63, 0 UNION ALL
SELECT 707, 'CL', 62.03, 0 UNION ALL
SELECT 707, 'HO', 1.7573, 0 UNION ALL
SELECT 710, 'BO', 29.03, -0.309999999999999 UNION ALL
SELECT 710, 'CL', 61.22, -1.22 UNION ALL
SELECT 710, 'HO', 1.7243, -0.0846 UNION ALL
SELECT 711, 'BO', 29.14, -0.199999999999999 UNION ALL
SELECT 711, 'CL', 61.02, -1.41 UNION ALL
SELECT 711, 'HO', 1.7224, -0.0759000000000001 UNION ALL
SELECT 712, 'BO', 29.02, 0.239999999999998 UNION ALL
SELECT 712, 'CL', 62.17, -0.019999999999996 UNION ALL
SELECT 712, 'HO', 1.732, -0.0620000000000001 UNION ALL
SELECT 713, 'BO', 28.79, 0.0599999999999987 UNION ALL
SELECT 713, 'CL', 63.33, 0.839999999999996 UNION ALL
SELECT 713, 'HO', 1.7765, -0.00229999999999997 UNION ALL
SELECT 714, 'BO', 28.59, -0.0399999999999991 UNION ALL
SELECT 714, 'CL', 64.09, 2.06 UNION ALL
SELECT 714, 'HO', 1.7817, 0.0244 UNION ALL
SELECT 717, 'BO', 28.15, -0.880000000000003 UNION ALL
SELECT 717, 'CL', 62.79, 1.57 UNION ALL
SELECT 717, 'HO', 1.7552, 0.0309000000000001 UNION ALL
SELECT 718, 'BO', 28.5, -0.640000000000001 UNION ALL
SELECT 718, 'CL', 63.46, 2.44 UNION ALL
SELECT 718, 'HO', 1.7592, 0.0368000000000002 UNION ALL
SELECT 719, 'BO', 28.53, -0.489999999999998 UNION ALL
SELECT 719, 'CL', 63.72, 1.55 UNION ALL
SELECT 719, 'HO', 1.7693, 0.0373000000000001 UNION ALL
SELECT 720, 'BO', 28.58, -0.210000000000001 UNION ALL
SELECT 720, 'CL', 62.66, -0.670000000000002 UNION ALL
SELECT 720, 'HO', 1.74, -0.0365 UNION ALL
SELECT 721, 'BO', 28.84, 0.25 UNION ALL
SELECT 721, 'CL', 62.41, -1.68000000000001 UNION ALL
SELECT 721, 'HO', 1.7213, -0.0604
-- End expected Results
March 7, 2009 at 8:43 am
did you test using row_number or ranking functions ?
That in combination with a CTE or temp table may help out on this kind of stuff.
Check books online or SSC for examples:w00t:
You should at least read Jeff or Lynns article on running totals,...
(Start with Jeffs, because that one is also refered to by Lynn)
Jeffs: http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
Lynns: http://www.sqlservercentral.com/articles/T-SQL/65522/
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution ๐
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
March 7, 2009 at 12:23 pm
And, sorry... the running total article I wrote is in the process of being rewritten.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2009 at 12:38 pm
Actually, in SQL Server 2k5, we don't need to solve the running total problem for this because it's just a ranking problem... this should do it...
;WITH
cteGroup AS
(
SELECT DENSE_RANK() OVER (ORDER BY Time_Key) AS Time_KeyGroup,
Time_Key, Comm_Symb, CLS
FROM dbo.Mom_Test
)
SELECT t1.Time_Key, t1.Comm_Symb, t1.CLS,
t1.CLS - t2.CLS AS Mom
FROM cteGroup t1
LEFT OUTER JOIN cteGroup t2
ON t1.Time_KeyGroup-5 = t2.Time_KeyGroup
AND t1.Comm_Symb = t2.Comm_Symb
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2009 at 2:16 pm
Many thanks Jeff, much appreciated.
I was hammering away with the row number function, but never got it to work successfully.
Wayne
March 7, 2009 at 2:29 pm
Thanks for the feedback, Wayne. I appreciate you taking the time as well posting your data in such a readily consumable format.
As a sidebar, even though it makes for code that's very short and easy on the eyes, a self referencing CTE may actually cause a bit of a performance problem compared to what I consider to be optimal performance. The reason for the performance problem is two fold...
Calling the CTE twice actually causes the CTE to execute twice much like it's near cousin, the derived table, would (you'd actually need to list the code twice there). The other problem may occur if the result set from the CTE is large... since part of the join is being done on a calculated column, there's virtually no chance of an index being used in the join between the two result sets of the CTE. It would likely be better to put the result set, with an additional column of precalculated column of Time_KeyGroup minus 5 (called Time_KeyGroup5) into a Temp table, index both Time_KeyGroup and the Time_KeyGroup5 columns, and then do the join. The use of SELECT/INTO will make the creation of the Temp table very, very fast even for some of the larger result sets I can see this being used for.
The only method even faster than that, would be to simply add the Time_KeyGroup columns, along with the appropriate indexes, to the orginal table and keep them up to date.
For all those getting ready to hammer me about using SELECT/INTO and the myth about the table locks it causes, it's not been a problem since version 6.5 sp1. Please don't take my word for it, though... instead, verify what I've just said through the following link, please...
http://support.microsoft.com/kb/153441/EN-US/
--Jeff Moden
Change is inevitable... Change for the better is not.
March 9, 2009 at 3:41 am
Jeff Moden (3/7/2009)
Actually, in SQL Server 2k5, we don't need to solve the running total problem for this because it's just a ranking problem... this should do it...
;WITH
cteGroup AS
(
SELECT DENSE_RANK() OVER (ORDER BY Time_Key) AS Time_KeyGroup,
Time_Key, Comm_Symb, CLS
FROM dbo.Mom_Test
)
SELECT t1.Time_Key, t1.Comm_Symb, t1.CLS,
t1.CLS - t2.CLS AS Mom
FROM cteGroup t1
LEFT OUTER JOIN cteGroup t2
ON t1.Time_KeyGroup-5 = t2.Time_KeyGroup
AND t1.Comm_Symb = t2.Comm_Symb
Surely it doesn't need to be this complex? This generates exactly the same results as are found in the Expected Results table:
SELECT now.*, ISNULL(now.CLS-old.CLS, 0) AS calcMOM
FROM #MOM_ER now
LEFT JOIN #MOM_ER old
ON old.COMM_SYMB = now.COMM_SYMB AND old.TIME_KEY + 7 = now.TIME_KEY
Unless I'm missing something?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 9, 2009 at 4:34 pm
Chris Morris (3/9/2009)
Jeff Moden (3/7/2009)
Actually, in SQL Server 2k5, we don't need to solve the running total problem for this because it's just a ranking problem... this should do it...
;WITH
cteGroup AS
(
SELECT DENSE_RANK() OVER (ORDER BY Time_Key) AS Time_KeyGroup,
Time_Key, Comm_Symb, CLS
FROM dbo.Mom_Test
)
SELECT t1.Time_Key, t1.Comm_Symb, t1.CLS,
t1.CLS - t2.CLS AS Mom
FROM cteGroup t1
LEFT OUTER JOIN cteGroup t2
ON t1.Time_KeyGroup-5 = t2.Time_KeyGroup
AND t1.Comm_Symb = t2.Comm_Symb
Surely it doesn't need to be this complex? This generates exactly the same results as are found in the Expected Results table:
SELECT now.*, ISNULL(now.CLS-old.CLS, 0) AS calcMOM
FROM #MOM_ER now
LEFT JOIN #MOM_ER old
ON old.COMM_SYMB = now.COMM_SYMB AND old.TIME_KEY + 7 = now.TIME_KEY
Unless I'm missing something?
Haven't tested yours but I don't believe they generate the same thing... your's looks like it picks up the Time_Keys that are a week apart... mine picks up the Time_keys from whatever data was available 5 entries ago.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 9, 2009 at 7:56 pm
Jeff is right. I have to cater for non-trading days other than weekends (which are not included in my sample data) so having a 7 day difference from the time key will not always work. I also vary these windows, some being up to 40 periods which will include many non-trading days.
Thanks,
Wayne
March 10, 2009 at 11:53 am
Jeff,
Thanks for posting the link about SELECT INTO. I've been given grief by other DBA's for using this technique for years - claiming that it locked sysobjects in tempdb. I use it because, as you stated, it's absolutely the fastest way to create a temp table.
I bench marked a report procedure running it many times consecutively so cache wasn't a consideration and the CREATE TABLE/INSERT INTO ran from 1.5 to 1.9 seconds each time. The SELECT INTO version ran in 300 ms each time.
Todd Fifield
March 10, 2009 at 12:58 pm
You bet, Todd... Thanks for the feedback. Heh... It's amazing how long the myth of Select/Into has managed to survive.
I'll have to look for it, but I've got a bit of test code hanging around somewhere that proves that there are no blocking locks and that multiple parallel routines can all be doing SELECT/INTO all at the same time.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2009 at 1:03 pm
Jeff Moden (3/10/2009)
You bet, Todd... Thanks for the feedback. Heh... It's amazing how long the myth of Select/Into has managed to survive.I'll have to look for it, but I've got a bit of test code hanging around somewhere that proves that there are no blocking locks and that multiple parallel routines can all be doing SELECT/INTO all at the same time.
It's easy enough to prove. Open two connections in Management Studio. Write a script that selects something into a temp table that will take a humanly-measurable amount of time to finish. Call it 10 seconds to be on the safe side. Copy the script into both connections, and run them at the same time.
Doesn't have the mathematical proof factor to it, but it sure does make it very, very real to people.
If they still doubted after that, I'd also open the Management Studio Current Activity window and watch that with them while it was running. Shows blocks in there.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 10, 2009 at 1:43 pm
Wayne (3/9/2009)
Jeff is right. I have to cater for non-trading days other than weekends (which are not included in my sample data) so having a 7 day difference from the time key will not always work. I also vary these windows, some being up to 40 periods which will include many non-trading days.Thanks,
Wayne
Thanks guys, thought I was losing me marbles. Nah don't answer that ๐
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 10, 2009 at 2:02 pm
GSquared (3/10/2009)
Jeff Moden (3/10/2009)
You bet, Todd... Thanks for the feedback. Heh... It's amazing how long the myth of Select/Into has managed to survive.I'll have to look for it, but I've got a bit of test code hanging around somewhere that proves that there are no blocking locks and that multiple parallel routines can all be doing SELECT/INTO all at the same time.
It's easy enough to prove. Open two connections in Management Studio. Write a script that selects something into a temp table that will take a humanly-measurable amount of time to finish. Call it 10 seconds to be on the safe side. Copy the script into both connections, and run them at the same time.
Doesn't have the mathematical proof factor to it, but it sure does make it very, very real to people.
If they still doubted after that, I'd also open the Management Studio Current Activity window and watch that with them while it was running. Shows blocks in there.
That's kinda what my code example does... except one takes 60 seconds to run and the other one takes 10 seconds to run and starts 10 seconds after the first one with log marks for start and end time. Don't depend on when stuff get's returned to the screen, either. Screen returns have about the lowest priority in the world in 2k and not much faster in 2k5. Ya gotta mark start and end times for both scripts.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2009 at 2:07 pm
If you set nocount on, and all both scripts do is the select into, there's nothing to show on the screen except that it completed.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply