December 10, 2008 at 2:59 pm
I am seeking to use SQL to find an exponential weighted average, which is defined recursively.
If V(n) = the value on day n
The exponential weighted average can be defined as:
ExpAvg(n) := 0.03 * V(n) + 0.97 * ExpAvg(n-1)
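Unrolling the recursion (plain algebra from the definition above) shows why the weighting is called exponential; each older value's contribution decays by a factor of 0.97 per day:

ExpAvg(n) = 0.03*V(n) + 0.03*0.97*V(n-1) + 0.03*0.97^2*V(n-2) + ... + 0.97^n * ExpAvg(0)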
The problem is, I don't know how to deal with this kind of a recursive definition using SQL and can't seem to find any good resources.
Has anyone encountered (and conquered) this problem before?
Any help here is greatly appreciated.
Thanks!
December 10, 2008 at 3:15 pm
-- Build 10 rows of sample data; the clustered primary key gives
-- the UPDATE below a deterministic order to walk.
DECLARE @Sample TABLE
(
    ID INT PRIMARY KEY CLUSTERED,
    Number INT,
    expAvg FLOAT
)

INSERT @Sample
(
    ID,
    Number
)
SELECT TOP 10 Number,
       ABS(CHECKSUM(NEWID())) % 50000
FROM   master..spt_values
WHERE  Type = 'P'
ORDER BY Number

-- Carry the running average in a variable. The "quirky update"
-- assigns the variable and the column in the same statement,
-- row by row, in clustered index order.
DECLARE @expAvg FLOAT

SET @expAvg = 0

UPDATE @Sample
SET @expAvg = ExpAvg = 0.03E * Number + 0.97E * @expAvg

SELECT *
FROM @Sample
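The same recursive definition can also be written directly as a recursive CTE. A minimal sketch, assuming SQL Server 2005+ and the consecutive ID values 0-9 generated above (the quirky update is generally much faster on large sets):

;WITH Recur (ID, Number, expAvg) AS
(
    -- Anchor: seed the average exactly as the UPDATE above does (@expAvg = 0)
    SELECT ID, Number, 0.03E * Number
    FROM @Sample
    WHERE ID = (SELECT MIN(ID) FROM @Sample)

    UNION ALL

    -- Recursive step: apply the definition to the next ID
    SELECT s.ID, s.Number, 0.03E * s.Number + 0.97E * r.expAvg
    FROM @Sample s
    INNER JOIN Recur r ON s.ID = r.ID + 1
)
SELECT ID, Number, expAvg
FROM Recur
OPTION (MAXRECURSION 0)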
N 56°04'39.16"
E 12°55'05.25"
December 10, 2008 at 8:40 pm
Nicely done, Peter.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2008 at 11:20 pm
Have a look at Tally/Number Table methods also
December 11, 2008 at 12:29 am
Thank you Jeff!
The use of master..spt_values or a Tally/Number table is ONLY for generating sample data.
The working piece of the algorithm is the UPDATE part.
N 56°04'39.16"
E 12°55'05.25"
December 11, 2008 at 5:19 am
Heh... that bit of info should be included as a comment in the code. 😉
Anyway, works good.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 11, 2008 at 5:37 am
johncyriac (12/10/2008)
Have a look at Tally/Number Table methods also
Sure... got any links? 😉 And, how do they apply for this particular problem?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 11, 2008 at 5:43 am
Jeff Moden (12/11/2008)
johncyriac (12/10/2008)
Have a look at Tally/Number Table methods also

Sure... got any links? 😉 And, how do they apply for this particular problem?
Tip: add next time 'from the great articles by Jeff Moden' 😉 And I have to say, you would be correct. Though I still don't fully understand when and how I can use it. Jeff, can you write an article 'My first Tally table' or 'Tally tables for Dummies'?:D
Ronald Hensbergen
Help us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/
2+2=5 for significant large values of 2
December 11, 2008 at 2:18 pm
Thanks so much for the help. I've been able to implement your suggestions, but I had to add quite a few work-around statements to skip over missing/NULL values (otherwise, because of the recursion, everything after a NULL would have been NULL).
I thought I had the query working, and then found a problem that I can't seem to avoid. When it moves on to each subsequent SymbolID, it still initializes the average with the first symbol's initial value. (I tried to avoid this with a @first variable to keep track, but I'm banging my head against the wall with this.) Can anyone see where my mistake is? In the results below, note the initialExpAvgVol of 2575800 carried over from SymbolID 9 into SymbolID 10; the code responsible is the CASE expressions in the UPDATE that set firstTime and initialExpAvgVol.
The relevant columns in the table I'm running it on are like:
CREATE TABLE [dbo].[Test_HistoricalData](
[RowID] [bigint] PRIMARY KEY NOT NULL,
[MarketDate] [datetime] NULL,
[Volume] [decimal](18, 0) NULL,
[SymbolID] [int] NOT NULL,
[ExpAvgVol] [decimal](18, 4) NULL
) ON [PRIMARY]
Each row represents one stock symbol on a given market date. The test table I'm running it on right now has ~700 different symbols and their respective data over the last 2 years; I need to implement this on my production table with ~70K symbols over the last 5 years. Right now (I think because of all my checks), this isn't working fast enough for me to feel comfortable doing so.
If anyone has any suggestions on how I might be able to speed this up a bit, I would really appreciate it.
Thanks!
PS-The PRINT statements in there were just for my troubleshooting to see if I was getting any hang-ups.
Results look like:
SymbolID  MarketDate               Volume     PrNonNullMktDate         PrNonNullVolume  ExpAvgVol         firstTime  initialExpAvgVol
9         2007-01-03 16:00:00.000  2575800    NULL                     NULL             NULL              1          2575800
9         2007-01-04 16:00:00.000  2073700    2007-01-03 16:00:00.000  2575800          2575800           0          0
9         2007-01-05 16:00:00.000  2676600    2007-01-04 16:00:00.000  2073700          2560737           0          0
9         2007-01-08 16:00:00.000  1557200    2007-01-05 16:00:00.000  2676600          2564212.89        0          0
....
9         2007-01-30 16:00:00.000  1436200    2007-01-29 16:00:00.000  2023800          2488137.07714951  0          0
9         2007-01-31 16:00:00.000  2263100    2007-01-30 16:00:00.000  1436200          2456578.96483502  0          0
10        2007-01-01 16:00:00.000  NULL       NULL                     NULL             NULL              1          NULL
10        2007-01-02 16:00:00.000  28575      NULL                     NULL             NULL              1          2575800
10        2007-01-03 16:00:00.000  216733     2007-01-02 16:00:00.000  28575            2499383.25        0          0
10        2007-01-04 16:00:00.000  238721     2007-01-03 16:00:00.000  216733           2430903.7425      0          0
10        2007-01-05 16:00:00.000  380583893  2007-01-04 16:00:00.000  238721           2365138.260225    0          0
10        2007-01-08 16:00:00.000  126863     2007-01-05 16:00:00.000  380583893        13711700.9024182  0          0
10        2007-01-09 16:00:00.000  68356      2007-01-08 16:00:00.000  126863           13304155.7653457  0          0
=============CODE BELOW=========================================
-- ---------------------------------------------------------------------------------
-- PROCEDURE TO CALCULATE AND STORE EXPONENTIAL WEIGHTED AVERAGE OF VOLUME
-- Last Updated December 11, 2008
-- ---------------------------------------------------------------------------------
-- INITIALIZING RUN CRITERIA: Date and SymbolID ranges-------------------
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @startSymbolID INT
DECLARE @endSymbolID INT
SET @startDate = '2007-01-01'
SET @endDate = '2007-02-01'
SET @startSymbolID = 1
SET @endSymbolID = 11
-- DECLARING A TEMPORARY TABLE FOR STORAGE-------------------------------------------
DECLARE @TemporaryTable TABLE
(SymbolID INT --NVARCHAR(50)
, MarketDate DATETIME
, Volume DECIMAL (18,0)
, PrNonNullMkDate DATETIME
, PrNonNullVolume DECIMAL (18,0)
, ExpAvgVol FLOAT
, firstTime INT
, initialExpAvgVol FLOAT
)
-- FILLING THE TEMPORARY TABLE WITH KNOWNS-------------------------------------------
INSERT @TemporaryTable
(SymbolID
, MarketDate
, Volume
, PrNonNullMkDate
, PrNonNullVolume
, ExpAvgVol
)
SELECT
T1.SymbolID
, T1.MarketDate
, T1.Volume
-- Most recent prior date with Volume IS NOT NULL
, T3.PrNonNullMktDate
-- Value of the Volume on that date
, (SELECT T4.Volume
FROM Test_HistoricalData T4
WHERE T4.MarketDate=T3.PrNonNullMktDate AND T4.SymbolID=T1.SymbolID) AS PrNonNullVolume
, T1.ExpAvgVol
FROM
Test_HistoricalData T1
LEFT JOIN (SELECT T2.SymbolID
, T2.MarketDate
, (SELECT MAX(D.MarketDate)
FROM Test_HistoricalData D
WHERE D.MarketDate < T2.MarketDate
AND D.SymbolID=T2.SymbolID
AND D.Volume IS NOT NULL
) AS PrNonNullMktDate
FROM Test_HistoricalData T2
WHERE T2.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND T2.MarketDate BETWEEN @startDate AND @endDate
) AS T3 on T3.MarketDate=T1.MarketDate AND T1.SymbolID=T3.SymbolID
WHERE
T1.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND T1.MarketDate BETWEEN @startDate AND @endDate
ORDER BY
T1.SymbolID, T1.MarketDate
-- DECLARING A TEMPORARY VARIABLE TO HOLD THE ExpAvgVol and INITIALIZING IT-------------------------------------------
DECLARE @symbolID INT
DECLARE @expAvgVol FLOAT
DECLARE @initialExpAvgVol FLOAT
DECLARE @first INT
SET @symbolID = (SELECT MIN(SymbolID)
FROM @TemporaryTable )
SET @first=1
SET @initialExpAvgVol =
(CASE WHEN
--Checking to see if there is an earlier NOT NULL value for ExpAvgVol on this SymbolID
(SELECT ExpAvgVol
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MAX(MarketDate)
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate < @startDate
AND ExpAvgVol IS NOT NULL) ) IS NOT NULL
THEN
--Use this earlier value as an initial value
(SELECT ExpAvgVol
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MAX(MarketDate)
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate < @startDate
AND ExpAvgVol IS NOT NULL) )
ELSE
--Grab the earliest volume in this time period
(SELECT Volume
FROM @TemporaryTable
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MIN(MarketDate)
FROM @TemporaryTable
WHERE SymbolID=@symbolID) )
END )
--If ExpAvgVol has an earlier value in the stored table, initialize to that, else set to the earliest volume
SET @expAvgVol = @initialExpAvgVol
PRINT 'Variables initialized. Beginning computation of ExpAvgVol.'
---- CALCULATING THE ExpAvgVol and Storing in TempTable-------------------------------------------
UPDATE @TemporaryTable
SET @expAvgVol=ExpAvgVol=
CASE WHEN @first=1
THEN 0.03E * PrNonNullVolume + 0.97E * @initialExpAvgVol
ELSE 0.03E * PrNonNullVolume + 0.97E * @expAvgVol
END
, @first=firstTime=
CASE WHEN @symbolID!=SymbolID OR PrNonNullVolume IS NULL
THEN 1
ELSE 0
END
, @symbolID=SymbolID
, @initialExpAvgVol = initialExpAvgVol=
CASE WHEN @first=1
THEN
(CASE WHEN
-- WHEN There is an earlier NOT NULL value for ExpAvgVol on this SymbolID
(SELECT T1.ExpAvgVol
FROM Test_HistoricalData T1
WHERE T1.SymbolID=@symbolID
AND T1.MarketDate = (SELECT MAX(T.MarketDate)
FROM Test_HistoricalData T
WHERE T.SymbolID=@symbolID
AND T.MarketDate < @startDate
AND T.ExpAvgVol IS NOT NULL) ) IS NOT NULL
THEN
-- Then use this earlier value as an initial value
(SELECT T1.ExpAvgVol
FROM Test_HistoricalData T1
WHERE T1.SymbolID=@symbolID
AND T1.MarketDate = (SELECT MAX(T.MarketDate)
FROM Test_HistoricalData T
WHERE T.SymbolID=@symbolID
AND T.MarketDate < @startDate
AND T.ExpAvgVol IS NOT NULL) )
ELSE
--If not, then grab the earliest volume in this time period
(SELECT Volume
FROM @TemporaryTable
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MIN(T.MarketDate)
FROM @TemporaryTable T
WHERE T.SymbolID=@symbolID
AND T.Volume IS NOT NULL) )
END )
ELSE 0
END
PRINT '@symbolID'
PRINT @symbolID
PRINT '@initialExpAvgVol'
PRINT @initialExpAvgVol
PRINT 'Completed Calculating ExpAvgVol and storing in TempTable. Last Row:'
-- DISPLAYING THE RESULTS-------------------------------------------
SELECT *
FROM @TemporaryTable
PRINT 'Completed displaying temporary table, moving on to updating actual table.'
-- UPDATING IN THE ACTUAL TABLE-------------------------------------------
UPDATE a
SET a.ExpAvgVol=T.ExpAvgVol
FROM Test_HistoricalData a
LEFT JOIN @TemporaryTable T ON T.SymbolID=a.SymbolID AND T.MarketDate=a.MarketDate
WHERE
a.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND a.MarketDate BETWEEN @startDate AND @endDate
-- DISPLAYING THE ACTUAL TABLE FOR COMPARISON-------------------------------------------
PRINT 'Completed Update to actual table. Displaying Results.'
SELECT a.SymbolID
, a.MarketDate
, a.Volume
, a.ExpAvgVol
FROM Test_HistoricalData a
WHERE
a.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND a.MarketDate BETWEEN @startDate AND @endDate
ORDER BY
a.SymbolID ASC, a.MarketDate ASC
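One likely source of the carry-over, sketched here as an assumption rather than a verified fix: in a quirky update the SET assignments are evaluated left to right for each row, so @expAvgVol above is computed before the symbol-change test that sets @first, which means the reset always arrives one row late; meanwhile the ELSE 0 branch wipes @initialExpAvgVol on every non-first row. A minimal reordering with the same names might look like the following (it drops the lookup of a pre-@startDate ExpAvgVol for brevity; that branch would go in the THEN as well):

-- Hypothetical reordering (not the original code): test for the symbol
-- change FIRST, then compute the average, then remember the symbol.
UPDATE @TemporaryTable
SET @first = firstTime =
        CASE WHEN SymbolID <> @symbolID THEN 1 ELSE 0 END
  , @expAvgVol = ExpAvgVol =
        CASE WHEN @first = 1 OR PrNonNullVolume IS NULL
             THEN Volume  -- seed each symbol from its own first volume
             ELSE 0.03E * PrNonNullVolume + 0.97E * @expAvgVol
        END
  , @symbolID = SymbolID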
December 11, 2008 at 7:25 pm
It's nice to see some documented code for a change. Thanks. 🙂
I think you've made things just a little difficult on yourself and I think I can help, but don't know because I don't have your data to play with. If you would, take a look at the link in my signature below and see if you can build some data that way so I can give 'er a try. It's important that the data be formatted as an Insert statement so I don't have to do the formatting.
A couple of hundred lines would be really good... don't post it... just attach it as a file. Thanks.
I could be wrong, but I think the principal code for the "Running Total" solution would work very well here. If you'd like to read up on it, here's an article on a very high speed method...
Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5
... heh... yeah... I happen to know the author pretty well.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 11, 2008 at 7:39 pm
r.hensbergen (12/11/2008)
Jeff Moden (12/11/2008)
johncyriac (12/10/2008)
Have a look at Tally/Number Table methods also

Sure... got any links? 😉 And, how do they apply for this particular problem?
Tip: add next time 'from the great articles by Jeff Moden' 😉 And I have to say, you would be correct. Though I still don't fully understand when and how I can use it. Jeff, can you write an article 'My first Tally table' or 'Tally tables for Dummies'?:D
Heh... Thanks Ron. I wasn't trying to be a smart guy about this... I just hate it when someone makes some off the wall suggestion and offers no code and no links to support the suggestion. To top it off, I don't believe a Tally table will have any bearing on this problem whatsoever.
Shifting gears... thanks for the great compliment. In answer to your question, I have a question... have you read the following article? I'm asking only because I don't know if you have or haven't. It falls into the same vein as your suggested article names.
The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2008 at 7:31 am
Jeff,
I'm attaching the file with the formatted test data. It should generate about 150 rows of data for you. And because it makes things more interesting (:)), I made sure to include some situations which can throw the recursion off (skipped days or days where various values are NULL).
Please let me know if you have any problems receiving the data.
Thanks again for all your help!
December 12, 2008 at 8:07 am
Looks like Peso nailed the concept for using the "quirky update" to get at the data; however, there's another issue that hasn't been mentioned, and that is accuracy. If there is even one missing value in a sequence of values, then an exponential weighted average isn't going to be accurate, and the number of missing values for any given symbol will have an impact on the overall accuracy of the number derived. The question thus becomes: how important is the accuracy of the number? Some alternatives:
1.) Exclude any symbols where there's even 1 missing value - kind of extreme, but might be necessary.
2.) Exclude any symbols with more than X missing values, where you decide what X is.
3.) Interpolate for missing values - in other words, assume the missing value probably wasn't far from the average of the two trading days on either side of it. If you have symbols with multiple consecutive missing values, this method might well have to exclude them. Even so, this method will produce somewhat inaccurate results, though probably not far off, depending on how close the interpolated value is to the actual one (a sketch of this follows below).
Also, you can combine 2 and 3 in lots of ways. Again, the primary concern is just how accurate these numbers need to be. It would only take one oddball value, missing early in the sequence for a symbol, for an interpolation method to be way off.
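A minimal sketch of option 3, assuming SQL Server 2005+ and the Test_HistoricalData table posted above; it fills each gap with the average of the nearest non-NULL neighbors, so multi-day gaps all get the same value and may still warrant exclusion:

-- Hypothetical interpolation pass (not from the thread): fill each
-- NULL Volume with the average of the nearest non-NULL volumes on
-- either side, per symbol. Rows with no neighbor on one side are
-- left NULL because the CROSS APPLY filters them out.
UPDATE h
SET    h.Volume = (prv.Volume + nxt.Volume) / 2
FROM   Test_HistoricalData h
CROSS APPLY (SELECT TOP 1 p.Volume
             FROM   Test_HistoricalData p
             WHERE  p.SymbolID = h.SymbolID
               AND  p.MarketDate < h.MarketDate
               AND  p.Volume IS NOT NULL
             ORDER BY p.MarketDate DESC) AS prv
CROSS APPLY (SELECT TOP 1 n.Volume
             FROM   Test_HistoricalData n
             WHERE  n.SymbolID = h.SymbolID
               AND  n.MarketDate > h.MarketDate
               AND  n.Volume IS NOT NULL
             ORDER BY n.MarketDate ASC) AS nxt
WHERE  h.Volume IS NULL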
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 12, 2008 at 8:12 am
Another thought just occurred to me. Perhaps an additional value needs to be stored: the number of values that contributes to the given exponential weighted average, and you develop the code to start the sequence fresh every time there's a missing value. I'm not very good at the "quirky update" technique, so I don't think I could effectively code it for this scenario, but you'd also need to count how many values are contributing to that average at any given point, and probably store it in the table as well, so that for any given symbol, the most recent weighted average value will only rely on a continuous data sequence, and you'll know exactly how far back the unbroken chain goes.
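Something along these lines could carry the counter; a sketch only, assuming a new ValueCount INT column is added to the @TemporaryTable from the earlier post (shown standalone for clarity, though in practice the assignments would be folded into the one quirky UPDATE):

-- Hypothetical additions (not the original code): count the consecutive
-- non-NULL volumes feeding the current average, resetting at every
-- missing value or symbol change.
DECLARE @valueCount INT
SET @valueCount = 0
-- (reuses @symbolID from the script above)

UPDATE @TemporaryTable
SET @valueCount = ValueCount =
        CASE WHEN Volume IS NULL OR SymbolID <> @symbolID
             THEN 0
             ELSE @valueCount + 1
        END
  , @symbolID = SymbolID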
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 12, 2008 at 8:28 am
Steve,
You've definitely got a good point that I had thought about as well. (Though for the moment, I'm most concerned with getting everything implemented correctly.)
But to address accuracy/conservatism in models: this is actually going to be compared with and/or combined with several other measures of moving-average volume (e.g., a moving average over the last week or over the last month, as well as further exponential weighted averages with different constants, such as 0.05 and 0.95 instead of 0.03 and 0.97).
Also, the final idea for the exponential weighted average is to start these moving averages far enough in the past that, by the time I get to more current data, any past anomalies will have smoothed out.
For example:
For the constants (0.97, 0.03) it takes ~37 trading days for an anomaly's weight in the average to drop below 1% (solve 0.03 * 0.97^n < 0.01), but for (0.95, 0.05) it takes only ~30 days to do the same thing (0.05 * 0.95^n < 0.01).
Different combinations of these averages and a healthy dose of conservatism in my models are so far the best way I've found to address these concerns. If you've got any other ideas/suggestions, I'd love to hear them.
Thanks!