December 10, 2008 at 2:59 pm
I am seeking to use SQL to find an exponential weighted average, which is defined recursively.
If V(n) = the value on day n
The exponential weighted average can be defined as:
ExpAvg(n) := 0.03 * V(n) + 0.97 * ExpAvg(n-1)
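Unrolling the recursion (plain algebra from the definition above) shows why the weighting is called exponential; each older value's contribution decays by a factor of 0.97 per day:

ExpAvg(n) = 0.03*V(n) + 0.03*0.97*V(n-1) + 0.03*0.97^2*V(n-2) + ... + 0.97^n * ExpAvg(0)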
The problem is, I don't know how to deal with this kind of a recursive definition using SQL and can't seem to find any good resources.
Has anyone encountered (and conquered) this problem before?
Any help here is greatly appreciated.
Thanks!
December 10, 2008 at 3:15 pm
-- Build 10 rows of sample data; the clustered primary key gives
-- the UPDATE below a deterministic order to walk.
DECLARE @Sample TABLE
(
    ID INT PRIMARY KEY CLUSTERED,
    Number INT,
    expAvg FLOAT
)

INSERT @Sample
(
    ID,
    Number
)
SELECT TOP 10 Number,
       ABS(CHECKSUM(NEWID())) % 50000
FROM   master..spt_values
WHERE  Type = 'P'
ORDER BY Number

-- Carry the running average in a variable. The "quirky update"
-- assigns the variable and the column in the same statement,
-- row by row, in clustered index order.
DECLARE @expAvg FLOAT

SET @expAvg = 0

UPDATE @Sample
SET @expAvg = ExpAvg = 0.03E * Number + 0.97E * @expAvg

SELECT *
FROM @Sample
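The same recursive definition can also be written directly as a recursive CTE. A minimal sketch, assuming SQL Server 2005+ and the consecutive ID values 0-9 generated above (the quirky update is generally much faster on large sets):

;WITH Recur (ID, Number, expAvg) AS
(
    -- Anchor: seed the average exactly as the UPDATE above does (@expAvg = 0)
    SELECT ID, Number, 0.03E * Number
    FROM @Sample
    WHERE ID = (SELECT MIN(ID) FROM @Sample)

    UNION ALL

    -- Recursive step: apply the definition to the next ID
    SELECT s.ID, s.Number, 0.03E * s.Number + 0.97E * r.expAvg
    FROM @Sample s
    INNER JOIN Recur r ON s.ID = r.ID + 1
)
SELECT ID, Number, expAvg
FROM Recur
OPTION (MAXRECURSION 0)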
N 56°04'39.16"
E 12°55'05.25"
December 10, 2008 at 8:40 pm
Nicely done, Peter.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2008 at 11:20 pm
Have a look at Tally/Number Table methods also
December 11, 2008 at 12:29 am
Thank you Jeff!
The use of master..spt_values or a Tally/Number table is ONLY for generating sample data.
The working piece of the algorithm is the UPDATE part.
N 56°04'39.16"
E 12°55'05.25"
December 11, 2008 at 5:19 am
Heh... that bit of info should be included as a comment in the code. 😉
Anyway, works good.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 11, 2008 at 5:37 am
johncyriac (12/10/2008)
Have a look at Tally/Number Table methods also
Sure... got any links? 😉 And, how do they apply for this particular problem?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 11, 2008 at 5:43 am
Jeff Moden (12/11/2008)
johncyriac (12/10/2008)
Have a look at Tally/Number Table methods also

Sure... got any links? 😉 And, how do they apply for this particular problem?
Tip: add next time 'from the great articles by Jeff Moden' 😉 And I have to say, you would be correct. Though I still don't fully understand when and how I can use it. Jeff, can you write an article 'My first Tally table' or 'Tally tables for Dummies'?:D
Ronald Hensbergen
Help us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/
2+2=5 for significant large values of 2
December 11, 2008 at 2:18 pm
Thanks so much for the help. I've been able to implement your suggestions, but I had to add quite a few work-around statements to skip over missing/NULL values (otherwise, because of the recursion, everything after a NULL would have been NULL).
I thought I had the query working, and then found a problem that I can't seem to avoid. When it moves on to each subsequent SymbolID, it still initializes the average with the first symbol's initial value. (I tried to avoid this with a @first variable to keep track, but I'm banging my head against the wall with this.) Can anyone see where my mistake is? In the results below, note the initialExpAvgVol of 2575800 carried over from SymbolID 9 into SymbolID 10; the code responsible is the CASE expressions in the UPDATE that set firstTime and initialExpAvgVol.
The relevant columns in the table I'm running it on are like:
CREATE TABLE [dbo].[Test_HistoricalData](
[RowID] [bigint] PRIMARY KEY NOT NULL,
[MarketDate] [datetime] NULL,
[Volume] [decimal](18, 0) NULL,
[SymbolID] [int] NOT NULL,
[ExpAvgVol] [decimal](18, 4) NULL
) ON [PRIMARY]
Each row represents one stock symbol on a given market date. The test table I'm running it on right now has ~700 different symbols and their respective data over the last 2 years; I need to implement this on my production table with ~70K symbols over the last 5 years. Right now (I think because of all my checks), this isn't working fast enough for me to feel comfortable doing so.
If anyone has any suggestions on how I might be able to speed this up a bit, I would really appreciate it.
Thanks!
PS-The PRINT statements in there were just for my troubleshooting to see if I was getting any hang-ups.
Results look like:
SymbolID  MarketDate               Volume     PrNonNullMktDate         PrNonNullVolume  ExpAvgVol         firstTime  initialExpAvgVol
9         2007-01-03 16:00:00.000  2575800    NULL                     NULL             NULL              1          2575800
9         2007-01-04 16:00:00.000  2073700    2007-01-03 16:00:00.000  2575800          2575800           0          0
9         2007-01-05 16:00:00.000  2676600    2007-01-04 16:00:00.000  2073700          2560737           0          0
9         2007-01-08 16:00:00.000  1557200    2007-01-05 16:00:00.000  2676600          2564212.89        0          0
....
9         2007-01-30 16:00:00.000  1436200    2007-01-29 16:00:00.000  2023800          2488137.07714951  0          0
9         2007-01-31 16:00:00.000  2263100    2007-01-30 16:00:00.000  1436200          2456578.96483502  0          0
10        2007-01-01 16:00:00.000  NULL       NULL                     NULL             NULL              1          NULL
10        2007-01-02 16:00:00.000  28575      NULL                     NULL             NULL              1          2575800
10        2007-01-03 16:00:00.000  216733     2007-01-02 16:00:00.000  28575            2499383.25        0          0
10        2007-01-04 16:00:00.000  238721     2007-01-03 16:00:00.000  216733           2430903.7425      0          0
10        2007-01-05 16:00:00.000  380583893  2007-01-04 16:00:00.000  238721           2365138.260225    0          0
10        2007-01-08 16:00:00.000  126863     2007-01-05 16:00:00.000  380583893        13711700.9024182  0          0
10        2007-01-09 16:00:00.000  68356      2007-01-08 16:00:00.000  126863           13304155.7653457  0          0
=============CODE BELOW=========================================
-- ---------------------------------------------------------------------------------
-- PROCEDURE TO CALCULATE AND STORE EXPONENTIAL WEIGHTED AVERAGE OF VOLUME
-- Last Updated December 11, 2008
-- ---------------------------------------------------------------------------------
-- INITIALIZING RUN CRITERIA: Date and SymbolID ranges-------------------
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @startSymbolID INT
DECLARE @endSymbolID INT
SET @startDate = '2007-01-01'
SET @endDate = '2007-02-01'
SET @startSymbolID = 1
SET @endSymbolID = 11
-- DECLARING A TEMPORARY TABLE FOR STORAGE-------------------------------------------
DECLARE @TemporaryTable TABLE
(SymbolID INT --NVARCHAR(50)
, MarketDate DATETIME
, Volume DECIMAL (18,0)
, PrNonNullMkDate DATETIME
, PrNonNullVolume DECIMAL (18,0)
, ExpAvgVol FLOAT
, firstTime INT
, initialExpAvgVol FLOAT
)
-- FILLING THE TEMPORARY TABLE WITH KNOWNS-------------------------------------------
INSERT @TemporaryTable
(SymbolID
, MarketDate
, Volume
, PrNonNullMkDate
, PrNonNullVolume
, ExpAvgVol
)
SELECT
T1.SymbolID
, T1.MarketDate
, T1.Volume
-- Most recent prior date with Volume IS NOT NULL
, T3.PrNonNullMktDate
-- Value of the Volume on that date
, (SELECT T4.Volume
FROM Test_HistoricalData T4
WHERE T4.MarketDate=T3.PrNonNullMktDate AND T4.SymbolID=T1.SymbolID) AS PrNonNullVolume
, T1.ExpAvgVol
FROM
Test_HistoricalData T1
LEFT JOIN (SELECT T2.SymbolID
, T2.MarketDate
, (SELECT MAX(D.MarketDate)
FROM Test_HistoricalData D
WHERE D.MarketDate < T2.MarketDate
AND D.SymbolID=T2.SymbolID
AND D.Volume IS NOT NULL
) AS PrNonNullMktDate
FROM Test_HistoricalData T2
WHERE T2.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND T2.MarketDate BETWEEN @startDate AND @endDate
) AS T3 on T3.MarketDate=T1.MarketDate AND T1.SymbolID=T3.SymbolID
WHERE
T1.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND T1.MarketDate BETWEEN @startDate AND @endDate
ORDER BY
T1.SymbolID, T1.MarketDate
-- DECLARING A TEMPORARY VARIABLE TO HOLD THE ExpAvgVol and INITIALIZING IT-------------------------------------------
DECLARE @symbolID INT
DECLARE @expAvgVol FLOAT
DECLARE @initialExpAvgVol FLOAT
DECLARE @first INT
SET @symbolID = (SELECT MIN(SymbolID)
FROM @TemporaryTable )
SET @first=1
SET @initialExpAvgVol =
(CASE WHEN
--Checking to see if there is an earlier NOT NULL value for ExpAvgVol on this SymbolID
(SELECT ExpAvgVol
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MAX(MarketDate)
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate < @startDate
AND ExpAvgVol IS NOT NULL) ) IS NOT NULL
THEN
--Use this earlier value as an initial value
(SELECT ExpAvgVol
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MAX(MarketDate)
FROM Test_HistoricalData
WHERE SymbolID=@symbolID
AND MarketDate < @startDate
AND ExpAvgVol IS NOT NULL) )
ELSE
--Grab the earliest volume in this time period
(SELECT Volume
FROM @TemporaryTable
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MIN(MarketDate)
FROM @TemporaryTable
WHERE SymbolID=@symbolID) )
END )
--If ExpAvgVol has an earlier value in the stored table, initialize to that, else set to the earliest volume
SET @expAvgVol = @initialExpAvgVol
PRINT 'Variables initialized. Beginning computation of ExpAvgVol.'
---- CALCULATING THE ExpAvgVol and Storing in TempTable-------------------------------------------
UPDATE @TemporaryTable
SET @expAvgVol=ExpAvgVol=
CASE WHEN @first=1
THEN 0.03E * PrNonNullVolume + 0.97E * @initialExpAvgVol
ELSE 0.03E * PrNonNullVolume + 0.97E * @expAvgVol
END
, @first=firstTime=
CASE WHEN @symbolID!=SymbolID OR PrNonNullVolume IS NULL
THEN 1
ELSE 0
END
, @symbolID=SymbolID
, @initialExpAvgVol = initialExpAvgVol=
CASE WHEN @first=1
THEN
(CASE WHEN
-- WHEN There is an earlier NOT NULL value for ExpAvgVol on this SymbolID
(SELECT T1.ExpAvgVol
FROM Test_HistoricalData T1
WHERE T1.SymbolID=@symbolID
AND T1.MarketDate = (SELECT MAX(T.MarketDate)
FROM Test_HistoricalData T
WHERE T.SymbolID=@symbolID
AND T.MarketDate < @startDate
AND T.ExpAvgVol IS NOT NULL) ) IS NOT NULL
THEN
-- Then use this earlier value as an initial value
(SELECT T1.ExpAvgVol
FROM Test_HistoricalData T1
WHERE T1.SymbolID=@symbolID
AND T1.MarketDate = (SELECT MAX(T.MarketDate)
FROM Test_HistoricalData T
WHERE T.SymbolID=@symbolID
AND T.MarketDate < @startDate
AND T.ExpAvgVol IS NOT NULL) )
ELSE
--If not, then grab the earliest volume in this time period
(SELECT Volume
FROM @TemporaryTable
WHERE SymbolID=@symbolID
AND MarketDate = (SELECT MIN(T.MarketDate)
FROM @TemporaryTable T
WHERE T.SymbolID=@symbolID
AND T.Volume IS NOT NULL) )
END )
ELSE 0
END
PRINT '@symbolID'
PRINT @symbolID
PRINT '@initialExpAvgVol'
PRINT @initialExpAvgVol
PRINT 'Completed Calculating ExpAvgVol and storing in TempTable. Last Row:'
-- DISPLAYING THE RESULTS-------------------------------------------
SELECT *
FROM @TemporaryTable
PRINT 'Completed displaying temporary table, moving on to updating actual table.'
-- UPDATING IN THE ACTUAL TABLE-------------------------------------------
UPDATE a
SET a.ExpAvgVol=T.ExpAvgVol
FROM Test_HistoricalData a
LEFT JOIN @TemporaryTable T ON T.SymbolID=a.SymbolID AND T.MarketDate=a.MarketDate
WHERE
a.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND a.MarketDate BETWEEN @startDate AND @endDate
-- DISPLAYING THE ACTUAL TABLE FOR COMPARISON-------------------------------------------
PRINT 'Completed Update to actual table. Displaying Results.'
SELECT a.SymbolID
, a.MarketDate
, a.Volume
, a.ExpAvgVol
FROM Test_HistoricalData a
WHERE
a.SymbolID BETWEEN @startSymbolID AND @endSymbolID
AND a.MarketDate BETWEEN @startDate AND @endDate
ORDER BY
a.SymbolID ASC, a.MarketDate ASC
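One likely source of the carry-over, sketched here as an assumption rather than a verified fix: in a quirky update the SET assignments are evaluated left to right for each row, so @expAvgVol above is computed before the symbol-change test that sets @first, which means the reset always arrives one row late; meanwhile the ELSE 0 branch wipes @initialExpAvgVol on every non-first row. A minimal reordering with the same names might look like the following (it drops the lookup of a pre-@startDate ExpAvgVol for brevity; that branch would go in the THEN as well):

-- Hypothetical reordering (not the original code): test for the symbol
-- change FIRST, then compute the average, then remember the symbol.
UPDATE @TemporaryTable
SET @first = firstTime =
        CASE WHEN SymbolID <> @symbolID THEN 1 ELSE 0 END
  , @expAvgVol = ExpAvgVol =
        CASE WHEN @first = 1 OR PrNonNullVolume IS NULL
             THEN Volume  -- seed each symbol from its own first volume
             ELSE 0.03E * PrNonNullVolume + 0.97E * @expAvgVol
        END
  , @symbolID = SymbolID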
December 11, 2008 at 7:25 pm
It's nice to see some documented code for a change. Thanks. 🙂
I think you've made things just a little difficult on yourself and I think I can help, but don't know because I don't have your data to play with. If you would, take a look at the link in my signature below and see if you can build some data that way so I can give 'er a try. It's important that the data be formatted as an Insert statement so I don't have to do the formatting.
A couple of hundred lines would be really good... don't post it... just attach it as a file. Thanks.
I could be wrong, but I think the principal code for the "Running Total" solution would work very well here. If you'd like to read up on it, here's an article on a very high speed method...
Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5
... heh... yeah... I happen to know the author pretty well.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 11, 2008 at 7:39 pm
r.hensbergen (12/11/2008)
Jeff Moden (12/11/2008)
johncyriac (12/10/2008)
Have a look at Tally/Number Table methods also

Sure... got any links? 😉 And, how do they apply for this particular problem?
Tip: add next time 'from the great articles by Jeff Moden' 😉 And I have to say, you would be correct. Though I still don't fully understand when and how I can use it. Jeff, can you write an article 'My first Tally table' or 'Tally tables for Dummies'?:D
Heh... Thanks Ron. I wasn't trying to be a smart guy about this... I just hate it when someone makes some off the wall suggestion and offers no code and no links to support the suggestion. To top it off, I don't believe a Tally table will have any bearing on this problem whatsoever.
Shifting gears... thanks for the great compliment. In answer to your question, I have a question... have you read the following article? I'm asking only because I don't know if you have or haven't. It falls into the same vein as your suggested article names.
The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2008 at 7:31 am
Jeff,
I'm attaching the file with the formatted test data. It should generate about 150 rows of data for you. And because it makes things more interesting (:)), I made sure to include some situations which can throw the recursion off (skipped days or days where various values are NULL).
Please let me know if you have any problems receiving the data.
Thanks again for all your help!
December 12, 2008 at 8:07 am
Looks like Peso nailed the concept for using the "quirky update" to get at the data; however, there's another issue that hasn't been mentioned, and that is accuracy. If there is even one missing value in a sequence of values, then an exponential weighted average isn't going to be accurate, and the number of missing values for any given symbol will have an impact on the overall accuracy of the number derived. The question thus becomes: how important is the accuracy of the number? Some alternatives:
1.) Exclude any symbols where there's even 1 missing value - kind of extreme, but might be necessary.
2.) Exclude any symbols with more than X missing values, where you decide what X is.
3.) Interpolate for missing values - in other words, assume the missing value probably wasn't far from the average of the two trading days on either side of it. If you have symbols with multiple consecutive missing values, this method might well have to exclude them. Even so, this method will produce somewhat inaccurate results, though probably not far off, depending on how close the interpolated value is to the actual one (a sketch of this follows below).
Also, you can combine 2 and 3 in lots of ways. Again, the primary concern is just how accurate these numbers need to be. It would only take one oddball value, missing early in the sequence for a symbol, for an interpolation method to be way off.
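A minimal sketch of option 3, assuming SQL Server 2005+ and the Test_HistoricalData table posted above; it fills each gap with the average of the nearest non-NULL neighbors, so multi-day gaps all get the same value and may still warrant exclusion:

-- Hypothetical interpolation pass (not from the thread): fill each
-- NULL Volume with the average of the nearest non-NULL volumes on
-- either side, per symbol. Rows with no neighbor on one side are
-- left NULL because the CROSS APPLY filters them out.
UPDATE h
SET    h.Volume = (prv.Volume + nxt.Volume) / 2
FROM   Test_HistoricalData h
CROSS APPLY (SELECT TOP 1 p.Volume
             FROM   Test_HistoricalData p
             WHERE  p.SymbolID = h.SymbolID
               AND  p.MarketDate < h.MarketDate
               AND  p.Volume IS NOT NULL
             ORDER BY p.MarketDate DESC) AS prv
CROSS APPLY (SELECT TOP 1 n.Volume
             FROM   Test_HistoricalData n
             WHERE  n.SymbolID = h.SymbolID
               AND  n.MarketDate > h.MarketDate
               AND  n.Volume IS NOT NULL
             ORDER BY n.MarketDate ASC) AS nxt
WHERE  h.Volume IS NULL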
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 12, 2008 at 8:12 am
Another thought just occurred to me. Perhaps an additional value needs to be stored: the number of values that contributes to the given exponential weighted average, and you develop the code to start the sequence fresh every time there's a missing value. I'm not very good at the "quirky update" technique, so I don't think I could effectively code it for this scenario, but you'd also need to count how many values are contributing to that average at any given point, and probably store it in the table as well, so that for any given symbol, the most recent weighted average value will only rely on a continuous data sequence, and you'll know exactly how far back the unbroken chain goes.
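Something along these lines could carry the counter; a sketch only, assuming a new ValueCount INT column is added to the @TemporaryTable from the earlier post (shown standalone for clarity, though in practice the assignments would be folded into the one quirky UPDATE):

-- Hypothetical additions (not the original code): count the consecutive
-- non-NULL volumes feeding the current average, resetting at every
-- missing value or symbol change.
DECLARE @valueCount INT
SET @valueCount = 0
-- (reuses @symbolID from the script above)

UPDATE @TemporaryTable
SET @valueCount = ValueCount =
        CASE WHEN Volume IS NULL OR SymbolID <> @symbolID
             THEN 0
             ELSE @valueCount + 1
        END
  , @symbolID = SymbolID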
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 12, 2008 at 8:28 am
Steve,
You've definitely got a good point that I had thought about as well. (Though for the moment, I'm most concerned with getting everything implemented correctly.)
But to address accuracy/conservatism in models: this is actually going to be compared with and/or combined with several other measures of moving-average volume (e.g., a moving average over the last week or over the last month, as well as further exponential weighted averages with different constants, such as 0.05 and 0.95 instead of 0.03 and 0.97).
Also, the final idea for the exponential weighted average is to start these moving averages far enough in the past that, by the time I get to more current data, any past anomalies will have smoothed out.
For example:
For the constants (0.97, 0.03) it takes ~37 trading days for an anomaly's weight in the average to drop below 1% (solve 0.03 * 0.97^n < 0.01), but for (0.95, 0.05) it takes only ~30 days to do the same thing (0.05 * 0.95^n < 0.01).
Different combinations of these averages and a healthy dose of conservatism in my models are so far the best way I've found to address these concerns. If you've got any other ideas/suggestions, I'd love to hear them.
Thanks!