March 2, 2009 at 1:39 pm
I am adding a column to a table in my database that tracks the rolling standard deviation over a specified period of time.
So far, I have been able to implement this, but it is really slow! I'm hoping someone might have a hint as to how I might be able to speed this procedure up a bit...
Or maybe I'm heading in the complete wrong direction and SQL is not the right tool to be using for this. I know it's not meant to be a statistics workhorse, but I didn't think STDEV was really that intense of a statistical computation...
Here's information on the tables and the code I'm using so far - this is for a rolling stdev calculation over the 30 days prior.
At the start of the procedure, myTable contains rows like:
IDNumber INT
Date DATETIME
OHdiff FLOAT
OLdiff FLOAT
StDev_OH_30Day FLOAT
StDev_OL_30Day FLOAT
The StDev columns are NULL at the beginning, these are what I will be filling in with the procedure below:
NOTE: @lastUpdate, @updateThrough, @minIDNumber and @maxIDNumber are set by another part of my query, and are just making sure that I'm not duplicating work that's already been done...
UPDATE a
SET StDev_OH_30Day=(SELECT STDEV(b.OHdiff)
FROM myTable b
WHERE b.IDNumber= a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31 )
, StDev_OL_30Day=(SELECT STDEV(b.OLdiff)
FROM myTable b
WHERE b.IDNumber = a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31 )
FROM myTable a
WHERE
a.MarketDate BETWEEN @lastUpdate+.1 AND @updateThrough
AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber
It takes about 10 minutes to run this query on a little less than 90K rows. Increasing the time period for all the stdev calcs from 30 days to 90 days increases the processing time to 15 minutes.
This is meant to be part of a sequence of other updates, so the snappier I can make it, the better =)
Any ideas are always appreciated - Thanks!
March 2, 2009 at 2:25 pm
Might be a cool time to try something built in the SQL CLR to see if you can get any perf gains.
March 2, 2009 at 3:46 pm
Sorry, but I'm not sure I really understand what you mean. I'm still really new to SQL.
Do you mean defining a user-defined function with C or some other language like in the examples on this page?http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx
March 2, 2009 at 4:34 pm
can you show us the execution plan when you run the update? that will tell us exactly what the update is doing, and allow us to offer suggestions like indexes on the date column, etc that might speed up the update
Lowell
March 2, 2009 at 4:40 pm
Exactly. And it doesn't need to be just limited to one clr function...just an idea.
March 3, 2009 at 9:17 am
I would love suggestions on how to improve my indexing!
Right now I have 3 indexes on my table:
1) A clustered PK index on a RowID column
2) A non-clustered index on Date, then IDNumber which includes the StDev cols (and a few other cols not involved in this update)
3) A non-clustered index on IDNumber then Date which includes the StDev cols (and a few other cols not involved in this update)
Was that what you were looking for?
Thanks!
March 3, 2009 at 3:55 pm
I'm not sure why you have two separate subqueries to calculate the two standard deviations. It looks to me like it could be one.
I would also change the whole thing to a CTE. It should be much faster that way.
March 4, 2009 at 8:42 am
Sorry, but I'm not sure I follow... I'm learning a lot just from trying to understand what everyone here is suggesting.
Would you mind explaining a bit further how I might be able to put the two separate subqueries into one as part of an update statement?
Thanks!
March 4, 2009 at 9:39 am
In trying to combine the two subqueries into one, I got this far, but have encountered the error messages:
The multi-part identifier "a.IDNumber" could not be bound.
The multi-part identifier "a.MarketDate" could not be bound.
UPDATE a
SET StDev_OH_30Day=T.StDev_OHdiff
, StDev_OL_30Day=T.StDev_OLdiff
FROM myTable,
(SELECT STDEV(b.OHdiff) AS StDev_OHdiff
, STDEV(b.OLdiff) AS StDev_OLdiff
FROM myTable b
WHERE b.IDNumber = a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31 ) AS T
WHERE
a.MarketDate BETWEEN @lastUpdate+.1 AND @updateThrough
AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber
If anyone's got any ideas or suggestions, I would really appreciate it.
Thanks!
March 4, 2009 at 11:43 am
Here's a sample of it as a CTE - note the ; before it. You will need that if this is part of a procedure that does other things.
;
WITH get_stddev AS
( SELECT a.idnumber ,
a.marketdate ,
STDEV(b.OHdiff) stddev_oh ,
STDEV(b.OLdiff) stddev_ol
FROM myTable b
JOIN myTable a
ON
(
b.IDNumber = a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31
)
WHERE a.MarketDate BETWEEN @lastUpdate + .1 AND @updateThrough
AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber
GROUP BY a.idnumber,
a.marketdate
) ----- end cte
UPDATE c
SET StDev_OH_30Day = stddev_oh ,
StDev_OL_30Day = stddev_ol
FROM mytable c
JOIN get_stddev gs
ON
(
gs.IDNumber = c.IDNumber
AND c.marketdate = gs.marketdate )
I urge you to test this thoroughly. I wasn't completely sure of your data set-up. It's possible this needs some changes for your tables.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply