October 31, 2005 at 4:30 am
I have a stored procedure with a self join that after 48 hours still does not complete. I would appreciate any helpful tips on how to improve performance.
I am working with end of day stock market data. I want to know when a stock’s price has doubled within two months. I want the relevant data associated with this doubling of price saved to another table.
The source table, Fact_Asset_EOD, contains end of day stock market data. It contains over 8,000,000 records. The structure of the data is:
Stock_EOD_sk (int)
Stock_sk (int)
Trading_Date (smalldatetime)
Open_Price (money)
High_Price (money)
Low_Price (money)
Close_Price (money)
Stock_EOD_sk is the primary key and identity = yes
There is a clustered key on Stock_EOD_sk and a Unique Index on Stock_sk and Trading_Date_sk.
Each stock is represented by an integer in Stock_sk (the stock symbol and name are stored in another table) and each trading date for the stock is represented by Trading_Date.
The following stored procedure finds all stocks and all possible trading dates where the price has doubled within 2 months.
select
pole_bottom.Stock_sk
, pole_bottom.Trading_Date as 'pole_bottom_Date'
, pole_bottom.Low_Price as 'pole_bottom_Price'
, pole_top.Trading_Date as 'pole_top_Date'
, pole_top.High_Price as 'pole_top_Price'
into Summary_Flag
from dbo.Fact_Asset_EOD pole_bottom
inner join dbo.Fact_Asset_EOD pole_top
on pole_bottom.Stock_sk = pole_top.Stock_sk
where
pole_top.High_Price >= 2 * pole_bottom.Low_Price
and
pole_bottom.Trading_Date between
dateadd(m, -2, pole_top.Trading_Date)
and pole_top.Trading_Date
Any suggestions on how to improve performance?
Cheers
banocto
October 31, 2005 at 8:19 am
Which indexes are currently used by your query (view the queryanalyzer)?
Does the index wizard gives any hints?
select
pole_bottom.Stock_sk
, pole_bottom.Trading_Date as 'pole_bottom_Date'
, pole_bottom.Low_Price as 'pole_bottom_Price'
, pole_top.Trading_Date as 'pole_top_Date'
, pole_top.High_Price as 'pole_top_Price'
into Summary_Flag
from dbo.Fact_Asset_EOD pole_bottom
inner join dbo.Fact_Asset_EOD pole_top
on pole_bottom.Stock_sk = pole_top.Stock_sk
and
pole_top.High_Price >= ( pole_bottom.Low_Price + pole_bottom.Low_Price)
and pole_bottom.Trading_Date between dateadd(m, -2, pole_top.Trading_Date)
and pole_top.Trading_Date
October 31, 2005 at 10:23 pm
The SELECT INTO is one problem, this is much slower than CREATE TABLE and INSERT INTO SELECT.
Try:
-- CREATE TABLE Summary_Flag with no rows
SELECT pole_bottom.Stock_sk
, pole_bottom.Trading_Date as 'pole_bottom_Date'
, pole_bottom.Low_Price as 'pole_bottom_Price'
, pole_top.Trading_Date as 'pole_top_Date'
, pole_top.High_Price as 'pole_top_Price'
INTO Summary_Flag
FROM dbo.Fact_Asset_EOD pole_bottom
INNER JOIN dbo.Fact_Asset_EOD pole_top
ON pole_bottom.Stock_sk = pole_top.Stock_sk
WHERE pole_bottom.Stock_EOD_sk = 0
-- INSERT the data into Summary_Flag
INSERT INTO Summary_Flag (Stock_sk,pole_bottom_Date,pole_bottom_Price,pole_top_Date,pole_top_Price)
SELECT pole_bottom.Stock_sk
, pole_bottom.Trading_Date as 'pole_bottom_Date'
, pole_bottom.Low_Price as 'pole_bottom_Price'
, pole_top.Trading_Date as 'pole_top_Date'
, pole_top.High_Price as 'pole_top_Price'
FROM dbo.Fact_Asset_EOD pole_bottom
INNER JOIN dbo.Fact_Asset_EOD pole_top
ON pole_bottom.Stock_sk = pole_top.Stock_sk
AND (pole_bottom.Low_Price+pole_bottom.Low_Price) <= pole_top.High_Price
WHERE pole_bottom.Trading_Date BETWEEN DATEADD(m,-2,pole_top.Trading_Date)
AND pole_top.Trading_Date
Andy
October 31, 2005 at 10:24 pm
[snip]
Posting errored, so tried again and see two copies!
Andy
November 1, 2005 at 7:59 am
I would estimate that you would see better performance by doing the following:
Create computed columns on the table with the folowing:
Price * 2
and
date + 2 months
Then place indexes on those computed columns. This will "materialize" the calculations, perfoming them ONCE for each of the 8,000,000 rows. (This will also be done ahead of time)
Rewrite the query so that the comparisons are performed against the pre-computed columns. It should run much faster.
Currently, the execution is most likely performing the price and date calculations one time for each possible pair of columns in the self joined table, which would be approximately
(x^2)/y where x is 8,000,000 and y is the number of distinct stock ids.
So, if you have 8,000 data points for each of 1,000 stocks, you will perform the calculcations 64 billion times.
November 29, 2005 at 11:40 pm
Thanks Jo, Andy & Jeff. I’ve been out of action … hence my delayed response.
I have improved the performance significantly by implementing each of your suggestions and I've learnt a fair bit more about Execution Plans etc.
Thanks
BanOcto
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply