June 4, 2023 at 5:29 am
Hi everyone
My SQL query is close to 1000 lines with 17 separate calcs (ie sub-queries). The entire query takes almost 3 minutes to run. This is way too long. I figured out the problem area. It turns out that 70% of the time is spent on 2 subqueries. Each is the same except that the date filters are different so I will include one only. I don't know how I can speed it up. I need your help. I will provide the sub-query and then explain what it is doing.
Query:
SELECTT6.UNDERLYING_SYMBOL, T6.QUOTE_DATE, T6.EXPIRATION, T6.ROOT_SYMBOL
FROM
(
SELECTT4.UNDERLYING_SYMBOL,
T4.QUOTE_DATE,
T4.EXPIRATION,
T4.ROOT_SYMBOL,
ROW_NUMBER() OVER (PARTITION BY T4.UNDERLYING_SYMBOL, T4.QUOTE_DATE ORDER BY T4.ROOT_SYMBOL) AS ROOT_SYMBOL_SEQ
FROM(
SELECTT1.UNDERLYING_SYMBOL,
T1.QUOTE_DATE,
T1.EXPIRATION,
T1.ROOT_SYMBOL,
CASE
WHEN DATENAME(WEEKDAY,EXPIRATION) = 'Friday' THEN 'Y'
WHEN DATENAME(WEEKDAY,EXPIRATION) = 'Thursday' AND (DATEADD(DAY,1,EXPIRATION) NOT IN (SELECT DISTINCT T2.EXPIRATION FROM DBO.Stock AS T2)) THEN 'Y'
ELSE 'N'
END AS END_OF_WEEK
FROMdbo.Stock AS T1
WHEREDATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) >= 23 AND
DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) <= 30
) AS T4 INNER JOIN
(
SELECTT3.UNDERLYING_SYMBOL, T3.QUOTE_DATE, MAX(T3.EXPIRATION) AS EXPIRATION
FROM(
SELECTT1.UNDERLYING_SYMBOL,
T1.QUOTE_DATE,
T1.EXPIRATION,
T1.ROOT_SYMBOL,
CASE
WHEN DATENAME(WEEKDAY,EXPIRATION) = 'Friday' THEN 'Y'
WHEN DATENAME(WEEKDAY,EXPIRATION) = 'Thursday' AND (DATEADD(DAY,1,EXPIRATION) NOT IN (SELECT DISTINCT T2.EXPIRATION FROM DBO.Stock AS T2)) THEN 'Y'
ELSE 'N'
END AS END_OF_WEEK
FROMdbo.Stock AS T1
WHEREDATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) >= 23 AND
DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) <= 30
) AS T3
WHERET3.END_OF_WEEK = 'Y'
GROUP BYT3.UNDERLYING_SYMBOL, T3.QUOTE_DATE
) AS T5 ON
T4.UNDERLYING_SYMBOL = T5.UNDERLYING_SYMBOL AND
T4.QUOTE_DATE = T5.QUOTE_DATE AND
T4.EXPIRATION = T5.EXPIRATION
) AS T6
WHERE T6.ROOT_SYMBOL_SEQ = 1
Explanation:
Step 1 - Find the records that have an end of week of Friday. If Friday is a holiday then the end of week is Thursday. The reason behind this is that stocks only trade Monday to Friday excluding days that fall on a holiday.
Finding if the end of week is Friday is easy. The expiration date will tell you what day of the week it is. If the expiration date is Friday then include record else exclude. The record is marked with 'Y' if it is a Friday record.
Now, finding out if the end of week is a Thursday is a bit tricky but doable. The database has no trading records if the date falls on a holiday. Apr 15 2022 is a stat holiday. This is a Friday so markets are closed. There will be no trading record for Apr 15 2022. There will be trading records for the Apr 14 2022 (Thursday). If the day is Thursday and if I add 1 to this date to get April 15 2022 (Friday) then look for April 15 2022 in the list of unique dates. It can't be found. This means that Thursday is the end of the week. The Thursday record is marked as 'Y'.
Step 2 - For a given security and trading day, what is the largest expiration date for all records that are marked with 'Y'.
Step 3 - If there are two records with the same expiration then pick the one that has no 'w' in the last character of the root symbol.
Here is test data to show the algo in progress. It is attached. I have also included a screenshot. The first output screen shows the distinct dates. There is no 2022-04-15 date:
The second output screen shows the various records after a date filter is applied. Group all the records by END_OF_WEEK = Y. Then we pick the one with the largest expiration. There is a tie (records 16 and 17). The tie breaking rule is to pick SPX so the end result of the algo is to return record 16.
How can I speed this up?
Thank you
June 4, 2023 at 5:46 am
i wasn't able to delete the original attachment. use test-data-1.csv and ignore other one.
June 4, 2023 at 12:23 pm
Why settle a tie using query logic, what about a unique index? Some or one of those could be helpful. A table containing holidays to exclude could be useful too. This code here could be slowing things down
WHERE DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) >= 23 AND
DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) <= 30
Even if there were index(es) on those columns the functions on the left side of the WHERE conditions make it unlikely it/they'd be used. Since both columns are in the same table maybe create a calculated column and index it
Someone in this forum once described the general method for optimizing multipart queries so concisely it made me laugh. It goes like this: find the subquery which returns the most rows, alter the proc to store that data in a temp table instead and index the column(s) which join back to the rest of the query
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 4, 2023 at 9:12 pm
Why settle a tie using query logic, what about a unique index? Some or one of those could be helpful. A table containing holidays to exclude could be useful too. This code here could be slowing things down
WHERE DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) >= 23 AND
DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) <= 30Even if there were index(es) on those columns the functions on the left side of the WHERE conditions make it unlikely it/they'd be used. Since both columns are in the same table maybe create a calculated column and index it
Someone in this forum once described the general method for optimizing multipart queries so concisely it made me laugh. It goes like this: find the subquery which returns the most rows, alter the proc to store that data in a temp table instead and index the column(s) which join back to the rest of the query
Thanks for the reply. I actually need the datediff filter to get the necessary records.
June 4, 2023 at 9:22 pm
a image of the plans don't help much - please post the real explain plan - actual if possible.
use https://www.brentozar.com/pastetheplan/ for it
June 4, 2023 at 10:30 pm
This is an equivalent query
SELECT T4.UNDERLYING_SYMBOL,
T4.QUOTE_DATE,
T4.EXPIRATION,
T4.ROOT_SYMBOL
FROM (SELECT DISTINCT UNDERLYING_SYMBOL, QUOTE_DATE
FROM dbo.Stock) X
CROSS APPLY (SELECT TOP(1)
T1.UNDERLYING_SYMBOL,
T1.QUOTE_DATE,
T1.EXPIRATION,
T1.ROOT_SYMBOL
FROM dbo.Stock AS T1
WHERE DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) BETWEEN 23 AND 30
AND T1.UNDERLYING_SYMBOL = X.UNDERLYING_SYMBOL
AND T1.QUOTE_DATE = X.QUOTE_DATE
ORDER BY T1.ROOT_SYMBOL
) AS T4
CROSS APPLY (SELECT TOP(1)
T3.EXPIRATION
FROM dbo.Stock AS T3
WHERE DATEDIFF(DAY, T3.QUOTE_DATE, T3.EXPIRATION) BETWEEN 23 AND 30
AND DATENAME(WEEKDAY, T3.EXPIRATION) IN ('Friday', 'Thursday')
AND NOT EXISTS(SELECT *
FROM dbo.Stock AS T2
WHERE T2.EXPIRATION = DATEADD(DAY, 1, T3.EXPIRATION))
AND T3.UNDERLYING_SYMBOL = T4.UNDERLYING_SYMBOL
AND T3.QUOTE_DATE = T4.QUOTE_DATE
ORDER BY T3.EXPIRATION DESC
) AS T5
WHERE T5.EXPIRATION = T4.EXPIRATION
;
An index on dbo.Stock would also help:
CREATE INDEX IX_Stock_1 ON dbo.Stock(UNDERLYING_SYMBOL, QUOTE_DATE, ROOT_SYMBOL, EXPIRATION);
June 4, 2023 at 10:35 pm
a image of the plans don't help much - please post the real explain plan - actual if possible.
use https://www.brentozar.com/pastetheplan/ for it
thanks for that link. here you go
https://www.brentozar.com/pastetheplan/?id=SJTjb9cIh
June 5, 2023 at 12:22 am
This is an equivalent query
SELECT T4.UNDERLYING_SYMBOL,
T4.QUOTE_DATE,
T4.EXPIRATION,
T4.ROOT_SYMBOL
FROM (SELECT DISTINCT UNDERLYING_SYMBOL, QUOTE_DATE
FROM dbo.Stock) X
CROSS APPLY (SELECT TOP(1)
T1.UNDERLYING_SYMBOL,
T1.QUOTE_DATE,
T1.EXPIRATION,
T1.ROOT_SYMBOL
FROM dbo.Stock AS T1
WHERE DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) BETWEEN 23 AND 30
AND T1.UNDERLYING_SYMBOL = X.UNDERLYING_SYMBOL
AND T1.QUOTE_DATE = X.QUOTE_DATE
ORDER BY T1.ROOT_SYMBOL
) AS T4
CROSS APPLY (SELECT TOP(1)
T3.EXPIRATION
FROM dbo.Stock AS T3
WHERE DATEDIFF(DAY, T3.QUOTE_DATE, T3.EXPIRATION) BETWEEN 23 AND 30
AND DATENAME(WEEKDAY, T3.EXPIRATION) IN ('Friday', 'Thursday')
AND NOT EXISTS(SELECT *
FROM dbo.Stock AS T2
WHERE T2.EXPIRATION = DATEADD(DAY, 1, T3.EXPIRATION))
AND T3.UNDERLYING_SYMBOL = T4.UNDERLYING_SYMBOL
AND T3.QUOTE_DATE = T4.QUOTE_DATE
ORDER BY T3.EXPIRATION DESC
) AS T5
WHERE T5.EXPIRATION = T4.EXPIRATION
;An index on dbo.Stock would also help:
CREATE INDEX IX_Stock_1 ON dbo.Stock(UNDERLYING_SYMBOL, QUOTE_DATE, ROOT_SYMBOL, EXPIRATION);Jonathan AC Roberts wrote:This is an equivalent query
SELECT T4.UNDERLYING_SYMBOL,
T4.QUOTE_DATE,
T4.EXPIRATION,
T4.ROOT_SYMBOL
FROM (SELECT DISTINCT UNDERLYING_SYMBOL, QUOTE_DATE
FROM dbo.Stock) X
CROSS APPLY (SELECT TOP(1)
T1.UNDERLYING_SYMBOL,
T1.QUOTE_DATE,
T1.EXPIRATION,
T1.ROOT_SYMBOL
FROM dbo.Stock AS T1
WHERE DATEDIFF(DAY, T1.QUOTE_DATE, T1.EXPIRATION) BETWEEN 23 AND 30
AND T1.UNDERLYING_SYMBOL = X.UNDERLYING_SYMBOL
AND T1.QUOTE_DATE = X.QUOTE_DATE
ORDER BY T1.ROOT_SYMBOL
) AS T4
CROSS APPLY (SELECT TOP(1)
T3.EXPIRATION
FROM dbo.Stock AS T3
WHERE DATEDIFF(DAY, T3.QUOTE_DATE, T3.EXPIRATION) BETWEEN 23 AND 30
AND DATENAME(WEEKDAY, T3.EXPIRATION) IN ('Friday', 'Thursday')
AND NOT EXISTS(SELECT *
FROM dbo.Stock AS T2
WHERE T2.EXPIRATION = DATEADD(DAY, 1, T3.EXPIRATION))
AND T3.UNDERLYING_SYMBOL = T4.UNDERLYING_SYMBOL
AND T3.QUOTE_DATE = T4.QUOTE_DATE
ORDER BY T3.EXPIRATION DESC
) AS T5
WHERE T5.EXPIRATION = T4.EXPIRATION
;An index on dbo.Stock would also help:
CREATE INDEX IX_Stock_1 ON dbo.Stock(UNDERLYING_SYMBOL, QUOTE_DATE, ROOT_SYMBOL, EXPIRATION);Thanks for the reply. This query took over 3 minutes to run and it still didn't produce a result so I cancelled it. The run time on this one is even worse than the one I posted.
June 5, 2023 at 1:47 am
adding further indexes on this case likely not the best - the underlying table is a columnstore table.
the thing that would speed up the most is to have a computed column with the datediff of the quote_date and Expiration columns as that would enable a columnstore filter to be applied at source.
without the above I can only suggest that the following is tried - specially if the construct applies to the other queries (omitted by the OP)
drop table if exists #weeklastday;
/*
create table with distinct entries for expiration.
Assumption is that from the descripition of the problem this date will always be a Friday (if not a bank holiday)
and as the original code was only interested in finding if adding 1 day to a thursday did not fall on the following Friday
we deduct 1 day from this date so the original query can avoid doing another dateadd operation to find a match
*/
select distinct dateadd(day, -1, EXPIRATION) as EXPIRATION
into #weeklastday
from dbo.Stock
where datename(weekday, t1.EXPIRATION) = 'Friday'
;
create clustered index #ci_weeklastday on #weeklastday
(EXPIRATION
)
;
select t6.UNDERLYING_SYMBOL
, t6.QUOTE_DATE
, t6.expiration
, t6.ROOT_SYMBOL
from (select t4.UNDERLYING_SYMBOL
, t4.QUOTE_DATE
, t4.expiration
, t4.ROOT_SYMBOL
, row_number() over (partition by t4.UNDERLYING_SYMBOL, t4.QUOTE_DATE order by t4.ROOT_SYMBOL) as root_symbol_seq
from (select t1.UNDERLYING_SYMBOL
, t1.QUOTE_DATE
, t1.expiration
, t1.ROOT_SYMBOL
, case
when datename(weekday, t1.EXPIRATION) = 'Friday'
then 'Y'
when datename(weekday, t1.EXPIRATION) = 'Thursday'
and t2.EXPIRATION is null
then 'Y'
else 'N'
end as end_of_week
from dbo.Stock as t1
left outer join #weeklastday t2
on t2.EXPIRATION = t1.EXPIRATION
where datediff(day, t1.QUOTE_DATE, t1.expiration) >= 23
and datediff(day, t1.QUOTE_DATE, t1.expiration) <= 30
) as t4
inner join (select t3.UNDERLYING_SYMBOL
, t3.QUOTE_DATE
, max(t3.expiration) as expiration
from (select t1.UNDERLYING_SYMBOL
, t1.QUOTE_DATE
, t1.expiration
, t1.ROOT_SYMBOL
, case
when datename(weekday, t1.EXPIRATION) = 'Friday'
then 'Y'
when datename(weekday, t1.EXPIRATION) = 'Thursday'
and t2.EXPIRATION is null
then 'Y'
else 'N'
end as end_of_week
from dbo.Stock as t1
left outer join #weeklastday t2
on t2.EXPIRATION = t1.EXPIRATION
where datediff(day, t1.QUOTE_DATE, t1.expiration) >= 23
and datediff(day, t1.QUOTE_DATE, t1.expiration) <= 30
) as t3
where t3.end_of_week = 'Y'
group by t3.UNDERLYING_SYMBOL
, t3.QUOTE_DATE
) as t5
on t4.UNDERLYING_SYMBOL = t5.UNDERLYING_SYMBOL
and t4.QUOTE_DATE = t5.QUOTE_DATE
and t4.expiration = t5.expiration
) as t6
where t6.root_symbol_seq = 1
June 5, 2023 at 1:53 am
adding further indexes on this case likely not the best - the underlying table is a columnstore table.
the thing that would speed up the most is to have a computed column with the datediff of the quote_date and Expiration columns as that would enable a columnstore filter to be applied at source.
without the above I can only suggest that the following is tried - specially if the construct applies to the other queries (omitted by the OP)
drop table if exists #weeklastday;
/*
create table with distinct entries for expiration.
Assumption is that from the descripition of the problem this date will always be a Friday (if not a bank holiday)
and as the original code was only interested in finding if adding 1 day to a thursday did not fall on the following Friday
we deduct 1 day from this date so the original query can avoid doing another dateadd operation to find a match
*/select distinct dateadd(day, -1, EXPIRATION) as EXPIRATION
into #weeklastday
from dbo.Stock
where datename(weekday, t1.EXPIRATION) = 'Friday'
;
create clustered index #ci_weeklastday on #weeklastday
(EXPIRATION
)
;
select t6.UNDERLYING_SYMBOL
, t6.QUOTE_DATE
, t6.expiration
, t6.ROOT_SYMBOL
from (select t4.UNDERLYING_SYMBOL
, t4.QUOTE_DATE
, t4.expiration
, t4.ROOT_SYMBOL
, row_number() over (partition by t4.UNDERLYING_SYMBOL, t4.QUOTE_DATE order by t4.ROOT_SYMBOL) as root_symbol_seq
from (select t1.UNDERLYING_SYMBOL
, t1.QUOTE_DATE
, t1.expiration
, t1.ROOT_SYMBOL
, case
when datename(weekday, t1.EXPIRATION) = 'Friday'
then 'Y'
when datename(weekday, t1.EXPIRATION) = 'Thursday'
and t2.EXPIRATION is null
then 'Y'
else 'N'
end as end_of_week
from dbo.Stock as t1
left outer join #weeklastday t2
on t2.EXPIRATION = t1.EXPIRATION
where datediff(day, t1.QUOTE_DATE, t1.expiration) >= 23
and datediff(day, t1.QUOTE_DATE, t1.expiration) <= 30
) as t4
inner join (select t3.UNDERLYING_SYMBOL
, t3.QUOTE_DATE
, max(t3.expiration) as expiration
from (select t1.UNDERLYING_SYMBOL
, t1.QUOTE_DATE
, t1.expiration
, t1.ROOT_SYMBOL
, case
when datename(weekday, t1.EXPIRATION) = 'Friday'
then 'Y'
when datename(weekday, t1.EXPIRATION) = 'Thursday'
and t2.EXPIRATION is null
then 'Y'
else 'N'
end as end_of_week
from dbo.Stock as t1
left outer join #weeklastday t2
on t2.EXPIRATION = t1.EXPIRATION
where datediff(day, t1.QUOTE_DATE, t1.expiration) >= 23
and datediff(day, t1.QUOTE_DATE, t1.expiration) <= 30
) as t3
where t3.end_of_week = 'Y'
group by t3.UNDERLYING_SYMBOL
, t3.QUOTE_DATE
) as t5
on t4.UNDERLYING_SYMBOL = t5.UNDERLYING_SYMBOL
and t4.QUOTE_DATE = t5.QUOTE_DATE
and t4.expiration = t5.expiration
) as t6
where t6.root_symbol_seq = 1
Thank you for this.
June 5, 2023 at 2:04 am
adding further indexes on this case likely not the best - the underlying table is a columnstore table.
the thing that would speed up the most is to have a computed column with the datediff of the quote_date and Expiration columns as that would enable a columnstore filter to be applied at source.
without the above I can only suggest that the following is tried - specially if the construct applies to the other queries (omitted by the OP)
drop table if exists #weeklastday;
/*
create table with distinct entries for expiration.
Assumption is that from the descripition of the problem this date will always be a Friday (if not a bank holiday)
and as the original code was only interested in finding if adding 1 day to a thursday did not fall on the following Friday
we deduct 1 day from this date so the original query can avoid doing another dateadd operation to find a match
*/
select distinct dateadd(day, -1, EXPIRATION) as EXPIRATION
into #weeklastday
from dbo.Stock
where datename(weekday, t1.EXPIRATION) = 'Friday'
;
create clustered index #ci_weeklastday on #weeklastday
(EXPIRATION
)
;
select t6.UNDERLYING_SYMBOL
, t6.QUOTE_DATE
, t6.expiration
, t6.ROOT_SYMBOL
from (select t4.UNDERLYING_SYMBOL
, t4.QUOTE_DATE
, t4.expiration
, t4.ROOT_SYMBOL
, row_number() over (partition by t4.UNDERLYING_SYMBOL, t4.QUOTE_DATE order by t4.ROOT_SYMBOL) as root_symbol_seq
from (select t1.UNDERLYING_SYMBOL
, t1.QUOTE_DATE
, t1.expiration
, t1.ROOT_SYMBOL
, case
when datename(weekday, t1.EXPIRATION) = 'Friday'
then 'Y'
when datename(weekday, t1.EXPIRATION) = 'Thursday'
and t2.EXPIRATION is null
then 'Y'
else 'N'
end as end_of_week
from dbo.Stock as t1
left outer join #weeklastday t2
on t2.EXPIRATION = t1.EXPIRATION
where datediff(day, t1.QUOTE_DATE, t1.expiration) >= 23
and datediff(day, t1.QUOTE_DATE, t1.expiration) <= 30
) as t4
inner join (select t3.UNDERLYING_SYMBOL
, t3.QUOTE_DATE
, max(t3.expiration) as expiration
from (select t1.UNDERLYING_SYMBOL
, t1.QUOTE_DATE
, t1.expiration
, t1.ROOT_SYMBOL
, case
when datename(weekday, t1.EXPIRATION) = 'Friday'
then 'Y'
when datename(weekday, t1.EXPIRATION) = 'Thursday'
and t2.EXPIRATION is null
then 'Y'
else 'N'
end as end_of_week
from dbo.Stock as t1
left outer join #weeklastday t2
on t2.EXPIRATION = t1.EXPIRATION
where datediff(day, t1.QUOTE_DATE, t1.expiration) >= 23
and datediff(day, t1.QUOTE_DATE, t1.expiration) <= 30
) as t3
where t3.end_of_week = 'Y'
group by t3.UNDERLYING_SYMBOL
, t3.QUOTE_DATE
) as t5
on t4.UNDERLYING_SYMBOL = t5.UNDERLYING_SYMBOL
and t4.QUOTE_DATE = t5.QUOTE_DATE
and t4.expiration = t5.expiration
) as t6
where t6.root_symbol_seq = 1
I forgot to add...
Re datediff. It is only used in the where part in the problem 2 sub-queries. No where else.
I have been working on an approach which is very similar to the one you proposed. namely, have the list of expiration dates put into a temp table:
CASE
WHEN DATENAME(WEEKDAY,EXPIRATION) = 'Friday' THEN 'Y'
WHEN DATENAME(WEEKDAY,EXPIRATION) = 'Thursday' AND (DATEADD(DAY,1,EXPIRATION) NOT IN (SELECT * FROM #TEMP_TABLE1)) THEN 'Y'
ELSE 'N'
END AS END_OF_WEEK
this approach has been very promising. before doing this the query took 2m 40s to run. after doing this the query took 49 s to run. definitely a huge improvement so happy so far. i would still like to get this number down. i am going to try your code too once you confirm my other comment.
thank you again for your help
June 5, 2023 at 2:07 am
no difference.
your code was only looking for the the friday date on a particular situation e.g. when the EXPIRATION = Thursday
so by determining all the FRIDAY dates in advance, subtracting one day from it means we can replace the following critirea
(DATEADD(DAY,1,EXPIRATION) NOT IN (SELECT DISTINCT T2.EXPIRATION FROM DBO.Stock AS T2)) THEN 'Y'
ELSE 'N'
with the "t2.expiration is null" associated with a left outer join to the temp table.
June 5, 2023 at 2:20 am
no difference.
your code was only looking for the the friday date on a particular situation e.g. when the EXPIRATION = Thursday
so by determining all the FRIDAY dates in advance, subtracting one day from it means we can replace the following critirea
(DATEADD(DAY,1,EXPIRATION) NOT IN (SELECT DISTINCT T2.EXPIRATION FROM DBO.Stock AS T2)) THEN 'Y' ELSE 'N'
with the "t2.expiration is null" associated with a left outer join to the temp table.
got it thank you. i will take a look at your code and report back
Viewing 15 posts - 1 through 15 (of 34 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy