April 5, 2024 at 4:04 am
Hi everyone
I am working on a query and I am a bit stuck. I will explain the logic.
Test Data:
drop table if exists #test_table
create table #test_table
(
company char(5),
t_date date,
type char(1),
level int,
bid numeric(5,2),
ask numeric(5,2)
)
insert into #test_table
values
('ABC','2024-03-21','L','100','10','22'),
('ABC','2024-03-21','L','200','15','17'),
('ABC','2024-03-21','L','300','25','27'),
('ABC','2024-03-21','L','400','13','37'),
('ABC','2024-03-21','L','500','11','13'),
('ABC','2024-03-21','T','100','20','22'),
('ABC','2024-03-21','T','200','18','24'),
('ABC','2024-03-21','T','300','16','20'),
('ABC','2024-03-21','T','400','17','19'),
('ABC','2024-03-21','T','500','16','20'),
('DEF','2024-03-21','L','1100','20','22'),
('DEF','2024-03-21','L','1200','10','12'),
('DEF','2024-03-21','L','1300','17','13'),
('DEF','2024-03-21','L','1400','24','28'),
('DEF','2024-03-21','L','1500','18','20'),
('DEF','2024-03-21','T','1100','24','26'),
('DEF','2024-03-21','T','1200','19','23'),
('DEF','2024-03-21','T','1300','15','19'),
('DEF','2024-03-21','T','1400','27','37'),
('DEF','2024-03-21','T','1500','28','30')
Expected Outcome:
Explanation of logic:
First, find the following metric:
ABS( (average of bid and ask for type = L) - (average of bid and ask for type = T))
Second, find the level that corresponds to the min (ie lowest) metric for all L and T pairs
Finally, if there is match (ie two L and T pairs produce the same metric and they are the lowest) then pick the level that is lower.
An example with numbers will help to explain...
Let's look at ABC first. First, we partition the data by company and date. Next, we look at the metric I mentioned.
For level = 100 and type = L we have (10+22)/2 = 16
For level = 100 and type = T we have (20+22)/2 = 21
so for level = 100 the metric is abs(16-21) = 5
For level = 200 and type = L we have (15+17)/2 = 16
For level = 200 and type = T we have (18+24)/2 = 21
so for level = 200 the metric is abs(16-21) = 5
We continue this process until all levels have been processed.
The metrics are:
Level = 100 -> 5
Level = 200 -> 5
Level = 300 -> 8
Level = 400 -> 7
Level = 500 -> 6
Now, all the levels have been processed. The lowest metric is 5. However, there are two levels that have metric = 5 (ie the lowest). So to break the tie between Level = 100 and Level = 200 we pick the lower one so Level = 100 is the winner.
Same process applies for company DEF.
The metrics are:
Level = 1100 -> 4
Level = 1200 -> 10
Level = 1300 -> 2
Level = 1400 -> 6
Level = 1500 -> 10
The lowest metric is 2 so the winner is Level = 1300. Notice, there is a tie for 10 but that isn't important b/c it isn't the lowest so the tie-breaking rule isn't applied.
How do I code this in SQL that is efficient?
Thank you
April 5, 2024 at 10:13 am
Seems like you could achieve this with a couple of CTEs. I assume that the value of T_date
could differ, hence why I use window aggregation functions rather than non-windowed. Then you can get the "Top (1) per group" using a further CTE and ROW_NUMBER
, giving the following:
WITH Avgs AS(
SELECT tt.company,
tt.level,
tt.t_date,
ABS(AVG(CASE tt.type WHEN 'L' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) - AVG(CASE tt.type WHEN 'T' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) ) AS AvgAbs
FROM #test_table tt
CROSS APPLY (VALUES(tt.bid),
(tt.ask))V(N)),
RNs AS(
SELECT A.company,
A.level,
A.t_date,
A.AvgAbs,
ROW_NUMBER() OVER (PARTITION BY A.company ORDER BY A.AvgAbs, A.level) AS RN
FROM Avgs A)
SELECT R.company,
R.level,
R.t_date
FROM RNs R
WHERE R.RN = 1;
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
April 6, 2024 at 4:25 am
Seems like you could achieve this with a couple of CTEs. I assume that the value of
T_date
could differ, hence why I use window aggregation functions rather than non-windowed. Then you can get the "Top (1) per group" using a further CTE andROW_NUMBER
, giving the following:WITH Avgs AS(
SELECT tt.company,
tt.level,
tt.t_date,
ABS(AVG(CASE tt.type WHEN 'L' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) - AVG(CASE tt.type WHEN 'T' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) ) AS AvgAbs
FROM #test_table tt
CROSS APPLY (VALUES(tt.bid),
(tt.ask))V(N)),
RNs AS(
SELECT A.company,
A.level,
A.t_date,
A.AvgAbs,
ROW_NUMBER() OVER (PARTITION BY A.company ORDER BY A.AvgAbs, A.level) AS RN
FROM Avgs A)
SELECT R.company,
R.level,
R.t_date
FROM RNs R
WHERE R.RN = 1;
Thank you so much for this. Yes the date would change each day.
I did a test where I changed the date to a different day. In that case the query didn't recognize that there should be two value reported (one for each day). Instead, it combined it into one value. Here is revised dataset:
insert into #test_table
values
('ABC','2024-03-21','L','100','10','22'),
('ABC','2024-03-21','L','200','15','17'),
('ABC','2024-03-21','L','300','25','27'),
('ABC','2024-03-21','L','400','13','37'),
('ABC','2024-03-21','L','500','11','13'),
('ABC','2024-03-21','T','100','20','22'),
('ABC','2024-03-21','T','200','18','24'),
('ABC','2024-03-21','T','300','16','20'),
('ABC','2024-03-21','T','400','17','19'),
('ABC','2024-03-21','T','500','16','20'),
('ABC','2024-03-22','L','1100','20','22'),
('ABC','2024-03-22','L','1200','10','12'),
('ABC','2024-03-22','L','1300','17','13'),
('ABC','2024-03-22','L','1400','24','28'),
('ABC','2024-03-22','L','1500','18','20'),
('ABC','2024-03-22','T','1100','24','26'),
('ABC','2024-03-22','T','1200','19','23'),
('ABC','2024-03-22','T','1300','15','19'),
('ABC','2024-03-22','T','1400','27','37'),
('ABC','2024-03-22','T','1500','28','30')
I have another request. My apologies for not mentioning this earlier. I also need to know the bid/ask prices for the type = L and type = T that gave the level. It would return L = 16 and T = 21 (in addition to what it currently provides). How should the query be modified so it provides this information? Thank you again for your help.
April 6, 2024 at 4:44 am
I think I fixed the issue. I added t_date to the partition part and now I am getting the correct number of records. The query is super slow though. It took 1 min 15 secs to run on my test data of about 31 million records. Is there a faster way to do the work?
April 7, 2024 at 7:03 pm
I think I fixed the issue. I added t_date to the partition part and now I am getting the correct number of records. The query is super slow though. It took 1 min 15 secs to run on my test data of about 31 million records. Is there a faster way to do the work?
Can you provide the query you are using?
April 7, 2024 at 7:54 pm
water490 wrote:I think I fixed the issue. I added t_date to the partition part and now I am getting the correct number of records. The query is super slow though. It took 1 min 15 secs to run on my test data of about 31 million records. Is there a faster way to do the work?
Can you provide the query you are using?
np
here you go:
WITH Avgs AS(
SELECT tt.company,
tt.level,
tt.t_date,
ABS(AVG(CASE tt.type WHEN 'L' THEN V.N END) OVER (PARTITION BY tt.company, tt.t_date, tt.level) - AVG(CASE tt.type WHEN 'T' THEN V.N END) OVER (PARTITION BY tt.company, tt.t_date, tt.level) ) AS AvgAbs
FROM #test_table tt
CROSS APPLY (VALUES(tt.bid),
(tt.ask))V(N)),
RNs AS(
SELECT A.company,
A.level,
A.t_date,
A.AvgAbs,
ROW_NUMBER() OVER (PARTITION BY A.company, A.t_date ORDER BY A.AvgAbs, A.level) AS RN
FROM Avgs A)
SELECT R.company,
R.level,
R.t_date
FROM RNs R
WHERE R.RN = 1;
Thank you!
April 8, 2024 at 5:58 pm
Not sure if this will be any faster, you might have to look at putting indexes on the table if it is not:
;with cte as
(
SELECT tt.company,
tt.level,
tt.t_date,
ABS(AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END)) AS metric
FROM #test_table tt
GROUP BY company, tt.level, tt.t_date
)
,cte2 as
(
SELECT DISTINCT
a.company,
a.t_date
FROM cte a
)
SELECT a.company,
a.t_date,
b.level,
b.metric
FROM cte2 a
CROSS APPLY (SELECT TOP(1) b.level, b.metric
FROM cte b
WHERE b.company = a.company
AND b.t_date = a.t_date
ORDER BY b.metric, b.level) b
April 8, 2024 at 7:26 pm
Not sure if this will be any faster, you might have to look at putting indexes on the table if it is not:
;with cte as
(
SELECT tt.company,
tt.level,
tt.t_date,
ABS(AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END)) AS metric
FROM #test_table tt
GROUP BY company, tt.level, tt.t_date
)
,cte2 as
(
SELECT DISTINCT
a.company,
a.t_date
FROM cte a
)
SELECT a.company,
a.t_date,
b.level,
b.metric
FROM cte2 a
CROSS APPLY (SELECT TOP(1) b.level, b.metric
FROM cte b
WHERE b.company = a.company
AND b.t_date = a.t_date
ORDER BY b.metric, b.level) b
Thank you for this.
The metric is unsigned (expected) when it does the matching but when it is displayed in the select field I need a signed version of the metric (ie no ABS). How would I modify your query to see this? It will be avg of L - avg of T.
Thank you
April 8, 2024 at 9:50 pm
Jonathan AC Roberts wrote:Not sure if this will be any faster, you might have to look at putting indexes on the table if it is not:
;with cte as
(
SELECT tt.company,
tt.level,
tt.t_date,
ABS(AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END)) AS metric
FROM #test_table tt
GROUP BY company, tt.level, tt.t_date
)
,cte2 as
(
SELECT DISTINCT
a.company,
a.t_date
FROM cte a
)
SELECT a.company,
a.t_date,
b.level,
b.metric
FROM cte2 a
CROSS APPLY (SELECT TOP(1) b.level, b.metric
FROM cte b
WHERE b.company = a.company
AND b.t_date = a.t_date
ORDER BY b.metric, b.level) bThank you for this.
The metric is unsigned (expected) when it does the matching but when it is displayed in the select field I need a signed version of the metric (ie no ABS). How would I modify your query to see this? It will be avg of L - avg of T.
Thank you
;with cte as
(
SELECT tt.company,
tt.level,
tt.t_date,
AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END) AS metric
FROM #test_table tt
GROUP BY company, tt.level, tt.t_date
)
,cte2 as
(
SELECT DISTINCT
a.company,
a.t_date
FROM cte a
)
SELECT a.company,
a.t_date,
b.level,
b.metric
FROM cte2 a
CROSS APPLY (SELECT TOP(1) b.level, b.metric
FROM cte b
WHERE b.company = a.company
AND b.t_date = a.t_date
ORDER BY ABS(b.metric), b.level) b
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply