need help with a SQL query

  • Hi everyone

    I am working on a query and I am a bit stuck.  I will explain the logic.

    Test Data:

    drop table if exists #test_table

    create table #test_table
    (
    company char(5),
    t_date date,
    type char(1),
    level int,
    bid numeric(5,2),
    ask numeric(5,2)
    )

    insert into #test_table
    values
    ('ABC','2024-03-21','L','100','10','22'),
    ('ABC','2024-03-21','L','200','15','17'),
    ('ABC','2024-03-21','L','300','25','27'),
    ('ABC','2024-03-21','L','400','13','37'),
    ('ABC','2024-03-21','L','500','11','13'),
    ('ABC','2024-03-21','T','100','20','22'),
    ('ABC','2024-03-21','T','200','18','24'),
    ('ABC','2024-03-21','T','300','16','20'),
    ('ABC','2024-03-21','T','400','17','19'),
    ('ABC','2024-03-21','T','500','16','20'),
    ('DEF','2024-03-21','L','1100','20','22'),
    ('DEF','2024-03-21','L','1200','10','12'),
    ('DEF','2024-03-21','L','1300','17','13'),
    ('DEF','2024-03-21','L','1400','24','28'),
    ('DEF','2024-03-21','L','1500','18','20'),
    ('DEF','2024-03-21','T','1100','24','26'),
    ('DEF','2024-03-21','T','1200','19','23'),
    ('DEF','2024-03-21','T','1300','15','19'),
    ('DEF','2024-03-21','T','1400','27','37'),
    ('DEF','2024-03-21','T','1500','28','30')

    Expected Outcome:

    Screenshot 2024-04-04 204638

    Explanation of logic:

    First, find the following metric:

    ABS( (average of bid and ask for type = L) - (average of bid and ask for type = T))

    Second, find the level that corresponds to the min (ie lowest) metric for all L and T pairs

    Finally, if there is match (ie two L and T pairs produce the same metric and they are the lowest) then pick the level that is lower.

    An example with numbers will help to explain...

    Let's look at ABC first.  First, we partition the data by company and date.  Next, we look at the metric I mentioned.

    For level = 100 and type = L we have (10+22)/2 = 16

    For level = 100 and type = T we have (20+22)/2 = 21

    so for level = 100 the metric is abs(16-21) = 5

    For level = 200 and type = L we have (15+17)/2 = 16

    For level = 200 and type = T we have (18+24)/2 = 21

    so for level = 200 the metric is abs(16-21) = 5

    We continue this process until all levels have been processed.

    The metrics are:

    Level = 100 -> 5

    Level = 200 -> 5

    Level = 300 -> 8

    Level = 400 -> 7

    Level = 500 -> 6

    Now, all the levels have been processed.  The lowest metric is 5.  However, there are two levels that have metric = 5 (ie the lowest).  So to break the tie between Level = 100 and Level = 200 we pick the lower one so Level = 100 is the winner.

    Same process applies for company DEF.

    The metrics are:

    Level = 1100 -> 4

    Level = 1200 -> 10

    Level = 1300 -> 2

    Level = 1400 -> 6

    Level = 1500 -> 10

    The lowest metric is 2 so the winner is Level = 1300.  Notice, there is a tie for 10 but that isn't important b/c it isn't the lowest so the tie-breaking rule isn't applied.

    How do I code this in SQL that is efficient?

    Thank you

  • Seems like you could achieve this with a couple of CTEs. I assume that the value of T_date could differ, hence why I use window aggregation functions rather than non-windowed. Then you can get the "Top (1) per group" using a further CTE and ROW_NUMBER, giving the following:

    WITH Avgs AS(
    SELECT tt.company,
    tt.level,
    tt.t_date,
    ABS(AVG(CASE tt.type WHEN 'L' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) - AVG(CASE tt.type WHEN 'T' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) ) AS AvgAbs
    FROM #test_table tt
    CROSS APPLY (VALUES(tt.bid),
    (tt.ask))V(N)),
    RNs AS(
    SELECT A.company,
    A.level,
    A.t_date,
    A.AvgAbs,
    ROW_NUMBER() OVER (PARTITION BY A.company ORDER BY A.AvgAbs, A.level) AS RN
    FROM Avgs A)
    SELECT R.company,
    R.level,
    R.t_date
    FROM RNs R
    WHERE R.RN = 1;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A wrote:

    Seems like you could achieve this with a couple of CTEs. I assume that the value of T_date could differ, hence why I use window aggregation functions rather than non-windowed. Then you can get the "Top (1) per group" using a further CTE and ROW_NUMBER, giving the following:

    WITH Avgs AS(
    SELECT tt.company,
    tt.level,
    tt.t_date,
    ABS(AVG(CASE tt.type WHEN 'L' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) - AVG(CASE tt.type WHEN 'T' THEN V.N END) OVER (PARTITION BY tt.company, tt.level) ) AS AvgAbs
    FROM #test_table tt
    CROSS APPLY (VALUES(tt.bid),
    (tt.ask))V(N)),
    RNs AS(
    SELECT A.company,
    A.level,
    A.t_date,
    A.AvgAbs,
    ROW_NUMBER() OVER (PARTITION BY A.company ORDER BY A.AvgAbs, A.level) AS RN
    FROM Avgs A)
    SELECT R.company,
    R.level,
    R.t_date
    FROM RNs R
    WHERE R.RN = 1;

    Thank you so much for this.  Yes the date would change each day.

    I did a test where I changed the date to a different day.  In that case the query didn't recognize that there should be two value reported (one for each day).  Instead, it combined it into one value.  Here is revised dataset:

    insert into #test_table
    values
    ('ABC','2024-03-21','L','100','10','22'),
    ('ABC','2024-03-21','L','200','15','17'),
    ('ABC','2024-03-21','L','300','25','27'),
    ('ABC','2024-03-21','L','400','13','37'),
    ('ABC','2024-03-21','L','500','11','13'),
    ('ABC','2024-03-21','T','100','20','22'),
    ('ABC','2024-03-21','T','200','18','24'),
    ('ABC','2024-03-21','T','300','16','20'),
    ('ABC','2024-03-21','T','400','17','19'),
    ('ABC','2024-03-21','T','500','16','20'),
    ('ABC','2024-03-22','L','1100','20','22'),
    ('ABC','2024-03-22','L','1200','10','12'),
    ('ABC','2024-03-22','L','1300','17','13'),
    ('ABC','2024-03-22','L','1400','24','28'),
    ('ABC','2024-03-22','L','1500','18','20'),
    ('ABC','2024-03-22','T','1100','24','26'),
    ('ABC','2024-03-22','T','1200','19','23'),
    ('ABC','2024-03-22','T','1300','15','19'),
    ('ABC','2024-03-22','T','1400','27','37'),
    ('ABC','2024-03-22','T','1500','28','30')

    I have another request. My apologies for not mentioning this earlier.  I also need to know the bid/ask prices for the type = L and type = T that gave the level.  It would return L = 16 and T = 21 (in addition to what it currently provides).  How should the query be modified so it provides this information?  Thank you again for your help.

  • I think I fixed the issue.  I added t_date to the partition part and now I am getting the correct number of records.  The query is super slow though.  It took 1 min 15 secs to run on my test data of about 31 million records.  Is there a faster way to do the work?

  • water490 wrote:

    I think I fixed the issue.  I added t_date to the partition part and now I am getting the correct number of records.  The query is super slow though.  It took 1 min 15 secs to run on my test data of about 31 million records.  Is there a faster way to do the work?

    Can you provide the query you are using?

  • Jonathan AC Roberts wrote:

    water490 wrote:

    I think I fixed the issue.  I added t_date to the partition part and now I am getting the correct number of records.  The query is super slow though.  It took 1 min 15 secs to run on my test data of about 31 million records.  Is there a faster way to do the work?

    Can you provide the query you are using?

    np

    here you go:

    WITH Avgs AS(
    SELECT tt.company,
    tt.level,
    tt.t_date,
    ABS(AVG(CASE tt.type WHEN 'L' THEN V.N END) OVER (PARTITION BY tt.company, tt.t_date, tt.level) - AVG(CASE tt.type WHEN 'T' THEN V.N END) OVER (PARTITION BY tt.company, tt.t_date, tt.level) ) AS AvgAbs
    FROM #test_table tt
    CROSS APPLY (VALUES(tt.bid),
    (tt.ask))V(N)),
    RNs AS(
    SELECT A.company,
    A.level,
    A.t_date,
    A.AvgAbs,
    ROW_NUMBER() OVER (PARTITION BY A.company, A.t_date ORDER BY A.AvgAbs, A.level) AS RN
    FROM Avgs A)
    SELECT R.company,
    R.level,
    R.t_date
    FROM RNs R
    WHERE R.RN = 1;

    Thank you!

  • Not sure if this will be any faster, you might have to look at putting indexes on the table if it is not:

    ;with cte as
    (
    SELECT tt.company,
    tt.level,
    tt.t_date,
    ABS(AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END)) AS metric
    FROM #test_table tt
    GROUP BY company, tt.level, tt.t_date
    )
    ,cte2 as
    (
    SELECT DISTINCT
    a.company,
    a.t_date
    FROM cte a
    )
    SELECT a.company,
    a.t_date,
    b.level,
    b.metric
    FROM cte2 a
    CROSS APPLY (SELECT TOP(1) b.level, b.metric
    FROM cte b
    WHERE b.company = a.company
    AND b.t_date = a.t_date
    ORDER BY b.metric, b.level) b
  • Jonathan AC Roberts wrote:

    Not sure if this will be any faster, you might have to look at putting indexes on the table if it is not:

    ;with cte as
    (
    SELECT tt.company,
    tt.level,
    tt.t_date,
    ABS(AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END)) AS metric
    FROM #test_table tt
    GROUP BY company, tt.level, tt.t_date
    )
    ,cte2 as
    (
    SELECT DISTINCT
    a.company,
    a.t_date
    FROM cte a
    )
    SELECT a.company,
    a.t_date,
    b.level,
    b.metric
    FROM cte2 a
    CROSS APPLY (SELECT TOP(1) b.level, b.metric
    FROM cte b
    WHERE b.company = a.company
    AND b.t_date = a.t_date
    ORDER BY b.metric, b.level) b

    Thank you for this.

    The metric is unsigned (expected) when it does the matching but when it is displayed in the select field I need a signed version of the metric (ie no ABS).  How would I modify your query to see this?  It will be avg of L - avg of T.

    Thank you

  • water490 wrote:

    Jonathan AC Roberts wrote:

    Not sure if this will be any faster, you might have to look at putting indexes on the table if it is not:

    ;with cte as
    (
    SELECT tt.company,
    tt.level,
    tt.t_date,
    ABS(AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END)) AS metric
    FROM #test_table tt
    GROUP BY company, tt.level, tt.t_date
    )
    ,cte2 as
    (
    SELECT DISTINCT
    a.company,
    a.t_date
    FROM cte a
    )
    SELECT a.company,
    a.t_date,
    b.level,
    b.metric
    FROM cte2 a
    CROSS APPLY (SELECT TOP(1) b.level, b.metric
    FROM cte b
    WHERE b.company = a.company
    AND b.t_date = a.t_date
    ORDER BY b.metric, b.level) b

    Thank you for this.

    The metric is unsigned (expected) when it does the matching but when it is displayed in the select field I need a signed version of the metric (ie no ABS).  How would I modify your query to see this?  It will be avg of L - avg of T.

    Thank you

    ;with cte as
    (
    SELECT tt.company,
    tt.level,
    tt.t_date,
    AVG(CASE tt.type WHEN 'L' THEN (bid+ask)/2 END) - AVG(CASE tt.type WHEN 'T' THEN (bid+ask)/2 END) AS metric
    FROM #test_table tt
    GROUP BY company, tt.level, tt.t_date
    )
    ,cte2 as
    (
    SELECT DISTINCT
    a.company,
    a.t_date
    FROM cte a
    )
    SELECT a.company,
    a.t_date,
    b.level,
    b.metric
    FROM cte2 a
    CROSS APPLY (SELECT TOP(1) b.level, b.metric
    FROM cte b
    WHERE b.company = a.company
    AND b.t_date = a.t_date
    ORDER BY ABS(b.metric), b.level) b

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply