How to speed up this query?

  • Hi everyone

    I have a query that is taking a really long time. It takes close to 30 minutes to process about 14 million records. Is there any way to speed it up?

    Query:

    DROP TABLE IF EXISTS #TestTable
    DROP TABLE IF EXISTS #TEMP1

    CREATE TABLE #TestTable
    (
        [UNIT]          [nvarchar](10) NOT NULL,
        [PURCHASE_DATE] [date] NOT NULL,
        [METRIC1]       [float] NULL,
        [METRIC2]       [float] NULL
    )

    INSERT INTO #TestTable
    VALUES ('ABC','2024-12-12',435.090,210.33),
    ('ABC','2024-12-09',213.390,4013.4901),
    ('DEF','2024-12-12',34,99),
    ('ABC','2024-12-10',11,3.008),
    ('DEF','2024-12-11',57.903,9),
    ('DEF','2024-12-10',440.023,62),
    ('ABC','2024-12-08',33.924,80.02),
    ('DEF','2024-12-08',43.40,12),
    ('DEF','2024-12-07',11.30,48.90),
    ('DEF','2024-12-06',21.984,16.33)

    SELECT T1.UNIT,
           T1.PURCHASE_DATE,
           T1.METRIC1,
           T4.[COUNT],
           ROW_NUMBER() OVER (PARTITION BY T1.UNIT ORDER BY T1.PURCHASE_DATE) AS ROW_NUM
    INTO #TEMP1
    FROM #TestTable AS T1
    CROSS APPLY (SELECT COUNT(*) AS [COUNT]
                 FROM (SELECT NULL AS X) AS X
                 CROSS APPLY (SELECT TOP(3) *
                              FROM #TestTable AS T2
                              WHERE T2.PURCHASE_DATE <= T1.PURCHASE_DATE
                                AND T2.UNIT = T1.UNIT
                              ORDER BY T2.PURCHASE_DATE DESC) AS T3
                 WHERE T3.METRIC1 < T1.METRIC1 AND T3.UNIT = T1.UNIT
                ) AS T4

    SELECT *
    FROM #TestTable

    -- expected outcome
    SELECT T5.UNIT,
           T5.PURCHASE_DATE,
           T5.METRIC1,
           T5.[COUNT]
    FROM #TEMP1 AS T5
    WHERE T5.ROW_NUM >= 3
    ORDER BY 1, 2

    Description:

    The above code shows the expected outcome. Here is a description of what is happening...

    Step 1 - Partition the data by UNIT.

    Step 2 - Order the partitioned data by PURCHASE_DATE in descending order.

    Step 3 - Look at the current record plus the 2 earlier records, for a total of 3 records in consideration. Let's look at a few examples...

    ABC on 2024-12-12. The records under consideration are 2024-12-09, 2024-12-10, and 2024-12-12. How many of them have METRIC1 less than 435.09? There are 2 such records, so Count = 2.

    ABC on 2024-12-09. There are only 2 records up to that date (this one and 2024-12-08), so skip processing.
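
    For reference, working the sample data above by hand, the expected-outcome query should return something like this (worth double-checking against a real run):

        UNIT  PURCHASE_DATE  METRIC1   COUNT
        ABC   2024-12-10     11        0
        ABC   2024-12-12     435.09    2
        DEF   2024-12-08     43.4      2
        DEF   2024-12-10     440.023   2
        DEF   2024-12-11     57.903    1
        DEF   2024-12-12     34        0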

    Thank you


  • Hi

    Is anyone able to help me with this?

    Thank you

  • 1. I hope you are not using floats for prices!

    2. Do you really use reserved names for columns?

    3. Are you really writing new code with the deprecated ORDER BY column position?

    WITH Counts
    AS
    (
    SELECT UNIT, PURCHASE_DATE, METRIC1
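    -- Each LAG looks back 1 or 2 rows within the UNIT partition;
    -- every earlier METRIC1 smaller than the current row's adds 1 to the count.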
    ,IIF(LAG(METRIC1, 1) OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0)
    + IIF(LAG(METRIC1, 2) OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0) AS LessThanCount
    ,ROW_NUMBER() OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) AS rn
    FROM #TestTable
    )
    SELECT UNIT, PURCHASE_DATE, METRIC1, LessThanCount
    FROM Counts
    WHERE rn >= 3
    ORDER BY UNIT, PURCHASE_DATE;
    That's a nice looking query, Ken. The OP's query used "SELECT ... INTO #TEMP1", which could be a bottleneck in a larger process, IMO. Of course, it's not possible to know whether we're looking at a non-representative simplification. The typical reasons to allocate a temp table are: a) to re-use an intermediate row set, and/or b) the number of intermediate rows is large enough for an index to be beneficial. In this case, "processing" about 14 million rows seems likely to return quite a few rows, enough to benefit from one or more indexes. If so, then ditching "SELECT INTO" and pre-defining the temp table DDL could speed things up. Tables created using SELECT INTO do not inherit any indexes or optimizations from the queried tables, and adding indexes after the INSERT can be quite a bit slower.
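
    A minimal sketch of what I mean, with the column types copied from the OP's DDL; the clustered index key (UNIT, PURCHASE_DATE) is just an assumption about how #TEMP1 gets consumed downstream:

    CREATE TABLE #TEMP1
    (
        UNIT          nvarchar(10) NOT NULL,
        PURCHASE_DATE date NOT NULL,
        METRIC1       float NULL,
        [COUNT]       int NOT NULL,
        ROW_NUM       bigint NOT NULL
    )

    -- Build the index while the table is empty, then load into it
    CREATE CLUSTERED INDEX CIX_TEMP1 ON #TEMP1 (UNIT, PURCHASE_DATE)

    INSERT INTO #TEMP1 WITH (TABLOCK)   -- TABLOCK may allow minimal logging
        (UNIT, PURCHASE_DATE, METRIC1, [COUNT], ROW_NUM)
    SELECT ...   -- the windowed query from above, unchanged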

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Ken McKelvey wrote:

    1. I hope you are not using floats for prices! 2. Do you really use reserved names for columns? 3. Are you really writing new code with the deprecated ORDER BY column position?


    Thank you Ken

    I wasn't clear in my post, sorry about that. I gave the example of 3, but that window size is arbitrary and can change. Would your code work if, say, the period is 100 and not 3?

    You could generate the code as dynamic SQL to cope with different window sizes, but you would need to check carefully how efficient a value of 100 is. Something like the sketch below.

    Whatever you are trying to do, it might be worth hiring a consultant for a day or two. With 14 million rows, on top of Steve's good points, things like columnstore indexes also come into play.
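
    A minimal sketch of the dynamic SQL idea, assuming SQL Server 2017+ for STRING_AGG (older versions would need FOR XML PATH, and a very large @N would want a proper tally table instead of sys.all_objects):

    DECLARE @N int = 100;   -- window size; 3 in the original example
    DECLARE @lags nvarchar(MAX);

    -- Build "IIF(LAG(METRIC1, k) OVER (...) < METRIC1, 1, 0)" terms for k = 1..@N-1
    SELECT @lags = STRING_AGG(
        CAST(N'IIF(LAG(METRIC1, ' + CAST(v.n AS nvarchar(10))
             + N') OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0)'
             AS nvarchar(MAX)),
        N' + ')
    FROM (SELECT TOP (@N - 1)
                 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
          FROM sys.all_objects) AS v;

    DECLARE @sql nvarchar(MAX) = N'
    WITH Counts AS
    (
        SELECT UNIT, PURCHASE_DATE, METRIC1,
               ' + @lags + N' AS LessThanCount,
               ROW_NUMBER() OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) AS rn
        FROM #TestTable
    )
    SELECT UNIT, PURCHASE_DATE, METRIC1, LessThanCount
    FROM Counts
    WHERE rn >= ' + CAST(@N AS nvarchar(10)) + N'
    ORDER BY UNIT, PURCHASE_DATE;';

    EXEC sys.sp_executesql @sql;   -- #TestTable is visible here (same session)

    And for the columnstore angle, assuming the 14 million rows live in a permanent table (dbo.SourceTable is just a placeholder name):

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_SourceTable ON dbo.SourceTable;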

