How to speed up this query?

  • Hi everyone

    I have a query that is taking a really long time. It takes close to 30 minutes to process about 14 million records. Is there any way to speed it up?

    Query:

    DROP TABLE IF EXISTS #TestTable
    DROP TABLE IF EXISTS #TEMP1

    CREATE TABLE #TestTable
    (
        [UNIT]          [nvarchar](10) NOT NULL,
        [PURCHASE_DATE] [date] NOT NULL,
        [METRIC1]       [float] NULL,
        [METRIC2]       [float] NULL
    )

    INSERT INTO #TestTable
    VALUES ('ABC','2024-12-12',435.090,210.33),
    ('ABC','2024-12-09',213.390,4013.4901),
    ('DEF','2024-12-12',34,99),
    ('ABC','2024-12-10',11,3.008),
    ('DEF','2024-12-11',57.903,9),
    ('DEF','2024-12-10',440.023,62),
    ('ABC','2024-12-08',33.924,80.02),
    ('DEF','2024-12-08',43.40,12),
    ('DEF','2024-12-07',11.30,48.90),
    ('DEF','2024-12-06',21.984,16.33)

    SELECT T1.UNIT,
           T1.PURCHASE_DATE,
           T1.METRIC1,
           T4.[COUNT],
           ROW_NUMBER() OVER (PARTITION BY T1.UNIT ORDER BY T1.PURCHASE_DATE) AS ROW_NUM
    INTO #TEMP1
    FROM #TestTable AS T1
    CROSS APPLY (SELECT COUNT(*) AS [COUNT]
                 FROM (SELECT NULL AS X) AS X
                 CROSS APPLY (SELECT TOP(3) *
                              FROM #TestTable AS T2
                              WHERE T2.PURCHASE_DATE <= T1.PURCHASE_DATE
                                AND T2.UNIT = T1.UNIT
                              ORDER BY T2.PURCHASE_DATE DESC) AS T3
                 WHERE T3.METRIC1 < T1.METRIC1 AND T3.UNIT = T1.UNIT
                ) AS T4

    SELECT *
    FROM #TestTable

    -- expected outcome
    SELECT T5.UNIT,
           T5.PURCHASE_DATE,
           T5.METRIC1,
           T5.[COUNT]
    FROM #TEMP1 AS T5
    WHERE T5.ROW_NUM >= 3
    ORDER BY 1, 2

    Description:

    The above code shows the expected outcome. Here is a description of what is happening...

    Step 1 - Partition the data by UNIT.

    Step 2 - Order the partitioned data by PURCHASE_DATE in descending order.

    Step 3 - Look at the current record plus the 2 earlier records, for a total of 3 records in consideration. Let's look at a few examples...

    ABC on 2024-12-12. The records under consideration are 2024-12-09, 2024-12-10, and 2024-12-12. How many of them have METRIC1 less than 435.09? There are 2 such records, so Count = 2.

    ABC on 2024-12-09. There are only 2 records up to that date (this one and 2024-12-08), so skip processing.
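
    For reference, working the sample data above by hand, the expected-outcome query should return something like this (worth double-checking against a real run):

        UNIT  PURCHASE_DATE  METRIC1   COUNT
        ABC   2024-12-10     11        0
        ABC   2024-12-12     435.09    2
        DEF   2024-12-08     43.4      2
        DEF   2024-12-10     440.023   2
        DEF   2024-12-11     57.903    1
        DEF   2024-12-12     34        0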

    Thank you


  • Hi

    Is anyone able to help me with this?

    Thank you

  • 1. I hope you are not using floats for prices!

    2. Do you really use reserved names for columns?

    3. Are you really writing new code with the deprecated ORDER BY column position?

    WITH Counts
    AS
    (
    SELECT UNIT, PURCHASE_DATE, METRIC1
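    -- Each LAG looks back 1 or 2 rows within the UNIT partition;
    -- every earlier METRIC1 smaller than the current row's adds 1 to the count.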
    ,IIF(LAG(METRIC1, 1) OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0)
    + IIF(LAG(METRIC1, 2) OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0) AS LessThanCount
    ,ROW_NUMBER() OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) AS rn
    FROM #TestTable
    )
    SELECT UNIT, PURCHASE_DATE, METRIC1, LessThanCount
    FROM Counts
    WHERE rn >= 3
    ORDER BY UNIT, PURCHASE_DATE;
    That's a nice looking query, Ken. The OP's query used "SELECT ... INTO #TEMP1", which could be a bottleneck in a larger process, IMO. Of course, it's not possible to know whether we're looking at a non-representative simplification. The typical reasons to allocate a temp table are: a) to re-use an intermediate row set, and/or b) the number of intermediate rows is large enough for an index to be beneficial. In this case, "processing" about 14 million rows seems likely to return quite a few rows, enough to benefit from one or more indexes. If so, then ditching "SELECT INTO" and pre-defining the temp table DDL could speed things up. Tables created using SELECT INTO do not inherit any indexes or optimizations from the queried tables, and adding indexes after the INSERT can be quite a bit slower.
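
    A minimal sketch of what I mean, with the column types copied from the OP's DDL; the clustered index key (UNIT, PURCHASE_DATE) is just an assumption about how #TEMP1 gets consumed downstream:

    CREATE TABLE #TEMP1
    (
        UNIT          nvarchar(10) NOT NULL,
        PURCHASE_DATE date NOT NULL,
        METRIC1       float NULL,
        [COUNT]       int NOT NULL,
        ROW_NUM       bigint NOT NULL
    )

    -- Build the index while the table is empty, then load into it
    CREATE CLUSTERED INDEX CIX_TEMP1 ON #TEMP1 (UNIT, PURCHASE_DATE)

    INSERT INTO #TEMP1 WITH (TABLOCK)   -- TABLOCK may allow minimal logging
        (UNIT, PURCHASE_DATE, METRIC1, [COUNT], ROW_NUM)
    SELECT ...   -- the windowed query from above, unchanged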

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Ken McKelvey wrote:

    1. I hope you are not using floats for prices! 2. Do you really use reserved names for columns? 3. Are you really writing new code with the deprecated ORDER BY column position?


    Thank you Ken

    I wasn't clear in my post, sorry about that. I gave the example of 3, but that window size is arbitrary and can change. Would your code work if, say, the period is 100 and not 3?

    You could generate the code as dynamic SQL to cope with different window sizes, but you would need to check carefully how efficient a value of 100 is. Something like the sketch below.

    Whatever you are trying to do, it might be worth hiring a consultant for a day or two. With 14 million rows, on top of Steve's good points, things like columnstore indexes also come into play.
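
    A minimal sketch of the dynamic SQL idea, assuming SQL Server 2017+ for STRING_AGG (older versions would need FOR XML PATH, and a very large @N would want a proper tally table instead of sys.all_objects):

    DECLARE @N int = 100;   -- window size; 3 in the original example
    DECLARE @lags nvarchar(MAX);

    -- Build "IIF(LAG(METRIC1, k) OVER (...) < METRIC1, 1, 0)" terms for k = 1..@N-1
    SELECT @lags = STRING_AGG(
        CAST(N'IIF(LAG(METRIC1, ' + CAST(v.n AS nvarchar(10))
             + N') OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0)'
             AS nvarchar(MAX)),
        N' + ')
    FROM (SELECT TOP (@N - 1)
                 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
          FROM sys.all_objects) AS v;

    DECLARE @sql nvarchar(MAX) = N'
    WITH Counts AS
    (
        SELECT UNIT, PURCHASE_DATE, METRIC1,
               ' + @lags + N' AS LessThanCount,
               ROW_NUMBER() OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) AS rn
        FROM #TestTable
    )
    SELECT UNIT, PURCHASE_DATE, METRIC1, LessThanCount
    FROM Counts
    WHERE rn >= ' + CAST(@N AS nvarchar(10)) + N'
    ORDER BY UNIT, PURCHASE_DATE;';

    EXEC sys.sp_executesql @sql;   -- #TestTable is visible here (same session)

    And for the columnstore angle, assuming the 14 million rows live in a permanent table (dbo.SourceTable is just a placeholder name):

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_SourceTable ON dbo.SourceTable;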

