December 13, 2024 at 5:57 am
Hi everyone
I have a query that is taking a really long time. It takes close to 30 minutes to process about 14 million records. Is there any way to speed it up?
Query:
DROP TABLE IF EXISTS #TestTable
DROP TABLE IF EXISTS #TEMP1
CREATE TABLE #TestTable
(
[UNIT] [nvarchar](10) NOT NULL,
[PURCHASE_DATE] [date] NOT NULL,
[METRIC1] [float] NULL,
[METRIC2] [float] NULL
)
INSERT INTO #TestTable
VALUES ('ABC','2024-12-12',435.090,210.33),
('ABC','2024-12-09',213.390,4013.4901),
('DEF','2024-12-12',34,99),
('ABC','2024-12-10',11,3.008),
('DEF','2024-12-11',57.903,9),
('DEF','2024-12-10',440.023,62),
('ABC','2024-12-08',33.924,80.02),
('DEF','2024-12-08',43.40,12),
('DEF','2024-12-07',11.30,48.90),
('DEF','2024-12-06',21.984,16.33)
SELECT T1.UNIT,
T1.PURCHASE_DATE,
T1.METRIC1,
T4.[COUNT],
ROW_NUMBER() OVER (PARTITION BY T1.UNIT ORDER BY T1.PURCHASE_DATE) AS ROW_NUM
INTO #TEMP1
FROM #TestTable AS T1
CROSS APPLY (SELECT COUNT(*) AS [COUNT]
FROM (SELECT NULL X) X
CROSS APPLY (SELECT TOP(3) *
FROM #TestTable AS T2
WHERE T2.PURCHASE_DATE <= T1.PURCHASE_DATE AND T2.UNIT = T1.UNIT
ORDER BY T2.PURCHASE_DATE DESC) AS T3
WHERE T3.METRIC1 < T1.METRIC1 AND T3.UNIT = T1.UNIT
) AS T4
SELECT *
FROM #TestTable
-- expected outcome
SELECT T5.UNIT,
T5.PURCHASE_DATE,
T5.METRIC1,
T5.[COUNT]
FROM #TEMP1 AS T5
WHERE T5.ROW_NUM >= 3
ORDER BY 1,2
Description:
The above code shows the expected outcome. Here is a description of what is happening...
Step 1 - partition the data by UNIT
Step 2 - order the partitioned data by PURCHASE_DATE in descending order
Step 3 - Look at the current record plus the 2 earlier records for a total of 3 records in consideration. Let's look at a few examples...
ABC on 2024-12-12. The records under consideration are 2024-12-09, 2024-12-10, 2024-12-12. How many of those records have METRIC1 less than 435.09? There are 2 such records. Count = 2
ABC on 2024-12-09. There are only 2 records so skip processing.
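Working the remaining qualifying rows (ROW_NUM >= 3) the same way by hand from the sample data, the expected result set is:

```sql
-- UNIT  PURCHASE_DATE  METRIC1   COUNT
-- ABC   2024-12-10      11.000   0
-- ABC   2024-12-12     435.090   2
-- DEF   2024-12-08      43.400   2
-- DEF   2024-12-10     440.023   2
-- DEF   2024-12-11      57.903   1
-- DEF   2024-12-12      34.000   0
```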
Thank you
December 20, 2024 at 6:21 am
Hi
Is anyone able to help me with this?
Thank you
December 20, 2024 at 12:58 pm
1. I hope you are not using floats for prices!
2. Do you really use reserved names for columns?
3. Are you really writing new code with the deprecated order-by-column-position syntax?
WITH Counts
AS
(
SELECT UNIT, PURCHASE_DATE, METRIC1
,IIF(LAG(METRIC1, 1) OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0)
+ IIF(LAG(METRIC1, 2) OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0) AS LessThanCount
,ROW_NUMBER() OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) AS rn
FROM #TestTable
)
SELECT UNIT, PURCHASE_DATE, METRIC1, LessThanCount
FROM Counts
WHERE rn >= 3
ORDER BY UNIT, PURCHASE_DATE;
December 20, 2024 at 5:05 pm
That's a nice-looking query, Ken. The OP's query used "SELECT ... INTO #TEMP1", which could be a bottleneck in a larger process imo. Of course, it's not possible to know whether we're looking at a non-representative simplification.
The typical reasons to allocate a temp table are: a) to re-use an intermediate row set, and/or b) the number of intermediate rows is large enough for an index to be beneficial. In this case, processing about 14 million rows seems likely to return quite a few rows, enough to benefit from an index (or indexes). If so, then ditching "SELECT INTO" and pre-defining the temp table DDL could speed things up. Tables created using SELECT INTO do not inherit any indexes or optimizations from the queried tables, and adding indexes after the INSERT can be quite a bit slower.
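A minimal sketch of pre-defining #TEMP1 with an index before the INSERT (column types taken from the OP's DDL; the index choice is illustrative and would need testing):

```sql
CREATE TABLE #TEMP1
(
    UNIT          nvarchar(10) NOT NULL,
    PURCHASE_DATE date         NOT NULL,
    METRIC1       float        NULL,
    [COUNT]       int          NOT NULL,
    ROW_NUM       bigint       NOT NULL,
    -- supports the final WHERE rn filter and ORDER BY UNIT, PURCHASE_DATE
    INDEX IX_TEMP1 CLUSTERED (UNIT, PURCHASE_DATE)
);

INSERT INTO #TEMP1 (UNIT, PURCHASE_DATE, METRIC1, [COUNT], ROW_NUM)
SELECT ...;  -- the windowed query, unchanged apart from dropping INTO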
No one shall be able to expel us from the paradise that Cantor created for us.
December 20, 2024 at 7:01 pm
Thank you, Ken.
I wasn't clear in my post; sorry about that. I gave the example of 3, but this is arbitrary and can change. Would your code work if, say, the period were 100 and not 3?
December 20, 2024 at 9:18 pm
You could generate the code as dynamic SQL to cope with different period lengths, but you would need to check carefully how efficient a value of 100 was.
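A minimal sketch of that dynamic-SQL idea, generalising Ken's LAG sum to a period of @N (assumes SQL Server 2017+ for STRING_AGG; the variable names are illustrative):

```sql
DECLARE @N int = 100;  -- window size: current row plus @N - 1 earlier rows

-- Build "IIF(LAG(METRIC1, 1) ... < METRIC1, 1, 0) + IIF(LAG(METRIC1, 2) ...) + ..."
DECLARE @terms nvarchar(max);
SELECT @terms = STRING_AGG(
    CAST('IIF(LAG(METRIC1, ' + CAST(n AS varchar(10))
         + ') OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) < METRIC1, 1, 0)'
         AS nvarchar(max)), ' + ')
FROM (SELECT TOP (@N - 1)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS offsets;

DECLARE @sql nvarchar(max) = N'
WITH Counts AS
(
    SELECT UNIT, PURCHASE_DATE, METRIC1,
           ' + @terms + N' AS LessThanCount,
           ROW_NUMBER() OVER (PARTITION BY UNIT ORDER BY PURCHASE_DATE) AS rn
    FROM #TestTable
)
SELECT UNIT, PURCHASE_DATE, METRIC1, LessThanCount
FROM Counts
WHERE rn >= @period
ORDER BY UNIT, PURCHASE_DATE;';

EXEC sys.sp_executesql @sql, N'@period int', @period = @N;
```

With @N = 3 this generates the same query as Ken's. For @N = 100 it emits 99 LAG calls per row, so as noted above, its efficiency against the APPLY version would need to be measured on the real 14-million-row data.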
Whatever you are trying to do, it might be worth hiring a consultant for a day or two. With 14 million rows, as well as Steve's good comments, things like columnstore indexes come into play.
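As a rough sketch of the columnstore suggestion (SQL Server 2016+ supports inline columnstore index definitions; the table and index names here are hypothetical):

```sql
-- Staging table with a clustered columnstore index, which favours
-- large scans and aggregations over millions of rows.
CREATE TABLE #Staging
(
    UNIT          nvarchar(10) NOT NULL,
    PURCHASE_DATE date         NOT NULL,
    METRIC1       float        NULL,
    INDEX CCI_Staging CLUSTERED COLUMNSTORE
);

INSERT INTO #Staging (UNIT, PURCHASE_DATE, METRIC1)
SELECT UNIT, PURCHASE_DATE, METRIC1
FROM #TestTable;
```

Whether this beats a conventional clustered index depends on the real query shape, so it is the kind of thing a consultant would benchmark rather than assume.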