Question on Grouping - Working number of days

  • okfine08 - Saturday, February 2, 2019 4:48 AM

    -- GENERALLY RECURSIVE CTE IS NOT GOOD FOR PERFORMANCE

    Then why on this good Green Earth are you posting it as a solution!? :blink::Whistling:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    My reasons:

    1)  It's for small amounts of data.

    2)  It's something different.

    3)  It's easy to understand and to code.

    There are a lot of different kinds of reasons in general:
    performance, less code, logical reads, formatting, reusability,
    etc., etc.

    I am a very casual SQL user.
    😀😀

  • Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".     So here we go, and with a DATEDIFF so that date gaps in between rows are covered.

    CREATE  -- DROP  -- TRUNCATE
    TABLE  #SampleData
      (
      Row_Num   INTEGER   NOT NULL,
      Request_Date  DATE   NOT NULL,
      Code   VARCHAR(50)  NOT NULL
      );

    INSERT INTO #SampleData (Row_Num, Request_Date, Code)
    SELECT  N,
      DATEADD(DAY, N, '20190101'),
      CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
    FROM  (
      SELECT  ROW_NUMBER() OVER (ORDER BY name) AS N
      FROM  sys.objects
      ) AS TempTally
    WHERE  N <= 20;

    SELECT  *
    FROM  #SampleData
    ORDER BY Row_Num;

    WITH RECURSIVE_CTE AS (

        SELECT
            SD.Row_Num,
            SD.Request_Date,
            SD.Code,
            0 AS Num_Days
        FROM #SampleData AS SD
        WHERE SD.Row_Num = 1
        UNION ALL
        SELECT
            SD.Row_Num,
            SD.Request_Date,
            SD.Code,
            CASE
                WHEN RC.Code = SD.Code THEN RC.Num_Days + DATEDIFF(day, RC.Request_Date, SD.Request_Date)
                ELSE 0
            END AS Num_Days
        FROM RECURSIVE_CTE AS RC
        INNER JOIN #SampleData AS SD
            ON RC.Row_Num + 1 = SD.Row_Num
    )
    SELECT *
    FROM RECURSIVE_CTE
    ORDER BY Row_Num;

    DROP TABLE #SampleData;
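For comparison, the same Num_Days column can be produced with no recursion at all. Here is a sketch using the windowed "gaps-and-islands" technique against the same #SampleData table as above (names like `Islands` are illustrative; requires SQL Server 2005 or later for ROW_NUMBER):

```sql
-- Set-based alternative to the rCTE: the difference of two row numberings
-- is constant within each unbroken run ("island") of the same Code, so it
-- can serve as a grouping key for that run.
WITH Islands AS (
    SELECT  Row_Num, Request_Date, Code,
            ROW_NUMBER() OVER (ORDER BY Row_Num)
          - ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Row_Num) AS Island
    FROM    #SampleData
)
SELECT  Row_Num, Request_Date, Code,
        -- days elapsed since the first date of the current island,
        -- which is 0 on the row where the Code last changed
        DATEDIFF(DAY,
                 MIN(Request_Date) OVER (PARTITION BY Code, Island),
                 Request_Date) AS Num_Days
FROM    Islands
ORDER BY Row_Num;
```

Because every row is produced in one pass over the table, the cost grows with the row count rather than with one execution per row as in the recursive version.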

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Wednesday, February 13, 2019 7:14 AM

    Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".    

    The answer to that question is because, despite the smaller number of rows, it's still a performance problem for the smaller number of rows when an "incremental" rCTE that counts by "1" is used.  It's all a part of the "Death by a thousand cuts" that occurs in every database and makes it nearly impossible to fix because management doesn't "get it" either. 😉
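For readers unfamiliar with the term, an "incremental" rCTE is one whose recursive member advances a counter by 1 per recursion, along the lines of this minimal sketch (each of the N rows requires a separate execution of the recursive member, which is where the hidden RBAR comes from):

```sql
-- The incremental-rCTE counting pattern being cautioned against:
-- one execution of the recursive member per row produced.
WITH Counter AS (
    SELECT 1 AS N
    UNION ALL
    SELECT N + 1
    FROM   Counter
    WHERE  N < 1000
)
SELECT N
FROM   Counter
OPTION (MAXRECURSION 1000);  -- the default recursion cap is 100
```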


  • okfine08 - Monday, February 4, 2019 11:56 PM

    Hi Jeff,

    My reasons:

    1)  It's for small amounts of data.

    2)  It's something different.

    3)  It's easy to understand and to code.

    There are a lot of different kinds of reasons in general:
    performance, less code, logical reads, formatting, reusability,
    etc., etc.

    I am a very casual SQL user.
    😀😀

    Please see my reply above.  There is no justification for slow code no matter the number of rows.  As for performance and logical reads, please see the following article and pay particular attention to where I talk about the red "sky-rocket", which proves that the performance and logical reads of incremental rCTEs suck so bad even for small row sets that they have their own center of gravity.  Even a properly written WHILE loop will beat an incremental rCTE in performance and logical reads.
    Hidden RBAR: Counting with Recursive CTE's
    Help stop the "Death by a Thousand Cuts".  Stop promoting the use of "incremental rCTEs".
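The fnTally approach linked below builds its row source from cross-joined CTEs rather than recursion. A minimal sketch of that cascading-CTE technique (the CTE names here are illustrative, not the actual fnTally source):

```sql
-- Cascading CTEs: each level squares the row count of the previous one,
-- so large row counts cost only a handful of cross joins and no recursion.
WITH
  E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),
                                  (1),(1),(1),(1),(1)) AS V(N)), -- 10 rows
  E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),                 -- 100 rows
  E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),                 -- 10,000 rows
  Tally(N) AS (SELECT TOP (1000)
                      ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
               FROM   E4)
SELECT N
FROM   Tally;
```

Unlike the incremental rCTE, the optimizer treats this as a single set-based query, so it generates no logical reads against user tables and no per-row recursion.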


  • Jeff Moden - Wednesday, February 13, 2019 7:36 AM

    sgmunson - Wednesday, February 13, 2019 7:14 AM

    Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".    

    The answer to that question is because, despite the smaller number of rows, it's still a performance problem for the smaller number of rows when an "incremental" rCTE that counts by "1" is used.  It's all a part of the "Death by a thousand cuts" that occurs in every database and makes it nearly impossible to fix because management doesn't "get it" either. 😉

    I get the concept, but I'm having a hard time seeing how the entire query, including the randomized insert, running in 0.02000120000 seconds is going to become a problem.   Having an insert that could form the basis for a good test, I've now tested this on 3,000,000 rows.   With the results returned to SSMS, the query ran in 2 minutes, 43 seconds.   Of course, it actually had to deliver 6 million rows because there are two SELECTs, so I commented out the first one and it ran in 2 minutes, 30 seconds.   I then ran the data into a temp table to remove all the returning of rows, and it ran in 2 minutes, 20 seconds.   The initial INSERT accounts for 1 minute, 27 seconds of that time.   At first I tried to test with 20,000,000 rows, but quickly found that there are only just over 3 million dates to work with between 1753-01-01 and 9999-12-31.   FYI.

    I'm pretty sure that applying this to considerably more data would likely be a bad idea, but for the small sample size, 0.02 seconds shouldn't be a problem.


  • sgmunson - Wednesday, February 13, 2019 8:32 AM

    Jeff Moden - Wednesday, February 13, 2019 7:36 AM

    sgmunson - Wednesday, February 13, 2019 7:14 AM

    Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".    

    The answer to that question is because, despite the smaller number of rows, it's still a performance problem for the smaller number of rows when an "incremental" rCTE that counts by "1" is used.  It's all a part of the "Death by a thousand cuts" that occurs in every database and makes it nearly impossible to fix because management doesn't "get it" either. 😉

    I get the concept, but I'm having a hard time seeing how the entire query, including the randomized insert, running in 0.02000120000 seconds is going to become a problem.   Having an insert that could form the basis for a good test, I've now tested this on 3,000,000 rows.   With the results returned to SSMS, the query ran in 2 minutes, 43 seconds.   Of course, it actually had to deliver 6 million rows because there are two SELECTs, so I commented out the first one and it ran in 2 minutes, 30 seconds.   I then ran the data into a temp table to remove all the returning of rows, and it ran in 2 minutes, 20 seconds.   The initial INSERT accounts for 1 minute, 27 seconds of that time.   At first I tried to test with 20,000,000 rows, but quickly found that there are only just over 3 million dates to work with between 1753-01-01 and 9999-12-31.   FYI.

    I'm pretty sure that applying this to considerably more data would likely be a bad idea, but for the small sample size, 0.02 seconds shouldn't be a problem.

    That's what I'm talking about.  People are happy with .02 seconds.  For such a small row set, it should be virtually immeasurable.

