Question on Grouping - Working number of days

  • okfine08 - Saturday, February 2, 2019 4:48 AM

    -- GENERALLY RECURSIVE CTE IS NOT GOOD FOR PERFORMANCE

    Then why on this good Green Earth are you posting it as a solution!? :blink::Whistling:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    My reasons:

    1)  It's for small amounts of data.

    2)  It's something different.

    3)  It's easy to understand and to code.

    There are a lot of different kinds of reasons in general:
    performance, less code, logical reads, formatting, reusability,
    etc., etc.

    I am a very casual SQL user.
    😀😀

  • Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".     So here we go, and with a DATEDIFF so that date gaps in between rows are covered.

    CREATE  -- DROP  -- TRUNCATE
    TABLE  #SampleData
      (
      Row_Num   INTEGER   NOT NULL,
      Request_Date  DATE   NOT NULL,
      Code   VARCHAR(50)  NOT NULL
      );

    INSERT INTO #SampleData (Row_Num, Request_Date, Code)
    SELECT  N,
      DATEADD(DAY, N, '20190101'),
      CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
    FROM  (
      SELECT  ROW_NUMBER() OVER (ORDER BY name) AS N
      FROM  sys.objects
      ) AS TempTally
    WHERE  N <= 20;

    SELECT  *
    FROM  #SampleData
    ORDER BY Row_Num;

    WITH RECURSIVE_CTE AS (

        SELECT
            SD.Row_Num,
            SD.Request_Date,
            SD.Code,
            0 AS Num_Days
        FROM #SampleData AS SD
        WHERE SD.Row_Num = 1
        UNION ALL
        SELECT
            SD.Row_Num,
            SD.Request_Date,
            SD.Code,
            CASE
                WHEN RC.Code = SD.Code THEN RC.Num_Days + DATEDIFF(day, RC.Request_Date, SD.Request_Date)
                ELSE 0
            END AS Num_Days
        FROM RECURSIVE_CTE AS RC
        INNER JOIN #SampleData AS SD
            ON RC.Row_Num + 1 = SD.Row_Num
    )
    SELECT *
    FROM RECURSIVE_CTE
    ORDER BY Row_Num;

    DROP TABLE #SampleData;
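For comparison, the same Num_Days column can be produced with no recursion at all. Here is a sketch using the windowed "gaps-and-islands" technique against the same #SampleData table as above (names like `Islands` are illustrative; requires SQL Server 2005 or later for ROW_NUMBER):

```sql
-- Set-based alternative to the rCTE: the difference of two row numberings
-- is constant within each unbroken run ("island") of the same Code, so it
-- can serve as a grouping key for that run.
WITH Islands AS (
    SELECT  Row_Num, Request_Date, Code,
            ROW_NUMBER() OVER (ORDER BY Row_Num)
          - ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Row_Num) AS Island
    FROM    #SampleData
)
SELECT  Row_Num, Request_Date, Code,
        -- days elapsed since the first date of the current island,
        -- which is 0 on the row where the Code last changed
        DATEDIFF(DAY,
                 MIN(Request_Date) OVER (PARTITION BY Code, Island),
                 Request_Date) AS Num_Days
FROM    Islands
ORDER BY Row_Num;
```

Because every row is produced in one pass over the table, the cost grows with the row count rather than with one execution per row as in the recursive version.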

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Wednesday, February 13, 2019 7:14 AM

    Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".    

    The answer to that question is because, despite the smaller number of rows, it's still a performance problem for the smaller number of rows when an "incremental" rCTE that counts by "1" is used.  It's all a part of the "Death by a thousand cuts" that occurs in every database and makes it nearly impossible to fix because management doesn't "get it" either. 😉
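For readers unfamiliar with the term, an "incremental" rCTE is one whose recursive member advances a counter by 1 per recursion, along the lines of this minimal sketch (each of the N rows requires a separate execution of the recursive member, which is where the hidden RBAR comes from):

```sql
-- The incremental-rCTE counting pattern being cautioned against:
-- one execution of the recursive member per row produced.
WITH Counter AS (
    SELECT 1 AS N
    UNION ALL
    SELECT N + 1
    FROM   Counter
    WHERE  N < 1000
)
SELECT N
FROM   Counter
OPTION (MAXRECURSION 1000);  -- the default recursion cap is 100
```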


  • okfine08 - Monday, February 4, 2019 11:56 PM

    Hi Jeff,

    My reasons:

    1)  It's for small amounts of data.

    2)  It's something different.

    3)  It's easy to understand and to code.

    There are a lot of different kinds of reasons in general:
    performance, less code, logical reads, formatting, reusability,
    etc., etc.

    I am a very casual SQL user.
    😀😀

    Please see my reply above.  There is no justification for slow code no matter the number of rows.  As for performance and logical reads, please see the following article and pay particular attention to where I talk about the red "sky-rocket", which proves that the performance and logical reads of incremental rCTEs suck so bad even for small row sets that they have their own center of gravity.  Even a properly written WHILE loop will beat an incremental rCTE in performance and logical reads.
    Hidden RBAR: Counting with Recursive CTE's
    Help stop the "Death by a Thousand Cuts".  Stop promoting the use of "incremental rCTEs".
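The fnTally approach linked below builds its row source from cross-joined CTEs rather than recursion. A minimal sketch of that cascading-CTE technique (the CTE names here are illustrative, not the actual fnTally source):

```sql
-- Cascading CTEs: each level squares the row count of the previous one,
-- so large row counts cost only a handful of cross joins and no recursion.
WITH
  E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),
                                  (1),(1),(1),(1),(1)) AS V(N)), -- 10 rows
  E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),                 -- 100 rows
  E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),                 -- 10,000 rows
  Tally(N) AS (SELECT TOP (1000)
                      ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
               FROM   E4)
SELECT N
FROM   Tally;
```

Unlike the incremental rCTE, the optimizer treats this as a single set-based query, so it generates no logical reads against user tables and no per-row recursion.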


  • Jeff Moden - Wednesday, February 13, 2019 7:36 AM

    sgmunson - Wednesday, February 13, 2019 7:14 AM

    Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".    

    The answer to that question is because, despite the smaller number of rows, it's still a performance problem for the smaller number of rows when an "incremental" rCTE that counts by "1" is used.  It's all a part of the "Death by a thousand cuts" that occurs in every database and makes it nearly impossible to fix because management doesn't "get it" either. 😉

    I get the concept, but I'm having a hard time seeing how the entire query, including the randomized insert, running in 0.02000120000 seconds is going to become a problem.   Having an insert that could form the basis for a good test, I've now tested this on 3,000,000 rows.   With the results returned to SSMS, the query ran in 2 minutes, 43 seconds.   Of course, it actually had to deliver 6 million rows because there are two SELECTs, so I commented out the first one and it ran in 2 minutes, 30 seconds.   I then ran the data into a temp table to remove all the returning of rows, and it ran in 2 minutes, 20 seconds.   The initial INSERT accounts for 1 minute, 27 seconds of that time.   At first I tried to test with 20,000,000 rows, but quickly found that there are only just over 3 million dates to work with between 1753-01-01 and 9999-12-31.   FYI.

    I'm pretty sure that applying this to considerably more data would likely be a bad idea, but for the small sample size, 0.02 seconds shouldn't be a problem.


  • sgmunson - Wednesday, February 13, 2019 8:32 AM

    Jeff Moden - Wednesday, February 13, 2019 7:36 AM

    sgmunson - Wednesday, February 13, 2019 7:14 AM

    Okay folks.   Recursion is not necessarily a problem, although it certainly can be.  For smaller data sets, it can be quite helpful.   I took a look at this and my first thought was, "why not recursion?".    

    The answer to that question is because, despite the smaller number of rows, it's still a performance problem for the smaller number of rows when an "incremental" rCTE that counts by "1" is used.  It's all a part of the "Death by a thousand cuts" that occurs in every database and makes it nearly impossible to fix because management doesn't "get it" either. 😉

    I get the concept, but I'm having a hard time seeing how the entire query, including the randomized insert, running in 0.02000120000 seconds is going to become a problem.   Having an insert that could form the basis for a good test, I've now tested this on 3,000,000 rows.   With the results returned to SSMS, the query ran in 2 minutes, 43 seconds.   Of course, it actually had to deliver 6 million rows because there are two SELECTs, so I commented out the first one and it ran in 2 minutes, 30 seconds.   I then ran the data into a temp table to remove all the returning of rows, and it ran in 2 minutes, 20 seconds.   The initial INSERT accounts for 1 minute, 27 seconds of that time.   At first I tried to test with 20,000,000 rows, but quickly found that there are only just over 3 million dates to work with between 1753-01-01 and 9999-12-31.   FYI.

    I'm pretty sure that applying this to considerably more data would likely be a bad idea, but for the small sample size, 0.02 seconds shouldn't be a problem.

    That's what I'm talking about.  People are happy with .02 seconds.  For such a small row set, it should be virtually immeasurable.

