Question on Grouping -Working number of days

  • Hi All,

    I am trying to work out the number of days for each (CODE) from the Code (REQUEST DATE) till next Code appear and update the ( No.Days) .it was working fine using LAG/LEAD function but requirement changed so if same codes coming repeatedly then we need number of days from first appearance i.e 'Urgent Needs' appear first on (14/08/2018) until  (24/01/2019) then calculate all days and if recent record still active then count days up to today's date instead of 24/01/2019. Hope you would be able to help.

    

    Regards
    Nadeem

  • Have you looked at first_Value? You can partition this and get the first lead date for a code item.

  • Steve Jones - SSC Editor - Tuesday, January 29, 2019 8:48 AM

    Have you looked at first_Value? You can partition this and get the first lead date for a code item.

    Thanks Steve; I will check it out . Regards

  • They say that a picture is worth a thousand words, but not in the case of data.  Actual code is worth a thousand pictures.  Please provide sample data and expected results as outlined in the first link in my signature.  The goal is to allow someone to copy the code and paste it into SSMS and hit "Run" and see your sample data.

    DO Post your code directly in your reply.  DO NOT include it as an attached file.  People are hesitant to open any file that may contain malware no matter how remote the possibility.

    DO use temp tables or table variables.  This makes it much easier to clean up.

    DO use table value constructors.  The instructions in my signature were written before table value constructors were implemented, so the instructions use individual INSERT statements, but it's much cleaner to use table value constructors.  DO NOT use a SELECT statement to populate the tables.  Unless you want to open up your tables to the entire Internet and open yourself up to a data breach, we won't be able to execute SELECT statements that read your tables.

    DO provide expected results.  DO make sure that the results are consistent with the sample data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • nadeem161 - Tuesday, January 29, 2019 9:10 AM

    Steve Jones - SSC Editor - Tuesday, January 29, 2019 8:48 AM

    Have you looked at first_Value? You can partition this and get the first lead date for a code item.

    Thanks Steve; I will check it out . Regards

    FIRST_VALUE will work if you can ignore gaps.  I suspect that you will need to account for gaps.  If so, we would still need sample data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks much Drew, Valid points and It looks FIRST_VALUE  will do the trick with some alteration.

    Regards
    Nadeem

  • To emphasize the problem that Drew brings up...

    First_Value can certainly used to solve the given problem but it won't correctly solve the problem that will occur if there's a gap in the dates for any given group of the "Code" column.  You should probably count on that problem happening. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)


  • Hi ,

    First of all thanks for your help.

    I have tried to do a code snippet. So to elaborate the issue again I want to work out the NODAYS based on CODE and REQUEST_DATE . For each time if a specific CODE logged then I want to  count NODAYS based start REQUEST_DATE of that occurrence. Below is dummy code example and worked out NODAYS for illustration.
     

    --===== If the output test table already exists, drop it

    IF OBJECT_ID('TempDB..#LUCOUTPUT','U') IS NOT NULL

    DROP TABLE #LUCOUTPUT

    --===== Create the test table with required columns

    CREATE TABLE #LUCOUTPUT

    (

    ROW_NUM INT,

    ID INT,

    REQUEST_DATE DATETIME,

    CODE varchar(50),

    NODAYS int

    )

    INSERT INTO #LUCOUTPUT

    (ROW_NUM,ID, REQUEST_DATE, CODE,NODAYS)

    SELECT '1','1','Oct 17 2007 12:00AM','Urgent','0' UNION ALL

    SELECT '2','1','Oct 18 2007 12:00AM','Urgent','1' UNION ALL

    SELECT '3','1','Oct 19 2007 12:00AM','Delayed','0' UNION ALL

    SELECT '4','1','Oct 20 2007 12:00AM','Delayed','1' UNION ALL

    SELECT '5','1','Oct 21 2007 12:00AM','Delayed','2' UNION ALL

    SELECT '6','1','Oct 25 2007 12:00AM','Delayed','6' UNION ALL

    SELECT '7','1','Oct 26 2007 12:00AM','Requested','0' UNION ALL

    SELECT '8','1','Oct 27 2007 12:00AM','Requested','1' UNION ALL

    SELECT '9','1','Oct 28 2007 12:00AM','Delayed','0' UNION ALL

    SELECT '10','1','Oct 30 2007 12:00AM','Urgent','2'

    --===== Select output data from output table

    Select ROW_NUM,ID, REQUEST_DATE, CODE,NODAYS from #LUCOUTPUT

  • I took a stab at this one.  I needed sample data, so I generated it using the following code.  Because RAND is used with a predictable seed, the values will come out the same every time, meaning the sample data should be predictable.
    CREATE    -- DROP    -- TRUNCATE
    TABLE    #SampleData
        (
        Row_Num        INTEGER        NOT NULL,
        Request_Date    DATE        NOT NULL,
        Code        VARCHAR(50)    NOT NULL
        );

    INSERT INTO #SampleData (Row_Num, Request_Date, Code)
    SELECT    N,
        DATEADD(DAY, N, '20190101'),
        CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
    FROM    (
        SELECT    ROW_NUMBER() OVER (ORDER BY name) AS N
        FROM    sys.objects
        ) AS TempTally
    WHERE    N <= 20;

    SELECT    *
    FROM    #SampleData
    ORDER BY Row_Num

    Row_Num Request_Date Code
    ----------- ------------ --------------------------------------------------
        1 2019-01-02 Awaiting
        2 2019-01-03 Urgent Needs
        3 2019-01-04 Urgent Needs
        4 2019-01-05 Awaiting
        5 2019-01-06 Urgent Needs
        6 2019-01-07 Urgent Needs
        7 2019-01-08 Awaiting
        8 2019-01-09 Awaiting
        9 2019-01-10 Urgent Needs
       10 2019-01-11 Awaiting
       11 2019-01-12 Awaiting
       12 2019-01-13 Urgent Needs
       13 2019-01-14 Awaiting
       14 2019-01-15 Awaiting
       15 2019-01-16 Urgent Needs
       16 2019-01-17 Urgent Needs
       17 2019-01-18 Urgent Needs
       18 2019-01-19 Urgent Needs
       19 2019-01-20 Urgent Needs
       20 2019-01-21 Awaiting

    (20 rows affected)

    This is my attempt.
    WITH    BaseData
    AS    (
        SELECT    Row_Num, Request_Date, Code,
            CASE WHEN LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN 'N' ELSE 'Y' END AS First_In_Block
        FROM    #SampleData
        )
    SELECT    Row_Num,
        Request_Date,
        Code,
        CASE WHEN LEAD(Request_Date, 1) OVER (ORDER BY Request_Date) IS NULL
            AND LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN First_Date ELSE NULL END AS [No. Days]
    FROM    (
        SELECT    b.Row_Num,
            b.Request_Date,
            b.Code,
            MAX(x.Request_Date) AS First_Date
        FROM        BaseData b
            LEFT JOIN BaseData x
            ON        x.Request_Date < b.Request_Date
                AND    x.Code = b.Code
                AND    x.First_In_Block = 'N'
        WHERE    b.Row_Num <= 19 /* TESTING: Change selection to change which rows included */
        GROUP BY b.Row_Num,
            b.Request_Date,
            b.Code
        ) q
    ORDER BY Row_Num,
        Request_Date;

    In my sample data, row 20 is "Awaiting", so I limited the data to rows 1-19 so that "Urgent Needs" has the final five rows.  When this happens, "2019-01-16" shows in the "No. Days" column.  When all 20 rows are included, the final rows is "Awaiting" but the prior row is "Urgent Needs", so no value is shown in the "No. Days" column.  Hopefully I have understood the issue correctly.  If not, please clarify what is needed, and I'll try again.

  • fahey.jonathan - Thursday, January 31, 2019 12:14 PM

    I took a stab at this one.  I needed sample data, so I generated it using the following code.  Because RAND is used with a predictable seed, the values will come out the same every time, meaning the sample data should be predictable.
    CREATE    -- DROP    -- TRUNCATE
    TABLE    #SampleData
        (
        Row_Num        INTEGER        NOT NULL,
        Request_Date    DATE        NOT NULL,
        Code        VARCHAR(50)    NOT NULL
        );

    INSERT INTO #SampleData (Row_Num, Request_Date, Code)
    SELECT    N,
        DATEADD(DAY, N, '20190101'),
        CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
    FROM    (
        SELECT    ROW_NUMBER() OVER (ORDER BY name) AS N
        FROM    sys.objects
        ) AS TempTally
    WHERE    N <= 20;

    SELECT    *
    FROM    #SampleData
    ORDER BY Row_Num

    Row_Num Request_Date Code
    ----------- ------------ --------------------------------------------------
        1 2019-01-02 Awaiting
        2 2019-01-03 Urgent Needs
        3 2019-01-04 Urgent Needs
        4 2019-01-05 Awaiting
        5 2019-01-06 Urgent Needs
        6 2019-01-07 Urgent Needs
        7 2019-01-08 Awaiting
        8 2019-01-09 Awaiting
        9 2019-01-10 Urgent Needs
       10 2019-01-11 Awaiting
       11 2019-01-12 Awaiting
       12 2019-01-13 Urgent Needs
       13 2019-01-14 Awaiting
       14 2019-01-15 Awaiting
       15 2019-01-16 Urgent Needs
       16 2019-01-17 Urgent Needs
       17 2019-01-18 Urgent Needs
       18 2019-01-19 Urgent Needs
       19 2019-01-20 Urgent Needs
       20 2019-01-21 Awaiting

    (20 rows affected)

    This is my attempt.
    WITH    BaseData
    AS    (
        SELECT    Row_Num, Request_Date, Code,
            CASE WHEN LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN 'N' ELSE 'Y' END AS First_In_Block
        FROM    #SampleData
        )
    SELECT    Row_Num,
        Request_Date,
        Code,
        CASE WHEN LEAD(Request_Date, 1) OVER (ORDER BY Request_Date) IS NULL
            AND LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN First_Date ELSE NULL END AS [No. Days]
    FROM    (
        SELECT    b.Row_Num,
            b.Request_Date,
            b.Code,
            MAX(x.Request_Date) AS First_Date
        FROM        BaseData b
            LEFT JOIN BaseData x
            ON        x.Request_Date < b.Request_Date
                AND    x.Code = b.Code
                AND    x.First_In_Block = 'N'
        WHERE    b.Row_Num <= 19 /* TESTING: Change selection to change which rows included */
        GROUP BY b.Row_Num,
            b.Request_Date,
            b.Code
        ) q
    ORDER BY Row_Num,
        Request_Date;

    In my sample data, row 20 is "Awaiting", so I limited the data to rows 1-19 so that "Urgent Needs" has the final five rows.  When this happens, "2019-01-16" shows in the "No. Days" column.  When all 20 rows are included, the final rows is "Awaiting" but the prior row is "Urgent Needs", so no value is shown in the "No. Days" column.  Hopefully I have understood the issue correctly.  If not, please clarify what is needed, and I'll try again.

    Hi Thanks for your reply; I am looking to workout Number of days each time a code logged . If a code logged more than once in contiguous rows then I want to calculate days based on the first date its logged and last date its logged before the new code start. So from above example of your expected result will be like below


    Row_Num  Request_Date Code                No. Days
    ----------- ------------ -------------------------------------------------- ----------
    1    2019-01-02 Awaiting               0
    2    2019-01-03 Urgent Needs             0
    3    2019-01-04 Urgent Needs             1
    4    2019-01-05 Awaiting               0
    5    2019-01-06 Urgent Needs             0
    6    2019-01-07 Urgent Needs             1
    7    2019-01-08 Awaiting               0
    8    2019-01-09 Awaiting               1
    9    2019-01-10 Urgent Needs             0
    10    2019-01-11 Awaiting               0
    11    2019-01-12 Awaiting               1
    12    2019-01-13 Urgent Needs             0
    13    2019-01-14 Awaiting               0
    14    2019-01-15 Awaiting               1
    15    2019-01-16 Urgent Needs             0
    16    2019-01-17 Urgent Needs             1
    17    2019-01-18 Urgent Needs             2
    18    2019-01-19 Urgent Needs             3
    19    2019-01-20 Urgent Needs             4
    20    2019-01-21 Awaiting               0

    Thanks much

  • OK, attempt #2.  The CTE tells which rows are first and last in their block, the "Blocks" query puts those values into ranges, which can then be joined to the main data.  Because we now know the start date of each block, we can calculate the number of days difference between the block start date and the "current" date.  The ISNULL is needed on the LEAD and LAG statements to ensure that the first row returns IsFirst = 1 and the last row returns IsLast = 1.
    WITH    BaseData
    AS    (
        SELECT    Request_Date,
            Code,
            CASE WHEN ISNULL(LAG(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsFirst,
            CASE WHEN ISNULL(LEAD(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsLast
        FROM    #SampleData
        )
    SELECT    s.Row_Num,
        s.Request_Date,
        s.Code,
        DATEDIFF(DAY, b.Block_Start_Date, s.Request_Date) AS [No. Days]
        /* TESTING ONLY */ --, b.Block_Start_Date, b.Block_End_Date
    FROM        #SampleData s
        JOIN    (
            SELECT    f.Request_Date        AS Block_Start_Date,
                f.Code,
                MIN(l.Request_Date)    AS Block_End_Date
            FROM        (
                    SELECT    Code, Request_Date
                    FROM    BaseData
                    WHERE    IsFirst <> 0
                    ) f
                JOIN    (
                    SELECT    Code, Request_Date
                    FROM    BaseData
                    WHERE    IsLast <> 0
                    ) l
                ON        l.Code = f.Code
                    AND    l.Request_Date >= f.Request_Date
            GROUP BY f.Request_Date,
                f.Code
            ) b /* Blocks */
        ON    s.Request_Date BETWEEN b.Block_Start_Date AND b.Block_End_Date
    ORDER BY s.Row_Num;

  • fahey.jonathan - Friday, February 1, 2019 7:42 AM

    OK, attempt #2.  The CTE tells which rows are first and last in their block, the "Blocks" query puts those values into ranges, which can then be joined to the main data.  Because we now know the start date of each block, we can calculate the number of days difference between the block start date and the "current" date.  The ISNULL is needed on the LEAD and LAG statements to ensure that the first row returns IsFirst = 1 and the last row returns IsLast = 1.
    WITH    BaseData
    AS    (
        SELECT    Request_Date,
            Code,
            CASE WHEN ISNULL(LAG(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsFirst,
            CASE WHEN ISNULL(LEAD(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsLast
        FROM    #SampleData
        )
    SELECT    s.Row_Num,
        s.Request_Date,
        s.Code,
        DATEDIFF(DAY, b.Block_Start_Date, s.Request_Date) AS [No. Days]
        /* TESTING ONLY */ --, b.Block_Start_Date, b.Block_End_Date
    FROM        #SampleData s
        JOIN    (
            SELECT    f.Request_Date        AS Block_Start_Date,
                f.Code,
                MIN(l.Request_Date)    AS Block_End_Date
            FROM        (
                    SELECT    Code, Request_Date
                    FROM    BaseData
                    WHERE    IsFirst <> 0
                    ) f
                JOIN    (
                    SELECT    Code, Request_Date
                    FROM    BaseData
                    WHERE    IsLast <> 0
                    ) l
                ON        l.Code = f.Code
                    AND    l.Request_Date >= f.Request_Date
            GROUP BY f.Request_Date,
                f.Code
            ) b /* Blocks */
        ON    s.Request_Date BETWEEN b.Block_Start_Date AND b.Block_End_Date
    ORDER BY s.Row_Num;

    Excellent. That works like a charm. Thank much

  • fahey.jonathan - Friday, February 1, 2019 7:42 AM

    OK, attempt #2.  The CTE tells which rows are first and last in their block, the "Blocks" query puts those values into ranges, which can then be joined to the main data.  Because we now know the start date of each block, we can calculate the number of days difference between the block start date and the "current" date.  The ISNULL is needed on the LEAD and LAG statements to ensure that the first row returns IsFirst = 1 and the last row returns IsLast = 1.
    WITH    BaseData
    AS    (
        SELECT    Request_Date,
            Code,
            CASE WHEN ISNULL(LAG(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsFirst,
            CASE WHEN ISNULL(LEAD(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsLast
        FROM    #SampleData
        )
    SELECT    s.Row_Num,
        s.Request_Date,
        s.Code,
        DATEDIFF(DAY, b.Block_Start_Date, s.Request_Date) AS [No. Days]
        /* TESTING ONLY */ --, b.Block_Start_Date, b.Block_End_Date
    FROM        #SampleData s
        JOIN    (
            SELECT    f.Request_Date        AS Block_Start_Date,
                f.Code,
                MIN(l.Request_Date)    AS Block_End_Date
            FROM        (
                    SELECT    Code, Request_Date
                    FROM    BaseData
                    WHERE    IsFirst <> 0
                    ) f
                JOIN    (
                    SELECT    Code, Request_Date
                    FROM    BaseData
                    WHERE    IsLast <> 0
                    ) l
                ON        l.Code = f.Code
                    AND    l.Request_Date >= f.Request_Date
            GROUP BY f.Request_Date,
                f.Code
            ) b /* Blocks */
        ON    s.Request_Date BETWEEN b.Block_Start_Date AND b.Block_End_Date
    ORDER BY s.Row_Num;

    First, you do not need the ISNULL if you use the third (optional) parameter for LEAD/LAG, e.g., LAG(Code, 1, 'Missing') OVER(....).

    Second, there is a simpler approach.  This only requires one scan of the table as opposed to the three that yours requires.

    WITH Code_Groups AS
    (
        SELECT *, ROW_NUMBER() OVER(ORDER BY Request_Date) - ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Request_Date) AS code_grp
        FROM #SampleData
    )
    SELECT Row_Num, Request_Date, Code, DATEDIFF(DAY, MIN(Request_Date) OVER(PARTITION BY Code, code_grp), Request_Date) AS Num_Days
    FROM Code_Groups

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Friday, February 1, 2019 9:32 AM

    First, you do not need the ISNULL if you use the third (optional) parameter for LEAD/LAG, e.g., LAG(Code, 1, 'Missing') OVER(....).

    Second, there is a simpler approach.  This only requires one scan of the table as opposed to the three that yours requires.

    WITH Code_Groups AS
    (
        SELECT *, ROW_NUMBER() OVER(ORDER BY Request_Date) - ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Request_Date) AS code_grp
        FROM #SampleData
    )
    SELECT Row_Num, Request_Date, Code, DATEDIFF(DAY, MIN(Request_Date) OVER(PARTITION BY Code, code_grp), Request_Date) AS Num_Days
    FROM Code_Groups

    Drew

    See, this is why I like the forums.  I learn new things all the time.  Thanks for the lesson.

  • HI 

    ANOTHER WAY OF RESOLVING THIS

    USING RECURSIVE CTE
    -- GENERALLY RECURSIVE CTE IS NOT GOOD FOR PERFORMANCE

    IF IT HELPS GREAT
    🙂
    🙂

    DROP TABLE #SampleData
    go

    CREATE TABLE  #SampleData
      (
      Row_Num   INTEGER   NOT NULL,
      Request_Date  DATE   NOT NULL,
      Code   VARCHAR(50)  NOT NULL
      );

    INSERT INTO #SampleData (Row_Num, Request_Date, Code)
    SELECT  N,
      DATEADD(DAY, N, '20190101'),
      CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
    FROM  (
      SELECT  ROW_NUMBER() OVER (ORDER BY name) AS N
      FROM  sys.objects
      ) AS TempTally
    WHERE  N <= 20;

    SELECT  *
    FROM  #SampleData
    ORDER BY Row_Num

    MY RECURSIVE CTE SOLUTION 
    ; WITH RECURSIVE_CTE AS
    (
    SELECT *,0 AS GRP FROM #SampleData WHERE Row_Num = 1
    UNION ALL
    SELECT A.*, CASE WHEN A.Code = B.Code THEN B.GRP + 1 WHEN A.Code <> B.Code THEN 0 END
    FROM #SampleData A JOIN RECURSIVE_CTE B ON A.Row_Num = B.Row_Num + 1
    )
    SELECT * FROM RECURSIVE_CTE
    GO


Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply