Very complex SQL Server logic to return multiple rows in a query even if there is only a single row of data, based on report date---will explain in post

  • I have a very complex scenario I am trying to work out in SQL server to save having to enter many duplicate rows of data. I will try and explain it to the best of my ability. Please read the entire thing closely. Even though it is a fairly lengthy post, I promise you ALL of what I am about to explain below matters in coming up with a proper solution for this problem.

    Every day I get a daily report that gives me all Cases that are still open. I create a running monthly tally of these that simply appends all of that days open cases to the end of that months open case file. So at the end of each month I will have roughly 20-22 days worth of data combined together for all non-weekend and holidays.

    The cases contain a field called [Last_Updated_On], which is the date it was last updated. There are roughly 5000 open cases every day. However, of these 5000, approximately 3800 cases every day have the exact same [Last_Updated_On] date, meaning they are exact duplicates of all fields showing up multiple days in a row, as this is the ONLY field that ever changes while a case is open.

    As you can see, if I were to upload all of these exact duplicates to the server, the database would have approximately 75,000 - 80,000 extra records every month for no real reason since the case details are exactly the same. So before I upload the data each month, I remove duplicate records in Excel, leaving me only the 20,000 or so records with [Case_ID] having a unique [Last_Updated_On] date.

    Let's say I have the following columns in the datatable: [Case_ID], [Last_Updated_On] and [Contact_Date]. Additionally, I want there to be 3 virtual calculated columns---[Report Date], [Resolved_Date] and [Age].

    Age is defined as the number of days open between the [Contact_Date] and the [Report_Date]. Now what I want to do is pass in two dates to the Stored Procedure, a Report_Start_Date and a Report_End_Date. [Report_Date] would start with the [Report_Start_Date] and end with [Report_End_Date], incrementing once for each row of data returned. [Resolved_Date] runs a sub_query to check the [Resolved_Case] table for a matching [Case_ID] to determine what date(if at all) the case was Resolved.

    I have 3 examples of how this needs to work below. For simplicity sake in these examples, let's say I pass in one week (only M-F) for the date range of Report_Date_Start and Report_Date_End (3/18/2019 - 3/22/2019).

    In the first case, let's also say this case is not resolved so it has been open all 5 days, and there have been no changes to [Last_Updated_On]. This means that it will store only 1 ROW in the database as the other 4 days would be exact duplicates. However, when I return the query, I need it to return 5 ROWS of data, 1 for each day in the [Report_Start_Date] through [Report_End_Date] range used. The only difference would be in the [Age] and [Report_Date] fields, since each day the case is open, it would increment Age by 1.

    So I know there must be a way to do something like this, but I am really clueless as to where to even start. How do I return multiple rows of data from a single row? the only experience I had in this happening was when I did a Full join before but I am unsure how applicable this is to this situation.

    So for instance here is a map out of what I described above:
    Data in server:

    [Case_ID]        [Last_Updated_On]          [Contact_Date]       [Resolved_Date]  
    US2309-2323         3/15/2019                3/15/2019               NULL

    Query_Returns: 

    [Case_ID]        [Last_Updated_On]           [Contact_Date]       [Report_Date]    [Age] 
    US2309-2323          3/15/2019                3/15/2019            3/18/2019          1 
    US2309-2323          3/15/2019                3/15/2019            3/19/2019          2 
    US2309-2323          3/15/2019                3/15/2019            3/20/2019          3 
    US2309-2323          3/15/2019                3/15/2019            3/21/2019          4 
    US2309-2323          3/15/2019                3/15/2019            3/22/2019          5

    As you can see, the query would return 5 rows---one for each date in the report_date range where the report_date is less than the resolved_date(or all dates if the resolved_date is NULL).

    Now here would be another instance with an additional row for this case based on different [Last_Contacted_On] dates being there and there being multiple rows in the database:

    Data in server: 

    [Case_ID]      [Last_Updated_On]     [Contact_Date]  [Resolved_Date]  
    US2309-2323       3/15/2019            3/15/2019        NULL 
    US2309-2323       3/19/2019            3/15/2019        NULL 
    US2309-2323       3/21/2019            3/15/2019        NULL

    Query_Returns: 

    [Case_ID]      [Last_Updated_On] [Contact_Date] [Report_Date] [Age] 
    US2309-2323       3/15/2019       3/15/2019      3/18/2019        1 
    US2309-2323       3/19/2019       3/15/2019      3/19/2019        2 
    US2309-2323       3/19/2019       3/15/2019      3/20/2019        3 
    US2309-2323       3/21/2019       3/15/2019      3/21/2019        4 
    US2309-2323       3/21/2019       3/15/2019      3/22/2019        5

    As you can see, which [Last_Updated_On] date gets returned for each row is dependant on the [Report_Date]. There are 3 date checks that would need to be made here to determine which one to bring back. 3/15/2019, 3/19/2019 and 3/21/2019. It should be self explanatory why the [Last_Updated_On] dates are what they are.

    The [Report_Date] only takes the latest available [Last_Updated_On] date that is less than or equal to itself. On 3/18, the row would return a [Last_Updated_On] date of 3/15 since 3/19 hasn't occurred yet and it couldn't possibly have been updated tomorrow for the day before. On 3/19, it was last updated on the same day, so it now returns 3/19 for the [Last_Updated_On] date. Same for 3/20 since the [Last_Updated_On] Date used the maximum day that is Less than or equal to the [Report_Date]. On 3/21, it was updated again, so now the [Last_Updated_On] date is 3/21 for the [Report_date] 3/21 and 3/22.

    Now for the 3rd scenario, where The case gets resolved somewhere in the middle of the [Report_Start_Date] and [Report_End_Date] range.

    Data in server:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2323      3/15/2019        3/15/2019       NULL
    US2309-2323      3/19/2019        3/15/2019       NULL
    US2309-2323      3/21/2019        3/15/2019       3/21/2019

    Query_Returns:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Report_Date] [Age]
    US2309-2323    3/15/2019        3/15/2019      3/18/2019     1
    US2309-2323    3/19/2019        3/15/2019      3/19/2019     2
    US2309-2323    3/19/2019        3/15/2019      3/20/2019     3
    US2309-2323    3/21/2019        3/15/2019      3/21/2019     4

    Notice here that even though the reporting dates were set from 3/18/2019 to 3/22/2019, since the case was resolved on 3/21/2019, there is no row returned for 3/22/2019 since as of 3/21/2019 the case was no longer open.

    I hope this explains in depth what I am trying to accomplish, and I know there must be a way to do this, but this is really much more complex of a query than I have ever even attempted. I don't even know where to begin so any help that could be provided would be greatly appreciated.

  • matter2003 - Thursday, March 21, 2019 5:16 PM

    I have a very complex scenario I am trying to work out in SQL server to save having to enter many duplicate rows of data. I will try and explain it to the best of my ability. Please read the entire thing closely. Even though it is a fairly lengthy post, I promise you ALL of what I am about to explain below matters in coming up with a proper solution for this problem.

    Every day I get a daily report that gives me all Cases that are still open. I create a running monthly tally of these that simply appends all of that days open cases to the end of that months open case file. So at the end of each month I will have roughly 20-22 days worth of data combined together for all non-weekend and holidays.

    The cases contain a field called [Last_Updated_On], which is the date it was last updated. There are roughly 5000 open cases every day. However, of these 5000, approximately 3800 cases every day have the exact same [Last_Updated_On] date, meaning they are exact duplicates of all fields showing up multiple days in a row, as this is the ONLY field that ever changes while a case is open.

    As you can see, if I were to upload all of these exact duplicates to the server, the database would have approximately 75,000 - 80,000 extra records every month for no real reason since the case details are exactly the same. So before I upload the data each month, I remove duplicate records in Excel, leaving me only the 20,000 or so records with [Case_ID] having a unique [Last_Updated_On] date.

    Let's say I have the following columns in the datatable: [Case_ID], [Last_Updated_On] and [Contact_Date]. Additionally, I want there to be 3 virtual calculated columns---[Report Date], [Resolved_Date] and [Age].

    Age is defined as the number of days open between the [Contact_Date] and the [Report_Date]. Now what I want to do is pass in two dates to the Stored Procedure, a Report_Start_Date and a Report_End_Date. [Report_Date] would start with the [Report_Start_Date] and end with [Report_End_Date], incrementing once for each row of data returned. [Resolved_Date] runs a sub_query to check the [Resolved_Case] table for a matching [Case_ID] to determine what date(if at all) the case was Resolved.

    I have 3 examples of how this needs to work below. For simplicity sake in these examples, let's say I pass in one week (only M-F) for the date range of Report_Date_Start and Report_Date_End (3/18/2019 - 3/22/2019).

    In the first case, let's also say this case is not resolved so it has been open all 5 days, and there have been no changes to [Last_Updated_On]. This means that it will store only 1 ROW in the database as the other 4 days would be exact duplicates. However, when I return the query, I need it to return 5 ROWS of data, 1 for each day in the [Report_Start_Date] through [Report_End_Date] range used. The only difference would be in the [Age] and [Report_Date] fields, since each day the case is open, it would increment Age by 1.

    So I know there must be a way to do something like this, but I am really clueless as to where to even start. How do I return multiple rows of data from a single row? the only experience I had in this happening was when I did a Full join before but I am unsure how applicable this is to this situation.

    So for instance here is a map out of what I described above:
    Data in server:

    [Case_ID]        [Last_Updated_On]          [Contact_Date]       [Resolved_Date]  
    US2309-2323         3/15/2019                3/15/2019               NULL

    Query_Returns: 

    [Case_ID]        [Last_Updated_On]           [Contact_Date]       [Report_Date]    [Age] 
    US2309-2323          3/15/2019                3/15/2019            3/18/2019          1 
    US2309-2323          3/15/2019                3/15/2019            3/19/2019          2 
    US2309-2323          3/15/2019                3/15/2019            3/20/2019          3 
    US2309-2323          3/15/2019                3/15/2019            3/21/2019          4 
    US2309-2323          3/15/2019                3/15/2019            3/22/2019          5

    As you can see, the query would return 5 rows---one for each date in the report_date range where the report_date is less than the resolved_date(or all dates if the resolved_date is NULL).

    Now here would be another instance with an additional row for this case based on different [Last_Contacted_On] dates being there and there being multiple rows in the database:

    Data in server: 

    [Case_ID]      [Last_Updated_On]     [Contact_Date]  [Resolved_Date]  
    US2309-2323       3/15/2019            3/15/2019        NULL 
    US2309-2323       3/19/2019            3/15/2019        NULL 
    US2309-2323       3/21/2019            3/15/2019        NULL

    Query_Returns: 

    [Case_ID]      [Last_Updated_On] [Contact_Date] [Report_Date] [Age] 
    US2309-2323       3/15/2019       3/15/2019      3/18/2019        1 
    US2309-2323       3/19/2019       3/15/2019      3/19/2019        2 
    US2309-2323       3/19/2019       3/15/2019      3/20/2019        3 
    US2309-2323       3/21/2019       3/15/2019      3/21/2019        4 
    US2309-2323       3/21/2019       3/15/2019      3/22/2019        5

    As you can see, which [Last_Updated_On] date gets returned for each row is dependant on the [Report_Date]. There are 3 date checks that would need to be made here to determine which one to bring back. 3/15/2019, 3/19/2019 and 3/21/2019. It should be self explanatory why the [Last_Updated_On] dates are what they are.

    The [Report_Date] only takes the latest available [Last_Updated_On] date that is less than or equal to itself. On 3/18, the row would return a [Last_Updated_On] date of 3/15 since 3/19 hasn't occurred yet and it couldn't possibly have been updated tomorrow for the day before. On 3/19, it was last updated on the same day, so it now returns 3/19 for the [Last_Updated_On] date. Same for 3/20 since the [Last_Updated_On] Date used the maximum day that is Less than or equal to the [Report_Date]. On 3/21, it was updated again, so now the [Last_Updated_On] date is 3/21 for the [Report_date] 3/21 and 3/22.

    Now for the 3rd scenario, where The case gets resolved somewhere in the middle of the [Report_Start_Date] and [Report_End_Date] range.

    Data in server:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2323      3/15/2019        3/15/2019       NULL
    US2309-2323      3/19/2019        3/15/2019       NULL
    US2309-2323      3/21/2019        3/15/2019       3/21/2019

    Query_Returns:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Report_Date] [Age]
    US2309-2323    3/15/2019        3/15/2019      3/18/2019     1
    US2309-2323    3/19/2019        3/15/2019      3/19/2019     2
    US2309-2323    3/19/2019        3/15/2019      3/20/2019     3
    US2309-2323    3/21/2019        3/15/2019      3/21/2019     4

    Notice here that even though the reporting dates were set from 3/18/2019 to 3/22/2019, since the case was resolved on 3/21/2019, there is no row returned for 3/22/2019 since as of 3/21/2019 the case was no longer open.

    I hope this explains in depth what I am trying to accomplish, and I know there must be a way to do this, but this is really much more complex of a query than I have ever even attempted. I don't even know where to begin so any help that could be provided would be greatly appreciated.

    What would be better would be to include the DDL for the table, sample data as insert statements.  I would make the sample data match what you posted, perhaps with different case ids for the different scenarios you posted.

  • Lynn Pettis - Thursday, March 21, 2019 10:25 PM

    matter2003 - Thursday, March 21, 2019 5:16 PM

    I have a very complex scenario I am trying to work out in SQL server to save having to enter many duplicate rows of data. I will try and explain it to the best of my ability. Please read the entire thing closely. Even though it is a fairly lengthy post, I promise you ALL of what I am about to explain below matters in coming up with a proper solution for this problem.

    Every day I get a daily report that gives me all Cases that are still open. I create a running monthly tally of these that simply appends all of that days open cases to the end of that months open case file. So at the end of each month I will have roughly 20-22 days worth of data combined together for all non-weekend and holidays.

    The cases contain a field called [Last_Updated_On], which is the date it was last updated. There are roughly 5000 open cases every day. However, of these 5000, approximately 3800 cases every day have the exact same [Last_Updated_On] date, meaning they are exact duplicates of all fields showing up multiple days in a row, as this is the ONLY field that ever changes while a case is open.

    As you can see, if I were to upload all of these exact duplicates to the server, the database would have approximately 75,000 - 80,000 extra records every month for no real reason since the case details are exactly the same. So before I upload the data each month, I remove duplicate records in Excel, leaving me only the 20,000 or so records with [Case_ID] having a unique [Last_Updated_On] date.

    Let's say I have the following columns in the datatable: [Case_ID], [Last_Updated_On] and [Contact_Date]. Additionally, I want there to be 3 virtual calculated columns---[Report Date], [Resolved_Date] and [Age].

    Age is defined as the number of days open between the [Contact_Date] and the [Report_Date]. Now what I want to do is pass in two dates to the Stored Procedure, a Report_Start_Date and a Report_End_Date. [Report_Date] would start with the [Report_Start_Date] and end with [Report_End_Date], incrementing once for each row of data returned. [Resolved_Date] runs a sub_query to check the [Resolved_Case] table for a matching [Case_ID] to determine what date(if at all) the case was Resolved.

    I have 3 examples of how this needs to work below. For simplicity sake in these examples, let's say I pass in one week (only M-F) for the date range of Report_Date_Start and Report_Date_End (3/18/2019 - 3/22/2019).

    In the first case, let's also say this case is not resolved so it has been open all 5 days, and there have been no changes to [Last_Updated_On]. This means that it will store only 1 ROW in the database as the other 4 days would be exact duplicates. However, when I return the query, I need it to return 5 ROWS of data, 1 for each day in the [Report_Start_Date] through [Report_End_Date] range used. The only difference would be in the [Age] and [Report_Date] fields, since each day the case is open, it would increment Age by 1.

    So I know there must be a way to do something like this, but I am really clueless as to where to even start. How do I return multiple rows of data from a single row? the only experience I had in this happening was when I did a Full join before but I am unsure how applicable this is to this situation.

    So for instance here is a map out of what I described above:
    Data in server:

    [Case_ID]        [Last_Updated_On]          [Contact_Date]       [Resolved_Date]  
    US2309-2323         3/15/2019                3/15/2019               NULL

    Query_Returns: 

    [Case_ID]        [Last_Updated_On]           [Contact_Date]       [Report_Date]    [Age] 
    US2309-2323          3/15/2019                3/15/2019            3/18/2019          1 
    US2309-2323          3/15/2019                3/15/2019            3/19/2019          2 
    US2309-2323          3/15/2019                3/15/2019            3/20/2019          3 
    US2309-2323          3/15/2019                3/15/2019            3/21/2019          4 
    US2309-2323          3/15/2019                3/15/2019            3/22/2019          5

    As you can see, the query would return 5 rows---one for each date in the report_date range where the report_date is less than the resolved_date(or all dates if the resolved_date is NULL).

    Now here would be another instance with an additional row for this case based on different [Last_Contacted_On] dates being there and there being multiple rows in the database:

    Data in server: 

    [Case_ID]      [Last_Updated_On]     [Contact_Date]  [Resolved_Date]  
    US2309-2323       3/15/2019            3/15/2019        NULL 
    US2309-2323       3/19/2019            3/15/2019        NULL 
    US2309-2323       3/21/2019            3/15/2019        NULL

    Query_Returns: 

    [Case_ID]      [Last_Updated_On] [Contact_Date] [Report_Date] [Age] 
    US2309-2323       3/15/2019       3/15/2019      3/18/2019        1 
    US2309-2323       3/19/2019       3/15/2019      3/19/2019        2 
    US2309-2323       3/19/2019       3/15/2019      3/20/2019        3 
    US2309-2323       3/21/2019       3/15/2019      3/21/2019        4 
    US2309-2323       3/21/2019       3/15/2019      3/22/2019        5

    As you can see, which [Last_Updated_On] date gets returned for each row is dependant on the [Report_Date]. There are 3 date checks that would need to be made here to determine which one to bring back. 3/15/2019, 3/19/2019 and 3/21/2019. It should be self explanatory why the [Last_Updated_On] dates are what they are.

    The [Report_Date] only takes the latest available [Last_Updated_On] date that is less than or equal to itself. On 3/18, the row would return a [Last_Updated_On] date of 3/15 since 3/19 hasn't occurred yet and it couldn't possibly have been updated tomorrow for the day before. On 3/19, it was last updated on the same day, so it now returns 3/19 for the [Last_Updated_On] date. Same for 3/20 since the [Last_Updated_On] Date used the maximum day that is Less than or equal to the [Report_Date]. On 3/21, it was updated again, so now the [Last_Updated_On] date is 3/21 for the [Report_date] 3/21 and 3/22.

    Now for the 3rd scenario, where The case gets resolved somewhere in the middle of the [Report_Start_Date] and [Report_End_Date] range.

    Data in server:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2323      3/15/2019        3/15/2019       3/21/2019
    US2309-2323      3/19/2019        3/15/2019       3/21/2019
    US2309-2323      3/21/2019        3/15/2019       3/21/2019

    Query_Returns:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Report_Date] [Age]
    US2309-2323    3/15/2019        3/15/2019      3/18/2019     1
    US2309-2323    3/19/2019        3/15/2019      3/19/2019     2
    US2309-2323    3/19/2019        3/15/2019      3/20/2019     3
    US2309-2323    3/21/2019        3/15/2019      3/21/2019     4

    Notice here that even though the reporting dates were set from 3/18/2019 to 3/22/2019, since the case was resolved on 3/21/2019, there is no row returned for 3/22/2019 since as of 3/21/2019 the case was no longer open.

    I hope this explains in depth what I am trying to accomplish, and I know there must be a way to do this, but this is really much more complex of a query than I have ever even attempted. I don't even know where to begin so any help that could be provided would be greatly appreciated.

    What would be better would be to include the DDL for the table, sample data as insert statements.  I would make the sample data match what you posted, perhaps with different case ids for the different scenarios you posted.


    CREATE TABLE Open_Cases (
      [Case_ID] varchar(15),
      [Last_Updated_On] datetime,
      [Contact_Date] datetime,
    )
    CREATE TABLE Resolved_Cases(
    [Case_ID] varchar(15),
    [Resolved_Date] datetime
    )

    INSERT INTO Open_Cases VALUES ('US2309-2323',  '3/15/2019',   '3/15/2019 ')
    INSERT INTO Open_Cases VALUES ('US2309-2323',  '3/19/2019',   '3/15/2019 ')
    INSERT INTO Open_Cases VALUES ('US2309-2323',  '3/20/2019',   '3/15/2019 ')

    INSERT INTO Resolved_Cases VALUES ('US2309-2323',  '3/21/2019')

    DECLARE @Report_Start_Date date;
    DECLARE @Reprot_End_Date date;
    SET @Report_Start_Date = '3/18/2019';
    SET @Report_End_Date = '3/22/2019';

    BEGIN 
      WITH GetReportDate AS (
      --***SQL Code to generate each day based on the start and end date range 
    ),
       SELECT o.Case_ID, o.Last_Updated_On, o.Contact_Date, r.Resolved_Date,
       (SELECT CalcAge(@Report_Start_Date, @Report_End_Date)  AS Age,
        FROM Open_Cases AS o
       LEFT JOIN Resolved_Cases as r
       ON r.Case_ID = o.Case_ID
       EXCEPT
       SELECT r.Resolved_Date, FROM Resolved_Cases as r
       LEFT JOIN GetReportDate  as Rep
       ON Rep.Report_Date > r.Resolved_Date
    END

    This should be everything needed be able to help with this issue. Note CalcAge is a Scalar Table Function returning an INT which gives us the number of days between the 2 dates subtracting Holidays and weekends.  I'm not sure if a CTE is what I want to generate a row for each Report_Date based on the start and end date range, but I'm guessing this might work? I do know that I can't do a comparison in the EXCEPT Statement between Report_Date and Resolved_Date because Report_Date is a calculated column, and I know using a CTE would allow the comparison, but again, not sure if this is a valid method to get this working.

      Also guessing that I should use an Except based on the report_date being greater than the Resolved_Date...feel free to correct and/or change any of that Statement as I am giving my best shot at guessing what should go in there...hoping someone can fill in the rest for me.

  • matter2003 - Thursday, March 21, 2019 10:37 PM

    Lynn Pettis - Thursday, March 21, 2019 10:25 PM

    matter2003 - Thursday, March 21, 2019 5:16 PM

    I have a very complex scenario I am trying to work out in SQL server to save having to enter many duplicate rows of data. I will try and explain it to the best of my ability. Please read the entire thing closely. Even though it is a fairly lengthy post, I promise you ALL of what I am about to explain below matters in coming up with a proper solution for this problem.

    Every day I get a daily report that gives me all Cases that are still open. I create a running monthly tally of these that simply appends all of that days open cases to the end of that months open case file. So at the end of each month I will have roughly 20-22 days worth of data combined together for all non-weekend and holidays.

    The cases contain a field called [Last_Updated_On], which is the date it was last updated. There are roughly 5000 open cases every day. However, of these 5000, approximately 3800 cases every day have the exact same [Last_Updated_On] date, meaning they are exact duplicates of all fields showing up multiple days in a row, as this is the ONLY field that ever changes while a case is open.

    As you can see, if I were to upload all of these exact duplicates to the server, the database would have approximately 75,000 - 80,000 extra records every month for no real reason since the case details are exactly the same. So before I upload the data each month, I remove duplicate records in Excel, leaving me only the 20,000 or so records with [Case_ID] having a unique [Last_Updated_On] date.

    Let's say I have the following columns in the datatable: [Case_ID], [Last_Updated_On] and [Contact_Date]. Additionally, I want there to be 3 virtual calculated columns---[Report Date], [Resolved_Date] and [Age].

    Age is defined as the number of days open between the [Contact_Date] and the [Report_Date]. Now what I want to do is pass in two dates to the Stored Procedure, a Report_Start_Date and a Report_End_Date. [Report_Date] would start with the [Report_Start_Date] and end with [Report_End_Date], incrementing once for each row of data returned. [Resolved_Date] runs a sub_query to check the [Resolved_Case] table for a matching [Case_ID] to determine what date(if at all) the case was Resolved.

    I have 3 examples of how this needs to work below. For simplicity sake in these examples, let's say I pass in one week (only M-F) for the date range of Report_Date_Start and Report_Date_End (3/18/2019 - 3/22/2019).

    In the first case, let's also say this case is not resolved so it has been open all 5 days, and there have been no changes to [Last_Updated_On]. This means that it will store only 1 ROW in the database as the other 4 days would be exact duplicates. However, when I return the query, I need it to return 5 ROWS of data, 1 for each day in the [Report_Start_Date] through [Report_End_Date] range used. The only difference would be in the [Age] and [Report_Date] fields, since each day the case is open, it would increment Age by 1.

    So I know there must be a way to do something like this, but I am really clueless as to where to even start. How do I return multiple rows of data from a single row? the only experience I had in this happening was when I did a Full join before but I am unsure how applicable this is to this situation.

    So for instance here is a map out of what I described above:
    Data in server:

    [Case_ID]        [Last_Updated_On]          [Contact_Date]       [Resolved_Date]  
    US2309-2323         3/15/2019                3/15/2019               NULL

    Query_Returns: 

    [Case_ID]        [Last_Updated_On]           [Contact_Date]       [Report_Date]    [Age] 
    US2309-2323          3/15/2019                3/15/2019            3/18/2019          1 
    US2309-2323          3/15/2019                3/15/2019            3/19/2019          2 
    US2309-2323          3/15/2019                3/15/2019            3/20/2019          3 
    US2309-2323          3/15/2019                3/15/2019            3/21/2019          4 
    US2309-2323          3/15/2019                3/15/2019            3/22/2019          5

    As you can see, the query would return 5 rows---one for each date in the report_date range where the report_date is less than the resolved_date(or all dates if the resolved_date is NULL).

    Now here would be another instance with an additional row for this case based on different [Last_Contacted_On] dates being there and there being multiple rows in the database:

    Data in server: 

    [Case_ID]      [Last_Updated_On]     [Contact_Date]  [Resolved_Date]  
    US2309-2323       3/15/2019            3/15/2019        NULL 
    US2309-2323       3/19/2019            3/15/2019        NULL 
    US2309-2323       3/21/2019            3/15/2019        NULL

    Query_Returns: 

    [Case_ID]      [Last_Updated_On] [Contact_Date] [Report_Date] [Age] 
    US2309-2323       3/15/2019       3/15/2019      3/18/2019        1 
    US2309-2323       3/19/2019       3/15/2019      3/19/2019        2 
    US2309-2323       3/19/2019       3/15/2019      3/20/2019        3 
    US2309-2323       3/21/2019       3/15/2019      3/21/2019        4 
    US2309-2323       3/21/2019       3/15/2019      3/22/2019        5

    As you can see, which [Last_Updated_On] date gets returned for each row is dependant on the [Report_Date]. There are 3 date checks that would need to be made here to determine which one to bring back. 3/15/2019, 3/19/2019 and 3/21/2019. It should be self explanatory why the [Last_Updated_On] dates are what they are.

    The [Report_Date] only takes the latest available [Last_Updated_On] date that is less than or equal to itself. On 3/18, the row would return a [Last_Updated_On] date of 3/15 since 3/19 hasn't occurred yet and it couldn't possibly have been updated tomorrow for the day before. On 3/19, it was last updated on the same day, so it now returns 3/19 for the [Last_Updated_On] date. Same for 3/20 since the [Last_Updated_On] Date used the maximum day that is Less than or equal to the [Report_Date]. On 3/21, it was updated again, so now the [Last_Updated_On] date is 3/21 for the [Report_date] 3/21 and 3/22.

    Now for the 3rd scenario, where The case gets resolved somewhere in the middle of the [Report_Start_Date] and [Report_End_Date] range.

    Data in server:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2323      3/15/2019        3/15/2019       3/21/2019
    US2309-2323      3/19/2019        3/15/2019       3/21/2019
    US2309-2323      3/21/2019        3/15/2019       3/21/2019

    Query_Returns:

    [Case_ID]  [Last_Updated_On] [Contact_Date] [Report_Date] [Age]
    US2309-2323    3/15/2019        3/15/2019      3/18/2019     1
    US2309-2323    3/19/2019        3/15/2019      3/19/2019     2
    US2309-2323    3/19/2019        3/15/2019      3/20/2019     3
    US2309-2323    3/21/2019        3/15/2019      3/21/2019     4

    Notice here that even though the reporting dates were set from 3/18/2019 to 3/22/2019, since the case was resolved on 3/21/2019, there is no row returned for 3/22/2019 since as of 3/21/2019 the case was no longer open.

    I hope this explains in depth what I am trying to accomplish, and I know there must be a way to do this, but this is really much more complex of a query than I have ever even attempted. I don't even know where to begin so any help that could be provided would be greatly appreciated.

    What would be better would be to include the DDL for the table, sample data as insert statements.  I would make the sample data match what you posted, perhaps with different case ids for the different scenarios you posted.


    CREATE TABLE Open_Cases (
      [Case_ID] varchar(15),
      [Last_Updated_On] datetime,
      [Contact_Date] datetime,
    )
    CREATE TABLE Resolved_Cases(
    [Case_ID] varchar(15),
    [Resolved_Date] datetime
    )

    INSERT INTO Open_Cases VALUES ('US2309-2323',  '3/15/2019',   '3/15/2019 ')
    INSERT INTO Open_Cases VALUES ('US2309-2323',  '3/19/2019',   '3/15/2019 ')
    INSERT INTO Open_Cases VALUES ('US2309-2323',  '3/20/2019',   '3/15/2019 ')

    INSERT INTO Resolved_Cases VALUES ('US2309-2323',  '3/21/2019')

    DECLARE @Report_Start_Date date;
    DECLARE @Reprot_End_Date date;
    SET @Report_Start_Date = '3/18/2019';
    SET @Report_End_Date = '3/22/2019';

    BEGIN 
      WITH GetReportDate AS (
      --***SQL Code to generate each day based on the start and end date range 
    ),
       SELECT o.Case_ID, o.Last_Updated_On, o.Contact_Date, r.Resolved_Date,
       (SELECT CalcAge(@Report_Start_Date, @Report_End_Date)  AS Age,
        FROM Open_Cases AS o
       LEFT JOIN Resolved_Cases as r
       ON r.Case_ID = o.Case_ID
       EXCEPT
       SELECT r.Resolved_Date, FROM Resolved_Cases as r
       LEFT JOIN GetReportDate  as Rep
       ON Rep.Report_Date > r.Resolved_Date
    END

    This should be everything needed be able to help with this issue. Note CalcAge is a Scalar Table Function returning an INT which gives us the number of days between the 2 dates subtracting Holidays and weekends.  I'm not sure if a CTE is what I want to generate a row for each Report_Date based on the start and end date range, but I'm guessing this might work? I do know that I can't do a comparison in the EXCEPT Statement between Report_Date and Resolved_Date because Report_Date is a calculated column, and I know using a CTE would allow the comparison, but again, not sure if this is a valid method to get this working.

      Also guessing that I should use an Except based on the report_date being greater than the Resolved_Date...feel free to correct and/or change any of that Statement as I am giving my best shot at guessing what should go in there...hoping someone can fill in the rest for me.

    Please post the code for the UDF.  Also, did you actually post enough sample data to cover all the cases you posted in your problem description?  It looks rather lean on that side.  I also asked you use different case ids for each of the different cases you posted.

  • Part of the problem is that your UDF is a scalar function (not a table-valued function) given the way that you are calling it.  I've substituted a Table Value Constructor for your function.
    PS: Scalar functions and table-valued functions are mutually exclusive.  You cannot have a scalar table-valued function.

    As Lynn mentioned, your sample data only covers one case.  I added a second case where the case had not yet been resolved.

    I believe that this gives you what you need.  It is very loosely based on work by Itzik Ben-Gan on packing intervals.

    WITH Case_Statuses AS
    (
        SELECT
            o.Case_ID
        ,    o.Contact_Date
        ,    o.Last_Updated_On AS Update_Status_Begin_Date
        ,    LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt
        FROM #Open_Cases o
        LEFT OUTER JOIN #Resolved_Cases r
            ON o.Case_ID = r.Case_ID
    )

    SELECT *
    FROM Case_Statuses cs
    CROSS APPLY ( VALUES(1), (2), (3), (4), (5), (6) ) CalcAge(Age)
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)
    WHERE cs.Update_Status_Begin_Date <= rd.Report_Date
        AND rd.Report_Date < cs.Update_Status_End_Dt
        AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This is not a full solution, I know, but I made a start for you.  I had a hard time understanding exactly how the sample data you provided related to the table layouts you showed, so I may be a little off in my solution.  Here is what I have; explanation to follow:
    CREATE    -- TRUNCATE -- DROP
    TABLE    Open_Cases
        (
        [Case_ID] varchar(15),
        [Last_Updated_On] datetime,
        [Contact_Date] datetime,
        )

    CREATE -- TRUNCATE -- DROP
    TABLE    Resolved_Cases
        (
        [Case_ID] varchar(15),
        [Resolved_Date] datetime
        )

    INSERT INTO Open_Cases VALUES ('US2309-2323', '3/15/2019', '3/15/2019 ')

    DECLARE    @Report_Start_Date date = '3/18/2019';
    DECLARE    @Report_End_Date date = '3/22/2019';

    WITH    Four
    AS    (SELECT 1 AS N UNION SELECT 2 UNION SELECT 3 UNION SELECT 4),
        Sixteen
    AS    (SELECT ROW_NUMBER() OVER (ORDER BY a.N) AS N FROM Four a CROSS JOIN Four b),
        TwoFiftySix
    AS    (SELECT ROW_NUMBER() OVER (ORDER BY a.N) AS N FROM Sixteen a CROSS JOIN Sixteen b),
        LocalTally
    AS    (SELECT ROW_NUMBER() OVER (ORDER BY a.N) AS N FROM TwoFiftySix a CROSS JOIN Sixteen b),
        Reporting_Days
    AS    (
        SELECT    N, DATEADD(DAY, N - 1, @Report_Start_Date) AS Report_Date
        FROM    LocalTally
        WHERE    N <= DATEDIFF(DAY, @Report_Start_Date, @Report_End_Date) + 1
        )
    SELECT    b.Case_ID,
        b.Last_Updated_On,
        b.Contact_Date,
        r.Report_Date,
        DATEDIFF(DAY, @Report_Start_Date, r.Report_Date) + 1 AS Age
    FROM        (
            SELECT    o.Case_ID,
                o.Last_Updated_On,
                o.Contact_Date,
                r.Resolved_Date,
                ISNULL(r.Resolved_Date, '99991231') AS Boundary_Date
            FROM        Open_Cases o
                LEFT JOIN Resolved_Cases r
                ON    r.Case_ID = o.Case_ID
            ) b -- base_data
        JOIN    Reporting_Days r
        ON        r.Report_Date BETWEEN b.Last_Updated_On AND b.Boundary_Date

    The sample data loads one record into the Open_Cases table because your "Data in Server" example has only one row.  Because the "Resolved_Date" is null, there is no data in the Resolved_Cases table.
    If you have not researched Tally tables, you should look into it.  Basically, it is a table with sequential numbers from 1 to whatever you want.  Jeff Moden has great information on Tally tables, and lots of research to support their use.  Here are two articles to get you started:
    http://www.sqlservercentral.com/articles/T-SQL/62867/
    http://www.sqlservercentral.com/blogs/dwainsql/2014/03/27/tally-tables-in-t-sql/
    This table allows you to turn one row into multiple, which is how it is used here.  Rather than having an actual table named Tally (which has advantages and disadvantages), this solution creates one using CROSS JOIN and ROW_NUMBER(), which Moden says is very fast because there are no table reads, and I have found that to be true in my testing.  This version produces only 4096 rows, but that should be sufficient for this solution.
    I use the LocalTally table to create one row per reporting date between the parameter start and end dates, then join the base data to it using the dates in the base data.  The output here matches the "Query_Returns" section in your original post, so if this does not do exactly what you want, it may point the way to a solution for you.

  • fahey.jonathan - Friday, March 22, 2019 9:22 AM

    This is not a full solution, I know, but I made a start for you.  I had a hard time understanding exactly how the sample data you provided related to the table layouts you showed, so I may be a little off in my solution.  Here is what I have; explanation to follow:
    CREATE    -- TRUNCATE -- DROP
    TABLE    Open_Cases
        (
        [Case_ID] varchar(15),
        [Last_Updated_On] datetime,
        [Contact_Date] datetime,
        )

    CREATE -- TRUNCATE -- DROP
    TABLE    Resolved_Cases
        (
        [Case_ID] varchar(15),
        [Resolved_Date] datetime
        )

    INSERT INTO Open_Cases VALUES ('US2309-2323', '3/15/2019', '3/15/2019 ')

    DECLARE    @Report_Start_Date date = '3/18/2019';
    DECLARE    @Report_End_Date date = '3/22/2019';

    WITH    Four
    AS    (SELECT 1 AS N UNION SELECT 2 UNION SELECT 3 UNION SELECT 4),
        Sixteen
    AS    (SELECT ROW_NUMBER() OVER (ORDER BY a.N) AS N FROM Four a CROSS JOIN Four b),
        TwoFiftySix
    AS    (SELECT ROW_NUMBER() OVER (ORDER BY a.N) AS N FROM Sixteen a CROSS JOIN Sixteen b),
        LocalTally
    AS    (SELECT ROW_NUMBER() OVER (ORDER BY a.N) AS N FROM TwoFiftySix a CROSS JOIN Sixteen b),
        Reporting_Days
    AS    (
        SELECT    N, DATEADD(DAY, N - 1, @Report_Start_Date) AS Report_Date
        FROM    LocalTally
        WHERE    N <= DATEDIFF(DAY, @Report_Start_Date, @Report_End_Date) + 1
        )
    SELECT    b.Case_ID,
        b.Last_Updated_On,
        b.Contact_Date,
        r.Report_Date,
        DATEDIFF(DAY, @Report_Start_Date, r.Report_Date) + 1 AS Age
    FROM        (
            SELECT    o.Case_ID,
                o.Last_Updated_On,
                o.Contact_Date,
                r.Resolved_Date,
                ISNULL(r.Resolved_Date, '99991231') AS Boundary_Date
            FROM        Open_Cases o
                LEFT JOIN Resolved_Cases r
                ON    r.Case_ID = o.Case_ID
            ) b -- base_data
        JOIN    Reporting_Days r
        ON        r.Report_Date BETWEEN b.Last_Updated_On AND b.Boundary_Date

    The sample data loads one record into the Open_Cases table because your "Data in Server" example has only one row.  Because the "Resolved_Date" is null, there is no data in the Resolved_Cases table.
    If you have not researched Tally tables, you should look into it.  Basically, it is a table with sequential numbers from 1 to whatever you want.  Jeff Moden has great information on Tally tables, and lots of research to support their use.  Here are two articles to get you started:
    http://www.sqlservercentral.com/articles/T-SQL/62867/
    http://www.sqlservercentral.com/blogs/dwainsql/2014/03/27/tally-tables-in-t-sql/
    This table allows you to turn one row into multiple, which is how it is used here.  Rather than having an actual table named Tally (which has advantages and disadvantages), this solution creates one using CROSS JOIN and ROW_NUMBER(), which Moden says is very fast because there are no table reads, and I have found that to be true in my testing.  This version produces only 4096 rows, but that should be sufficient for this solution.
    I use the LocalTally table to create one row per reporting date between the parameter start and end dates, then join the base data to it using the dates in the base data.  The output here matches the "Query_Returns" section in your original post, so if this does not do exactly what you want, it may point the way to a solution for you.

    This is very close to what I need, however I see two issues:

    1) The start and end dates are dynamic and cannot be hard coded just for a certain date range.  This date range is going to change based on the report being pulled and/or what the users select as start and end dates.

    2) When I add in the other INSERTS(the other two for the open cases with different [Last_Updated_On] dates and the one for the Resolved Case) I am getting duplicate entries displaying for the cases, when it should ONLY be taking the latest [Last_Updated_On] case that has a date before or on the [Report_Date]---I know this would be MAX [Last_Updated_On] date where [Last_Updated_On] <= [Report_Date] but I don't know where that would go in all of this...

    

    In this case it should be returning 4 rows(circled in Red), but it returns 28. It does correctly NOT return anything for 3/22 since the case was resolved on 3/22:
    

  • Lynn Pettis - Friday, March 22, 2019 8:14 AM

    Please post the code for the UDF.  Also, did you actually post enough sample data to cover all the cases you posted in your problem description?  It looks rather lean on that side.  I also asked you use different case ids for each of the different cases you posted.

    As requested, here is the UDF. I believe the cases I posted should be enough to cover any scenario, because additional Case_Id's will be handled in the same way.  But I see what you are saying---the solution is hard to tell if it works in all three cases because when you add data it will only be able to test one of the cases.

    So I will assume the initial data is already loaded from Case 3

    case 1:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2325    3/15/2019         3/15/2019         NULL              

    Case 2:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2326    3/15/2019         3/15/2019         NULL              
    US2309-2326    3/19/2019         3/15/2019         NULL  
    US2309-2326    3/20/2019         3/15/2019         NULL  

    CREATE FUNCTION [CalcAge]

    (

    -- Add the parameters for the function here

    @Start date,

    @End date

    )


    RETURNS int

    AS

    BEGIN

    -- Add the T-SQL statements to compute the return value here RETURN  (SELECT  DATEDIFF(dd, @Start , @End) + 1 )) -  (DATEDIFF(wk, @Start, @End) *2)  -  (SELECT  COUNT(*)  AS NumHolidays FROM Holidays WHERE Holiday_Date  BETWEEN @Start AND @End) -1END

    Holidays is a table that contains dates for holidays(there are none in this example, so that part can be removed). It counts the number of days in the holiday table between the start and end date and then subtracts them from the number of non-weekend days.

    The other two items in the calculation I don't really understand how they work, but I found this solution on stackoverflow and after checking several scenarios, it worked properly, so I used it.  Basically it excludes weekend days from the calculation of the number of days between dates(similar to the NETWORKDAYS function in Excel).  The other solutions talked about  variations of putting every possible day into a table and then checking if each day was a weekend day but I didn't want to do all that, and when I saw this solution provided a way to avoid it, I used it.

  • matter2003 - Friday, March 22, 2019 10:48 AM

    Lynn Pettis - Friday, March 22, 2019 8:14 AM

    Please post the code for the UDF.  Also, did you actually post enough sample data to cover all the cases you posted in your problem description?  It looks rather lean on that side.  I also asked you use different case ids for each of the different cases you posted.

    As requested, here is the UDF. I believe the cases I posted should be enough to cover any scenario, because additional Case_Id's will be handled in the same way.  But I see what you are saying---the solution is hard to tell if it works in all three cases because when you add data it will only be able to test one of the cases.

    So I will assume the initial data is already loaded from Case 3

    case 1:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2325    3/15/2019         3/15/2019         NULL              

    Case 2:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2326    3/15/2019         3/15/2019         NULL              
    US2309-2326    3/19/2019         3/15/2019         NULL  
    US2309-2326    3/20/2019         3/15/2019         NULL  

    CREATE FUNCTION [CalcAge]

    (

    -- Add the parameters for the function here

    @Start date,

    @End date

    )


    RETURNS int

    AS

    BEGIN

    -- Add the T-SQL statements to compute the return value here RETURN  (SELECT  DATEDIFF(dd, @Start , @End) + 1 )) -  (DATEDIFF(wk, @Start, @End) *2)  -  (SELECT  COUNT(*)  AS NumHolidays FROM Holidays WHERE Holiday_Date  BETWEEN @Start AND @End) -1END

    Holidays is a table that contains dates for holidays(there are none in this example, so that part can be removed). It counts the number of days in the holiday table between the start and end date and then subtracts them from the number of non-weekend days.

    The other two items in the calculation I don't really understand how they work, but I found this solution on stackoverflow and after checking several scenarios, it worked properly, so I used it.  Basically it excludes weekend days from the calculation of the number of days between dates(similar to the NETWORKDAYS function in Excel).  The other solutions talked about  variations of putting every possible day into a table and then checking if each day was a weekend day but I didn't want to do all that, and when I saw this solution provided a way to avoid it, I used it.

    You keep adding additional tables to the requirements.  Now we need your table Holidays if we are to actually help.  Also, I still thing you have left out data for all the cases you posted in your initial post.  If the data you have already posted is to be used for all the cases, I am at a loss.

  • drew.allen - Friday, March 22, 2019 9:06 AM

    Part of the problem is that your UDF is a scalar function (not a table-valued function) given the way that you are calling it.  I've substituted a Table Value Constructor for your function.
    PS: Scalar functions and table-valued functions are mutually exclusive.  You cannot have a scalar table-valued function.

    As Lynn mentioned, your sample data only covers one case.  I added a second case where the case had not yet been resolved.

    I believe that this gives you what you need.  It is very loosely based on work by Itzik Ben-Gan on packing intervals.

    WITH Case_Statuses AS
    (
        SELECT
            o.Case_ID
        ,    o.Contact_Date
        ,    o.Last_Updated_On AS Update_Status_Begin_Date
        ,    LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt
        FROM #Open_Cases o
        LEFT OUTER JOIN #Resolved_Cases r
            ON o.Case_ID = r.Case_ID
    )

    SELECT *
    FROM Case_Statuses cs
    CROSS APPLY ( VALUES(1), (2), (3), (4), (5), (6) ) CalcAge(Age)
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)
    WHERE cs.Update_Status_Begin_Date <= rd.Report_Date
        AND rd.Report_Date < cs.Update_Status_End_Dt
        AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    Drew

    Like the other post, the date selection range needs to be able to handle a dynamic date range...this date range will change, it is not going to remain the same and will vary by report and/or user entered values...is there a way to make the date range dynamic? (with the caveat it will skip over weekend and holidays and not include those---I can make a version of my calc age function that will return a date perhaps if it is not a weekend/holiday?)

    Drew, this solution works when I make the following change(minus the static date range):

      LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date + 1, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt

    EDIT: I fixed the below issue---I created the temp tables prior to running the code using SELECT... INTO
    (However, I am not sure what #Open_Cases o and #Resolved_Cases r refer to(is this a temp table)?  If so, it was throwing an error...when I changed it to the Open_Cases and Resolved_Cases(the actual physical tables), it created duplicate rows in there(that somehow now can't be deleted??), which is not what I want... Any advice on what to do with that issue? )

    Other than that it looks like the solution works properly...

  • Lynn Pettis - Friday, March 22, 2019 10:55 AM

    matter2003 - Friday, March 22, 2019 10:48 AM

    Lynn Pettis - Friday, March 22, 2019 8:14 AM

    Please post the code for the UDF.  Also, did you actually post enough sample data to cover all the cases you posted in your problem description?  It looks rather lean on that side.  I also asked you use different case ids for each of the different cases you posted.

    As requested, here is the UDF. I believe the cases I posted should be enough to cover any scenario, because additional Case_Id's will be handled in the same way.  But I see what you are saying---the solution is hard to tell if it works in all three cases because when you add data it will only be able to test one of the cases.

    So I will assume the initial data is already loaded from Case 3

    case 1:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2325    3/15/2019         3/15/2019         NULL              

    Case 2:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2326    3/15/2019         3/15/2019         NULL              
    US2309-2326    3/19/2019         3/15/2019         NULL  
    US2309-2326    3/20/2019         3/15/2019         NULL  

    CREATE FUNCTION [CalcAge]

    (

    -- Add the parameters for the function here

    @Start date,

    @End date

    )


    RETURNS int

    AS

    BEGIN

    -- Add the T-SQL statements to compute the return value here RETURN  (SELECT  DATEDIFF(dd, @Start , @End) + 1 )) -  (DATEDIFF(wk, @Start, @End) *2)  -  (SELECT  COUNT(*)  AS NumHolidays FROM Holidays WHERE Holiday_Date  BETWEEN @Start AND @End) -1END

    Holidays is a table that contains dates for holidays(there are none in this example, so that part can be removed). It counts the number of days in the holiday table between the start and end date and then subtracts them from the number of non-weekend days.

    The other two items in the calculation I don't really understand how they work, but I found this solution on stackoverflow and after checking several scenarios, it worked properly, so I used it.  Basically it excludes weekend days from the calculation of the number of days between dates(similar to the NETWORKDAYS function in Excel).  The other solutions talked about  variations of putting every possible day into a table and then checking if each day was a weekend day but I didn't want to do all that, and when I saw this solution provided a way to avoid it, I used it.

    You keep adding additional tables to the requirements.  Now we need your table Holidays if we are to actually help.  Also, I still thing you have left out data for all the cases you posted in your initial post.  If the data you have already posted is to be used for all the cases, I am at a loss.

    Lynn, you don't really need the holidays table, as I explained it is not pertinent to this as there are no holidays in the date ranges I supplied(or even testing with date ranges up to Memorial Day in the US, as there are no holidays until then), and when there are, it will automatically handle it. So it is of no real consequence to this...that part can just be removed.

    I simply made changes to the case ID's of Case 1 and Case 2 in the examples I provided. Case 3 can use the original case 3 information.
    There are 3 situations I need to check to ensure it's working properly:
    1) If the Case ID only has a single [Last_Update_On] date in the open cases and no resolved case date, it returns the dame data for each report date(the dates between the start and end dates, minus weekends)

    2) If the Case ID has multiple [Last_Updated_On] dates(ie, multiple rows for that Case_ID in the open cases table), the returned row will choose the [Last_Updated_On] date that is equal to or less than the [Report_Date]

    3) If the Case ID has a resolved Date in the Resolved Table, it should ONLY return rows with a [Report_Date] LESS THAN the [Resolved_Date] as at this point the case is not longer open, as it has been resolved.

    These are the only 3 scenarios I need to check for and the additional data will check for Scenario 1 above(Case 1 in the last post I made with the updated info) and Scenario 2 above(Case 2 in the last post I made with the updated info). Case 3 can be tested using the full data in the original post(Case 3) 

    Additionally, it needs to be able to handle Dynamic  Date ranges.  The date ranges I gave are an example, but realistically, there will be different date ranges for different reports, and users will be able to pull data based on date ranges they set, so the solution needs to be able to handle the date ranges dynamically as we will not know in advance what those will be for any given solution. However, the same 3 scenarios apply to this no matter what the date range. Once the case is resolved, no additional rows will be returned past the [Resolved_Date] no matter if a 50 year date range was put in. Similarly, until the case is resolved, it will display the row with the [Last_Updated_On] Date for all dates after it(basically just the same row with an incremented [Report_Date].

  • matter2003 - Friday, March 22, 2019 11:22 AM

    Lynn Pettis - Friday, March 22, 2019 10:55 AM

    matter2003 - Friday, March 22, 2019 10:48 AM

    Lynn Pettis - Friday, March 22, 2019 8:14 AM

    Please post the code for the UDF.  Also, did you actually post enough sample data to cover all the cases you posted in your problem description?  It looks rather lean on that side.  I also asked you use different case ids for each of the different cases you posted.

    As requested, here is the UDF. I believe the cases I posted should be enough to cover any scenario, because additional Case_Id's will be handled in the same way.  But I see what you are saying---the solution is hard to tell if it works in all three cases because when you add data it will only be able to test one of the cases.

    So I will assume the initial data is already loaded from Case 3

    case 1:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2325    3/15/2019         3/15/2019         NULL              

    Case 2:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2326    3/15/2019         3/15/2019         NULL              
    US2309-2326    3/19/2019         3/15/2019         NULL  
    US2309-2326    3/20/2019         3/15/2019         NULL  

    CREATE FUNCTION [CalcAge]

    (

    -- Add the parameters for the function here

    @Start date,

    @End date

    )


    RETURNS int

    AS

    BEGIN

    -- Add the T-SQL statements to compute the return value here RETURN  (SELECT  DATEDIFF(dd, @Start , @End) + 1 )) -  (DATEDIFF(wk, @Start, @End) *2)  -  (SELECT  COUNT(*)  AS NumHolidays FROM Holidays WHERE Holiday_Date  BETWEEN @Start AND @End) -1END

    Holidays is a table that contains dates for holidays(there are none in this example, so that part can be removed). It counts the number of days in the holiday table between the start and end date and then subtracts them from the number of non-weekend days.

    The other two items in the calculation I don't really understand how they work, but I found this solution on stackoverflow and after checking several scenarios, it worked properly, so I used it.  Basically it excludes weekend days from the calculation of the number of days between dates(similar to the NETWORKDAYS function in Excel).  The other solutions talked about  variations of putting every possible day into a table and then checking if each day was a weekend day but I didn't want to do all that, and when I saw this solution provided a way to avoid it, I used it.

    You keep adding additional tables to the requirements.  Now we need your table Holidays if we are to actually help.  Also, I still thing you have left out data for all the cases you posted in your initial post.  If the data you have already posted is to be used for all the cases, I am at a loss.

    Lynn, you don't really need the holidays table, as I explained it is not pertinent to this as there are no holidays in the date ranges I supplied(or even testing with date ranges up to Memorial Day in the US, as there are no holidays until then), and when there are, it will automatically handle it. So it is of no real consequence to this...that part can just be removed.

    I simply made changes to the case ID's of Case 1 and Case 2 in the examples I provided. Case 3 can use the original case 3 information.
    There are 3 situations I need to check to ensure it's working properly:
    1) If the Case ID only has a single [Last_Update_On] date in the open cases and no resolved case date, it returns the dame data for each report date(the dates between the start and end dates, minus weekends)

    2) If the Case ID has multiple [Last_Updated_On] dates(ie, multiple rows for that Case_ID in the open cases table), the returned row will choose the [Last_Updated_On] date that is equal to or less than the [Report_Date]

    3) If the Case ID has a resolved Date in the Resolved Table, it should ONLY return rows with a [Report_Date] LESS THAN the [Resolved_Date] as at this point the case is not longer open, as it has been resolved.

    These are the only 3 scenarios I need to check for and the additional data will check for Scenario 1 above(Case 1 in the last post I made with the updated info) and Scenario 2 above(Case 2 in the last post I made with the updated info). Case 3 can be tested using the full data in the original post(Case 3) 

    Additionally, it needs to be able to handle Dynamic  Date ranges.  The date ranges I gave are an example, but realistically, there will be different date ranges for different reports, and users will be able to pull data based on date ranges they set, so the solution needs to be able to handle the date ranges dynamically as we will not know in advance what those will be for any given solution. However, the same 3 scenarios apply to this no matter what the date range. Once the case is resolved, no additional rows will be returned past the [Resolved_Date] no matter if a 50 year date range was put in. Similarly, until the case is resolved, it will display the row with the [Last_Updated_On] Date for all dates after it(basically just the same row with an incremented [Report_Date].

    I'm done.  Others may be able to help but without everything I asked for I can't help as i am very visual and seeing what I have to start with and what i need to come up with I can figure out what needs to happen.

  • Lynn Pettis - Friday, March 22, 2019 11:39 AM

    matter2003 - Friday, March 22, 2019 11:22 AM

    Lynn Pettis - Friday, March 22, 2019 10:55 AM

    matter2003 - Friday, March 22, 2019 10:48 AM

    Lynn Pettis - Friday, March 22, 2019 8:14 AM

    Please post the code for the UDF.  Also, did you actually post enough sample data to cover all the cases you posted in your problem description?  It looks rather lean on that side.  I also asked you use different case ids for each of the different cases you posted.

    As requested, here is the UDF. I believe the cases I posted should be enough to cover any scenario, because additional Case_Id's will be handled in the same way.  But I see what you are saying---the solution is hard to tell if it works in all three cases because when you add data it will only be able to test one of the cases.

    So I will assume the initial data is already loaded from Case 3

    case 1:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2325    3/15/2019         3/15/2019         NULL              

    Case 2:

    [Case_ID]   [Last_Updated_On] [Contact_Date] [Resolved_Date]
    US2309-2326    3/15/2019         3/15/2019         NULL              
    US2309-2326    3/19/2019         3/15/2019         NULL  
    US2309-2326    3/20/2019         3/15/2019         NULL  

    CREATE FUNCTION [CalcAge]

    (

    -- Add the parameters for the function here

    @Start date,

    @End date

    )


    RETURNS int

    AS

    BEGIN

    -- Add the T-SQL statements to compute the return value here RETURN  (SELECT  DATEDIFF(dd, @Start , @End) + 1 )) -  (DATEDIFF(wk, @Start, @End) *2)  -  (SELECT  COUNT(*)  AS NumHolidays FROM Holidays WHERE Holiday_Date  BETWEEN @Start AND @End) -1END

    Holidays is a table that contains dates for holidays(there are none in this example, so that part can be removed). It counts the number of days in the holiday table between the start and end date and then subtracts them from the number of non-weekend days.

    The other two items in the calculation I don't really understand how they work, but I found this solution on stackoverflow and after checking several scenarios, it worked properly, so I used it.  Basically it excludes weekend days from the calculation of the number of days between dates(similar to the NETWORKDAYS function in Excel).  The other solutions talked about  variations of putting every possible day into a table and then checking if each day was a weekend day but I didn't want to do all that, and when I saw this solution provided a way to avoid it, I used it.

    You keep adding additional tables to the requirements.  Now we need your table Holidays if we are to actually help.  Also, I still thing you have left out data for all the cases you posted in your initial post.  If the data you have already posted is to be used for all the cases, I am at a loss.

    Lynn, you don't really need the holidays table, as I explained it is not pertinent to this as there are no holidays in the date ranges I supplied(or even testing with date ranges up to Memorial Day in the US, as there are no holidays until then), and when there are, it will automatically handle it. So it is of no real consequence to this...that part can just be removed.

    I simply made changes to the case ID's of Case 1 and Case 2 in the examples I provided. Case 3 can use the original case 3 information.
    There are 3 situations I need to check to ensure it's working properly:
    1) If the Case ID only has a single [Last_Update_On] date in the open cases and no resolved case date, it returns the dame data for each report date(the dates between the start and end dates, minus weekends)

    2) If the Case ID has multiple [Last_Updated_On] dates(ie, multiple rows for that Case_ID in the open cases table), the returned row will choose the [Last_Updated_On] date that is equal to or less than the [Report_Date]

    3) If the Case ID has a resolved Date in the Resolved Table, it should ONLY return rows with a [Report_Date] LESS THAN the [Resolved_Date] as at this point the case is not longer open, as it has been resolved.

    These are the only 3 scenarios I need to check for and the additional data will check for Scenario 1 above(Case 1 in the last post I made with the updated info) and Scenario 2 above(Case 2 in the last post I made with the updated info). Case 3 can be tested using the full data in the original post(Case 3) 

    Additionally, it needs to be able to handle Dynamic  Date ranges.  The date ranges I gave are an example, but realistically, there will be different date ranges for different reports, and users will be able to pull data based on date ranges they set, so the solution needs to be able to handle the date ranges dynamically as we will not know in advance what those will be for any given solution. However, the same 3 scenarios apply to this no matter what the date range. Once the case is resolved, no additional rows will be returned past the [Resolved_Date] no matter if a 50 year date range was put in. Similarly, until the case is resolved, it will display the row with the [Last_Updated_On] Date for all dates after it(basically just the same row with an incremented [Report_Date].

    I'm done.  Others may be able to help but without everything I asked for I can't help as i am very visual and seeing what I have to start with and what i need to come up with I can figure out what needs to happen.

    Lynn,

    I'm not sure what else you needed as I already gave you everything you asked for(minus the holiday part which has no bearing on the solution as I already have hat part taken care of by myself and its working).  Drew's solution is 100% working other than being able to handle a dynamic date range.

  • matter2003 - Friday, March 22, 2019 11:03 AM

    drew.allen - Friday, March 22, 2019 9:06 AM

    Part of the problem is that your UDF is a scalar function (not a table-valued function) given the way that you are calling it.  I've substituted a Table Value Constructor for your function.
    PS: Scalar functions and table-valued functions are mutually exclusive.  You cannot have a scalar table-valued function.

    As Lynn mentioned, your sample data only covers one case.  I added a second case where the case had not yet been resolved.

    I believe that this gives you what you need.  It is very loosely based on work by Itzik Ben-Gan on packing intervals.

    WITH Case_Statuses AS
    (
        SELECT
            o.Case_ID
        ,    o.Contact_Date
        ,    o.Last_Updated_On AS Update_Status_Begin_Date
        ,    LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt
        FROM #Open_Cases o
        LEFT OUTER JOIN #Resolved_Cases r
            ON o.Case_ID = r.Case_ID
    )

    SELECT *
    FROM Case_Statuses cs
    CROSS APPLY ( VALUES(1), (2), (3), (4), (5), (6) ) CalcAge(Age)
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)
    WHERE cs.Update_Status_Begin_Date <= rd.Report_Date
        AND rd.Report_Date < cs.Update_Status_End_Dt
        AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    Drew

    Like the other post, the date selection range needs to be able to handle a dynamic date range...this date range will change, it is not going to remain the same and will vary by report and/or user entered values...is there a way to make the date range dynamic? (with the caveat it will skip over weekend and holidays and not include those---I can make a version of my calc age function that will return a date perhaps if it is not a weekend/holiday?)

    Drew, this solution works when I make the following change(minus the static date range):

      LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date + 1, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt

    However, I am not sure what #Open_Cases o and #Resolved_Cases r refer to(is this a temp table)?  If so, it was throwing an error...when I changed it to the Open_Cases and Resolved_Cases(the actual physical tables), it created duplicate rows in there(that somehow now can't be deleted??), which is not what I want...

    Any advice on what to do with that issue?

    Other than that it looks like the solution works properly...

    The problem here is that you didn't try to understand what the code is doing before deploying it to your own environment.  Yes, I used temp tables.  I don't want to create permanent tables on my database just to help you with a problem, so I converted all of the permanent tables to temp tables.  I didn't think it was necessary to explicitly tell you to change them back.  So, for future reference, you'll need to change these temp tables back to your own permanent tables.

    To make the dates dynamic you'll need to replace these two lines of code
    /*  First line to replace.  */
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)

    /* Second line to replace. */
      AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    with the following code.
    /*  First replacement line.  */
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)

    /* Second replacement line. */
      AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Friday, March 22, 2019 12:04 PM

    matter2003 - Friday, March 22, 2019 11:03 AM

    drew.allen - Friday, March 22, 2019 9:06 AM

    Part of the problem is that your UDF is a scalar function (not a table-valued function) given the way that you are calling it.  I've substituted a Table Value Constructor for your function.
    PS: Scalar functions and table-valued functions are mutually exclusive.  You cannot have a scalar table-valued function.

    As Lynn mentioned, your sample data only covers one case.  I added a second case where the case had not yet been resolved.

    I believe that this gives you what you need.  It is very loosely based on work by Itzik Ben-Gan on packing intervals.

    WITH Case_Statuses AS
    (
        SELECT
            o.Case_ID
        ,    o.Contact_Date
        ,    o.Last_Updated_On AS Update_Status_Begin_Date
        ,    LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt
        FROM #Open_Cases o
        LEFT OUTER JOIN #Resolved_Cases r
            ON o.Case_ID = r.Case_ID
    )

    SELECT *
    FROM Case_Statuses cs
    CROSS APPLY ( VALUES(1), (2), (3), (4), (5), (6) ) CalcAge(Age)
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)
    WHERE cs.Update_Status_Begin_Date <= rd.Report_Date
        AND rd.Report_Date < cs.Update_Status_End_Dt
        AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    Drew

    Like the other post, the date selection range needs to be able to handle a dynamic date range...this date range will change, it is not going to remain the same and will vary by report and/or user entered values...is there a way to make the date range dynamic? (with the caveat it will skip over weekend and holidays and not include those---I can make a version of my calc age function that will return a date perhaps if it is not a weekend/holiday?)

    Drew, this solution works when I make the following change(minus the static date range):

      LEAD(o.Last_Updated_On, 1, COALESCE(r.Resolved_Date + 1, '9999-12-31')) OVER(PARTITION BY o.Case_ID ORDER BY o.Last_Updated_On) AS Update_Status_End_Dt

    However, I am not sure what #Open_Cases o and #Resolved_Cases r refer to(is this a temp table)?  If so, it was throwing an error...when I changed it to the Open_Cases and Resolved_Cases(the actual physical tables), it created duplicate rows in there(that somehow now can't be deleted??), which is not what I want...

    Any advice on what to do with that issue?

    Other than that it looks like the solution works properly...

    The problem here is that you didn't try to understand what the code is doing before deploying it to your own environment.  Yes, I used temp tables.  I don't want to create permanent tables on my database just to help you with a problem, so I converted all of the permanent tables to temp tables.  I didn't think it was necessary to explicitly tell you to change them back.  So, for future reference, you'll need to change these temp tables back to your own permanent tables.

    To make the dates dynamic you'll need to replace these two lines of code
    /*  First line to replace.  */
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)

    /* Second line to replace. */
      AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    with the following code.
    /*  First replacement line.  */
    CROSS APPLY ( VALUES(DATEADD(DAY, CalcAge.Age - 1, @Report_Start_Date) ) ) rd(Report_Date)

    /* Second replacement line. */
      AND rd.Report_Date BETWEEN @Report_Start_Date AND @Report_End_Date

    Drew

    Drew,

    Apologies, I actually edited the initial reply to your first post that I had solved the temp table thing.  I also added in "Drop If Exists" logic to it as well so it will recreate the temp tables when it runs.

    I'm a little confused...it appears the original code is the same as the replacement code? 

    From what I am testing, it appears the following line is what is controlling the number of days that return:

    CROSS APPLY ( VALUES(1), (2), (3), (4), (5) ) CalcAge(Age)

    When I add in (6), (7), (8), etc it continues to return an additional day for each one added---I'm unsure how to return the number of days between the start and end date there automatically(minus weekends/holidays)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply