Gap in dates query

  • Hello All,

    I need your help in building this query.

    create table #memb
    (ID int,
    eff_dt date,
    term_dt date,
    prod varchar(5));

    insert into #memb
    select 1111 , '01/01/2016' , '12/31/2016', 'AAA'
    union all
    select 1111, '01/01/2017', '12/31/2017', 'AAA'
    union all
    select 1111, '01/01/2018', '03/31/2021',  'AAA'
    union all
    select 1112, '01/01/2017', '12/31/2017', 'AAA'
    union all
    select 1112, '01/01/2018', '05/31/2021',  'BBB'
    union all
    select 2222, '02/01/2018',  '07/31/2019',  'BBB'
    union all
    select 2222, '10/01/2019', '12/31/2019',  'BBB'
    union all
    select 2222, '01/01/2020', '02/29/2020', 'AAA'
    union all
    select 2222, '04/01/2020',  '03/31/2021', 'AAA'
    union all
    select 3333, '01/01/2020', '05/25/2020',  'AAA';

    select * from #memb;

    drop table #memb;

    expected result:

    A customer has enrolled into certain subscription and can have gaps in the service.

    1111, '01/01/2016', '03/31/2021', 'AAA' -- this customer has no gaps for three subscription periods, so eff_dt should be first line eff_date  and term date should be last record term date for this customer with the enrolled product when termed.

    1112, '01/01/2017', '05/31/2021', 'BBB' --this customer has no gaps on subscription period, but there is change in product. Should provide the product subscribed when terming the subscription.

    2222, '04/01/2020',  '03/31/2021', 'AAA' --this customer has gaps in subscription, so should display the most recent non gap period eff date and term date and most recent product subscribed.

    3333, '01/01/2020', '05/25/2020',  'AAA' -- this customer has only one record, just subscribed for a few months and cancelled, so should display just that record.

    Thank you in advance!

    edit: added tags

    • This topic was modified 3 years, 3 months ago by  ssc_san.
  • Im curious how to solve this properly, the first step might be checking for gaps with the LAG function, but all after that i can think off on the fly gets really messy and ugly.

    Also heres the insert for people from EU who MIGHT get a date conversion error and dont want to rewrite the dates by hand.

    insert into #memb
    select 1111 ,'01/01/2016' ,'31/12/2016', 'AAA'
    union all
    select 1111, '01/01/2017', '31/12/2017', 'AAA'
    union all
    select 1111, '01/01/2018', '31/03/2021', 'AAA'
    union all
    select 1112, '01/01/2017', '31/12/2017', 'AAA'
    union all
    select 1112, '01/01/2018', '31/05/2021', 'BBB'
    union all
    select 2222, '02/01/2018', '31/07/2019', 'BBB'
    union all
    select 2222, '01/10/2019', '31/12/2019', 'BBB'
    union all
    select 2222, '01/01/2020', '29/02/2020', 'AAA'
    union all
    select 2222, '01/04/2020', '31/03/2021', 'AAA'
    union all
    select 3333, '01/01/2020', '25/05/2020', 'AAA';

    select *
    --Term Date of previous Subscription
    ,LAG(term_dt) OVER(PARTITION BY ID ORDER BY EFF_DT ASC)
    --When Datediff > 1, there is a gap?
    ,GAP_CHECK=isnull(DATEDIFF(DAY
    ,LAG(term_dt) OVER(PARTITION BY ID ORDER BY EFF_DT ASC)
    ,eff_dt)
    ,1)
    from #memb
    where ID = '1111'
  • Thanks for providing consumable data.

    ssc_san wrote:

    1112, '01/01/2017', '05/31/2021', 'BBB' --this customer has no gaps on subscription period, but there is change in product. Should provide the product subscribed when terming the subscription.

    In 01/01/2017 the customer subscribed for product 'AAA'. But you want to display the latest subscribed product along with that date.

    Is it correct? Just want to make sure

    _____________
    Code for TallyGenerator

  • If my assumption is correct, then there is a solution.

    1st - let's find the latest eff_dt for each customer which is not continuous with a previous subscription period:

    select ID, MAX(eff_dt)
    from @memb m
    where not exists (select * from @memb M1
    where M1.ID = M.ID
    and M1.eff_dt < M.eff_dt
    and M1.term_dt >= dateadd(dd, -1, M.eff_dt)
    )
    group by ID

    I used table variable instead of temp table, but it does not matter here.

    Each of these records is either a fresh start or a restart point for a customer - same thing, according to the requirements.

    Then we apply the latest term_dt along with the corresponding product to the records received from the last query:

    select ID, MAX(eff_dt) from_dt, L.term_dt, L.Prod
    from @memb m
    cross apply (select top 1 Term_dt, Prod
    from @memb LR
    where LR.ID = m.ID
    order by Term_dt desc
    ) L
    where not exists (select * from @memb M1
    where M1.ID = M.ID
    and M1.eff_dt < M.eff_dt
    and M1.term_dt >= dateadd(dd, -1, M.eff_dt)
    )
    group by ID, L.term_dt, L.Prod

    Looks like it's what you need.

    • This reply was modified 3 years, 3 months ago by  Sergiy.

    _____________
    Code for TallyGenerator

  • or using Windowed functions:

    WITH Gaps
    AS
    (
    SELECT ID, eff_dt, term_dt, prod
    ,CASE
    WHEN eff_dt = DATEADD(DAY, 1, LAG(term_dt) OVER (PARTITION BY ID, Prod ORDER BY eff_dt))
    THEN 0
    ELSE 1
    END AS Gap
    FROM #memb
    )
    ,Grps
    AS
    (
    SELECT ID, eff_dt, term_dt, prod
    ,SUM(Gap) OVER (PARTITION BY ID, Prod ORDER BY eff_dt) AS Grp
    FROM Gaps
    )
    ,Terms
    AS
    (
    SELECT ID
    ,MIN(eff_dt) AS eff_dt
    ,MAX(term_dt) AS term_dt
    ,prod
    FROM Grps
    GROUP BY ID, prod, Grp
    )
    ,RNs
    AS
    (
    SELECT ID, eff_dt, term_dt, prod
    ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY eff_dt DESC) AS rn
    FROM Terms
    )
    SELECT ID, eff_dt, term_dt, prod
    FROM RNs
    WHERE rn = 1;
  • Yes, if there are no gaps I have to show the initial eff date and most recent term date with product subscribed when subscription termed.

    Thank you!

  • Thank you! You all helped me to learn something new and with the solution.

  • I think Ken wrote his by product before the requirements were clarified, so it misses 1112, '01/01/2017', '05/31/2021', 'BBB' . Oddly, I wrote an R solution in about 3 minutes, then had to test my waning SQL Server skills. We do so much quick-direction-change stuff with public data that I rip stuff with R to build quick Excel calculators that people love. Anyway, SQL Server will always by my first love, even as memory fades. I tried to step through  my logical progression and see most bits separately. I'm sure there are more efficient (code writing and performance) ways to do this.

    with cte_Combine_Next_Record as -- Look at previous and current records side by side
    (
    select ID
    , LAG(eff_dt) OVER(PARTITION BY ID ORDER BY EFF_DT ASC) as previous_eff_dt
    , LAG(term_dt) OVER(PARTITION BY ID ORDER BY EFF_DT ASC) as previous_term_dt
    , LAG(prod) OVER(PARTITION BY ID ORDER BY EFF_DT ASC) as previous_prod
    , eff_dt
    , term_dt
    , prod
    , count(*) OVER(PARTITION BY ID) as total_records
    from #memb
    --order by ID, eff_dt, prod;
    ), cte_Find_Gaps as -- just adding a comparison flag (Gap_Found) between the previous term_dt and the current eff_date
    (
    select *
    , case when previous_eff_dt is null or DATEDIFF(D, previous_term_dt, eff_dt) = 1
    then 0
    else 1
    end as Gap_Found
    from cte_Combine_Next_Record
    --order by ID, eff_dt, prod;
    ), cte_Find_Groups as -- Using the comparison flag (Gap_Found) created above to number the groups of continuous subscription dates.
    -- I drop records with no previous record since the previous dates. "previous_eff_dt is not null"
    -- Note the specific inclusion of singular records for case like ID 3333. "total_records = 1"
    (
    select *
    , sum(Gap_Found) over(PARTITION BY ID ORDER BY EFF_DT ASC rows unbounded preceding) as Group_Number
    from cte_Find_Gaps
    where total_records = 1 or previous_eff_dt is not null
    ), cte_Find_Continuous_Periods as -- Here, aggregate across ID and Group_Number to get the full continous period range (eff_dt to term_dt)
    -- Also, number the groups using Row_Number() to find the Latest_Sbuscription
    (
    select ID, Group_Number
    , min(case when total_records=1 or Gap_Found=1 then eff_dt else previous_eff_dt end) as eff_dt
    , max(term_dt) as term_dt
    , ROW_NUMBER() over(PARTITION BY ID ORDER BY Group_Number desc) as Latest_Subscription
    from cte_Find_Groups
    group by ID, Group_Number
    --order by ID, eff_dt
    ) -- Now join each set of continous periods back to the groups to get the prod matching the latest eff_dt.
    -- Also, filtering for the Latest_Subscription.
    select cte_Find_Continuous_Periods.ID
    , cte_Find_Continuous_Periods.eff_dt
    , cte_Find_Continuous_Periods.term_dt
    , cte_Find_Groups.prod
    from cte_Find_Continuous_Periods inner join cte_Find_Groups
    on cte_Find_Continuous_Periods.ID = cte_Find_Groups.ID
    and cte_Find_Continuous_Periods.term_dt = cte_Find_Groups.term_dt
    where cte_Find_Continuous_Periods.Latest_Subscription = 1
    order by ID, eff_dt, prod;
  • This was removed by the editor as SPAM

  • Thank you jschmidt! I combined Sergiy & Ken's solution to get what I needed. You query laid it out in detail.

    Thank you all!

  • This was removed by the editor as SPAM

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply