Need to create gaps in a table of date ranges

  • The internet is full of examples of how to find gaps in date ranges but I need to create gaps in a table of date ranges.

    I have a table of date ranges

    IF OBJECT_ID('tempdb.dbo.#DateRanges') IS NOT NULL DROP TABLE #DateRanges
    create table #DateRanges ( id char(15), SubType char(15), effdate datetime, termdate datetime )
    insert into #DateRanges ( id, SubType, effdate, termdate )
    values ('ABC00001', 'BASIC' , '2020-06-01', '2020-06-08')
    , ('ABC00001', 'ROAM' , '2020-06-09', '2020-06-11')
    , ('ABC00001', 'BASIC' , '2020-06-12', '2078-12-31')
    , ('ABC00002', 'ENHANCED' , '2019-06-01', '2019-09-30')
    , ('ABC00002', 'SPECIAL' , '2019-10-01', '2019-11-30')
    , ('ABC00002', 'ENHANCED' , '2019-12-01', '2078-12-31')

    And a table of gap ranges

    IF OBJECT_ID('tempdb.dbo.#Gaps') IS NOT NULL DROP TABLE #Gaps
    CREATE TABLE #Gaps ( id char(15), effdate datetime, termdate datetime)
    insert into #Gaps ( id, effdate, termdate)
    values('ABC00001', '2020-06-10', '2020-06-11')
    ,('ABC00001', '2020-12-16', '2020-12-30')
    ,('ABC00002', '2020-01-01', '2020-01-02')

    I need to exclude the ranges from the #Gaps table from the #DateRanges table. The expected result should be:

    ABC00001 BASIC 2020-06-01 2020-06-08

    ABC00001 ROAM 2020-06-09 2020-06-09

    ABC00001 BASIC 2020-06-12 2020-12-15

    ABC00001 BASIC 2020-12-31 2078-12-31

    ABC00002 ENHANCED 2019-06-01 2019-09-30

    ABC00002 SPECIAL 2019-10-01 2019-11-30

    ABC00002 ENHANCED 2019-12-01 2019-12-31

    ABC00002 ENHANCED 2020-01-03 2078-12-31

     

  • One "brute force" way to do this would be to expand all the date ranges and use the set operator EXCEPT to remove the (also expanded) gaps.  Then the results could be grouped by splitting the calculated dates into contiguous ranges.  The query uses a partial CROSS JOIN between the id in #Gap and unique combinations of (id, SubType) from #DateRanges.  It takes about 4 to  5 seconds to execute on my test instance so it's definitely not efficiently written.  Something like this

    drop table if exists #DateRanges;
    go
    create table #DateRanges ( id char(15), SubType char(15), effdate datetime, termdate datetime )
    insert into #DateRanges ( id, SubType, effdate, termdate )
    values ('ABC00001', 'BASIC' , '2020-06-01', '2020-06-08')
    , ('ABC00001', 'ROAM' , '2020-06-09', '2020-06-11')
    , ('ABC00001', 'BASIC' , '2020-06-12', '2078-12-31')
    , ('ABC00002', 'ENHANCED' , '2019-06-01', '2019-09-30')
    , ('ABC00002', 'SPECIAL' , '2019-10-01', '2019-11-30')
    , ('ABC00002', 'ENHANCED' , '2019-12-01', '2078-12-31');

    drop table if exists #Gaps;
    go
    create table #Gaps ( id char(15), effdate datetime, termdate datetime)
    insert into #Gaps ( id, effdate, termdate)
    values('ABC00001', '2020-06-10', '2020-06-11')
    ,('ABC00001', '2020-12-16', '2020-12-30')
    ,('ABC00002', '2020-01-01', '2020-01-02');

    with
    unq_subtypes_cte(id, SubType) as (
    select distinct id, SubType from #DateRanges),
    minus_cte as (
    /* expand date ranges */
    select dr.id, dr.SubType, dt.calc_dt
    from #DateRanges dr
    cross apply dbo.fnTally(0, datediff(day, dr.effdate, dr.termdate)) fn
    cross apply (values (dateadd(day, fn.n, dr.effdate))) dt(calc_dt)
    except
    /* expand gaps */
    select usc.id, usc.SubType, dt.calc_dt
    from #Gaps g
    join unq_subtypes_cte usc on g.id=usc.id /* partial cross join */
    cross apply dbo.fnTally(0, datediff(day, g.effdate, g.termdate)) fn
    cross apply (values (dateadd(day, fn.n, g.effdate))) dt(calc_dt)),
    split_cte as (
    select *, case when datediff(day, calc_dt, lag(calc_dt, 1, 1)
    over (partition by id, SubType
    order by calc_dt))<>-1
    then 1 else 0 end split
    from minus_cte),
    grp_cte as (
    select *, sum(split) over (partition by id, SubType order by calc_dt) grp
    from split_cte)
    select id, SubType, cast(min(calc_dt) as date) effdate, cast(max(calc_dt) as date) termdate
    from grp_cte
    group by id, SubType, grp
    order by id, effdate;
    idSubTypeeffdatetermdate
    ABC00001 BASIC 2020-06-012020-06-08
    ABC00001 ROAM 2020-06-092020-06-09
    ABC00001 BASIC 2020-06-122020-12-15
    ABC00001 BASIC 2020-12-312078-12-31
    ABC00002 ENHANCED 2019-06-012019-09-30
    ABC00002 SPECIAL 2019-10-012019-11-30
    ABC00002 ENHANCED 2019-12-012019-12-31
    ABC00002 ENHANCED 2020-01-032078-12-31

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • I completely forgot about trying the EXCEPT route. The method I was trying is similar to what you have by expanding the date ranges but my method of remerging the date ranges was taking forever.  So your inefficient brute force method is still much faster that what I was coming up with.

     

     

  • If you can get the logic right windowed functions should be more efficient.

    The following works with your test data:

    WITH AllRanges
    AS
    (
    SELECT id, SubType, effdate, termdate
    FROM #DateRanges
    UNION ALL
    SELECT id, 'Gap', effdate, termdate
    FROM #Gaps
    )
    ,RangeLeadLags
    AS
    (
    SELECT id, SubType, effdate, termdate
    ,LEAD(effdate, 1, '20781231') OVER (PARTITION BY id ORDER BY effdate) AS NextEffDate
    ,LAG(termdate) OVER (PARTITION BY id ORDER BY effdate) AS PrevTermDate
    ,LAG(SubType) OVER (PARTITION BY id ORDER BY effdate) AS PrevSubType
    FROM AllRanges
    )
    SELECT id
    ,CASE
    WHEN SubType = 'Gap'
    AND PrevTermDate = '20781231'
    THEN PrevSubType
    ELSE SubType
    END AS SubType
    ,CASE
    WHEN SubType = 'Gap'
    AND PrevTermDate = '20781231'
    THEN DATEADD(day, 1, termdate)
    ELSE effdate
    END AS effdate
    ,CASE
    WHEN SubType = 'Gap'
    AND PrevTermDate = '20781231'
    THEN PrevTermDate
    WHEN termdate > DATEADD(day, -1, NextEffDate)
    THEN DATEADD(day, -1, NextEffDate)
    ELSE termdate
    END AS termdate
    FROM RangeLeadLags
    WHERE SubType <> 'Gap'
    OR PrevTermDate = '20781231'
    ORDER BY id, effdate;
  • The idea with the above is eyeballing the following and seeing what you need to get.

    SELECT id, SubType, effdate, termdate
    FROM #DateRanges
    UNION ALL
    SELECT id, 'Gap', effdate, termdate
    FROM #Gaps
    ORDER BY id, effdate

    So if the real data has, say, multiple gaps at the end of an id so the following can be added to the test data:

    INSERT INTO #Gaps ( id, effdate, termdate)
    VALUES ('ABC00001', '2021-01-10', '2021-01-20');

    Then the query could be adjusted to something like:

    WITH AllRanges
    AS
    (
    SELECT id, SubType, effdate, termdate
    ,CONVERT(varchar(23), termdate, 112) + SubType AS TermDateSubType
    FROM #DateRanges
    UNION ALL
    SELECT id, 'Gap', effdate, termdate
    ,CONVERT(varchar(23), termdate, 112) + 'Gap'
    FROM #Gaps
    )
    ,RangeLeadLags
    AS
    (
    SELECT id, SubType, effdate, termdate
    ,LEAD(effdate, 1, '20781231') OVER (PARTITION BY id ORDER BY effdate) AS NextEffDate
    ,MAX(TermDateSubType) OVER (PARTITION BY id ORDER BY effdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MaxTermDateSubType
    FROM AllRanges
    )
    SELECT id
    ,CASE
    WHEN SubType = 'Gap'
    AND MaxTermDateSubType LIKE '20781231%'
    THEN SUBSTRING(MaxTermDateSubType, 9, 15)
    ELSE SubType
    END AS SubType
    ,CASE
    WHEN SubType = 'Gap'
    AND MaxTermDateSubType LIKE '20781231%'
    THEN DATEADD(day, 1, termdate)
    ELSE effdate
    END AS effdate
    ,CASE
    WHEN SubType = 'Gap'
    AND MaxTermDateSubType LIKE '20781231%'
    AND NextEffDate = '20781231'
    THEN LEFT(MaxTermDateSubType, 8)
    WHEN SubType = 'Gap'
    AND MaxTermDateSubType LIKE '20781231%'
    THEN DATEADD(day, -1, NextEffDate)
    WHEN termdate > DATEADD(day, -1, NextEffDate)
    THEN DATEADD(day, -1, NextEffDate)
    ELSE termdate
    END AS termdate
    FROM RangeLeadLags
    WHERE SubType <> 'Gap'
    OR MaxTermDateSubType LIKE '20781231%'
    ORDER BY id, effdate;

    etc

  • Ken, that is some pretty neat code.  I had to tweak it just a bit to support a scenario that I forgot to include in my original post.

    The extra scenario is for ID of ABC00003. Where the max term date is not 2078-12-31 and the gap is from 2021-01-01 to 2021-01-28. I have included the code below that works for all of these scenarios.

    Also, I have tested this with over 1000 date ranges which in my environment is a normal amount of dates and it runs in milliseconds.

    IF OBJECT_ID('tempdb.dbo.#DateRanges') IS NOT NULL DROP TABLE #DateRanges
    create table #DateRanges ( id char(15), SubType char(15), effdate datetime, termdate datetime )
    insert into #DateRanges ( id, SubType, effdate, termdate )
    values
    ('ABC00001', 'BASIC' , '2020-06-01', '2020-06-08')
    , ('ABC00001', 'ROAM' , '2020-06-09', '2020-06-11')
    , ('ABC00001', 'BASIC' , '2020-06-12', '2078-12-31')
    , ('ABC00002', 'ENHANCED' , '2019-06-01', '2019-09-30')
    , ('ABC00002', 'SPECIAL' , '2019-10-01', '2019-11-30')
    , ('ABC00002', 'ENHANCED' , '2019-12-01', '2078-12-31')
    , ('ABC00003', 'BASIC' , '2018-01-01', '2021-02-28')

    IF OBJECT_ID('tempdb.dbo.#Gaps') IS NOT NULL DROP TABLE #Gaps
    CREATE TABLE #Gaps ( id char(15), effdate datetime, termdate datetime)
    insert into #Gaps ( id, effdate, termdate)
    values
    ('ABC00001', '2020-06-10', '2020-06-11')
    ,('ABC00001', '2020-12-16', '2020-12-30')
    ,('ABC00001', '2021-01-10', '2021-01-20')
    ,('ABC00002', '2020-01-01', '2020-01-02')
    ,('ABC00003', '2021-01-01', '2021-01-28')

    ;WITH AllRanges
    AS
    (
    SELECT SubType, id, effdate, termdate
    ,CONVERT(varchar(23), termdate, 112) + SubType AS TermDateSubType
    FROM #DateRanges
    UNION ALL
    SELECT 'Gap', id, effdate, termdate
    ,CONVERT(varchar(23), termdate, 112) + 'Gap'
    FROM #Gaps
    ), PreRangeLeadLags AS
    (
    SELECT SubType, id, effdate, termdate
    ,LEAD(effdate, 1, '2078-12-31') OVER (PARTITION BY id ORDER BY effdate) AS NextEffDate
    ,MAX(TermDateSubType) OVER (PARTITION BY id ORDER BY effdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MaxTermDateSubType
    ,MAX(TermDate) OVER (PARTITION BY id ORDER BY effdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MaxTermDate
    FROM AllRanges
    ), RangeLeadLags AS
    (
    SELECT SubType, id, effdate, termdate
    , CASE WHEN NextEffDate = '2078-12-31' AND MaxTermDate IS NOT NULL THEN MaxTermDate ELSE NextEffDate END AS NextEffDate
    , MaxTermDate
    , MaxTermdateSubType
    FROM PreRangeLeadLags
    ), PreOut AS
    (
    SELECT rll.id
    ,CASE WHEN rll.SubType = 'Gap' AND rll.MaxTermDateSubType IS NOT NULL
    THEN SUBSTRING(rll.MaxTermDateSubType, 9, 15)
    ELSE rll.SubType
    END AS SubType
    ,CASE WHEN rll.SubType = 'Gap' AND rll.MaxTermDateSubType IS NOT NULL
    THEN DATEADD(day, 1, rll.termdate)
    ELSE rll.effdate
    END AS effdate
    ,CASE WHEN rll.SubType = 'Gap' AND MaxTermDateSubType IS NOT NULL AND rll.NextEffDate = rll.MaxTermDate
    THEN LEFT(rll.MaxTermDateSubType, 8)
    WHEN rll.SubType = 'Gap' AND rll.MaxTermDateSubType IS NOT NULL
    THEN DATEADD(day, -1, rll.NextEffDate)
    WHEN rll.termdate > DATEADD(day, -1, rll.NextEffDate)
    THEN DATEADD(day, -1, rll.NextEffDate)
    ELSE rll.termdate
    END AS termdate
    FROM RangeLeadLags AS rll
    WHERE (SubType <> 'Gap'
    OR MaxTermDateSubType IS NOT NULL)
    )
    SELECT id, SubType, effdate, termdate
    FROM PreOut
    WHERE effdate <= TermDate
    ORDER BY id, effdate;

    Thank you for your time on getting me pointed in the right direction.

     

  • That's some very clever code on the part of both.  The only trouble with that code is that it contains date literals which are based on the content of the date range table and must be known in advance.

    It's certainly fast enough... if you were to do a single scan for a max TermDate and dump it into a variable, I don't believe that would slow things down much at all and then there would be nothing manual about it at all.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In my company's case we use that date of 2078-12-31 as the maximum date because there are fields in some tables that are defined as SMALLDATETIME and since the maximum date of that data type is 2079-06-06 so they went with the largest full year.

    So if anybody else wants to use this code you definitely want to be aware of this in the code and change accordingly as you mentioned.

  • Glad to be of help.

    The problem with the windowed function approach is it being more difficult to anticipate data combinations which could break the rules. A brut force approach of unwrapping the date ranges, excluding the gaps with NOT EXISTS and then wrapping up the resulting ranges is easier to get right and test. I suppose the approach to take depends on what you want to achieve. An overnight batch processes is unlikely to be time or resource critical, so is more suited to the brute force approach, whereas a stored procedure called by an application probably needs to be as efficient as possible so windowed functions are more appropriate.

    • This reply was modified 3 years, 10 months ago by  Ken McKelvey.
  • kherald69 wrote:

    In my company's case we use that date of 2078-12-31 as the maximum date because there are fields in some tables that are defined as SMALLDATETIME and since the maximum date of that data type is 2079-06-06 so they went with the largest full year.

    So if anybody else wants to use this code you definitely want to be aware of this in the code and change accordingly as you mentioned.

    Ah... great feedback.  I certain was wondering.  Thanks.

    If the reason they did this was to conserve space, consider using the DATE datatype.  It only occupies 3 bytes.  If you need "time" in your entries, then I'd just use the DATETIME datatype because the DATE, TIME, and DATETIME2() datatypes are relatively crippled (calculation wise) compared to DATETIME (or SMALLDATETIME).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply