The internet is full of examples of how to find gaps in date ranges but I need to create gaps in a table of date ranges.
I have a table of date ranges
IF OBJECT_ID('tempdb.dbo.#DateRanges') IS NOT NULL DROP TABLE #DateRanges
create table #DateRanges ( id char(15), SubType char(15), effdate datetime, termdate datetime )
insert into #DateRanges ( id, SubType, effdate, termdate )
values ('ABC00001', 'BASIC' , '2020-06-01', '2020-06-08')
, ('ABC00001', 'ROAM' , '2020-06-09', '2020-06-11')
, ('ABC00001', 'BASIC' , '2020-06-12', '2078-12-31')
, ('ABC00002', 'ENHANCED' , '2019-06-01', '2019-09-30')
, ('ABC00002', 'SPECIAL' , '2019-10-01', '2019-11-30')
, ('ABC00002', 'ENHANCED' , '2019-12-01', '2078-12-31')
And a table of gap ranges
IF OBJECT_ID('tempdb.dbo.#Gaps') IS NOT NULL DROP TABLE #Gaps
CREATE TABLE #Gaps ( id char(15), effdate datetime, termdate datetime)
insert into #Gaps ( id, effdate, termdate)
values('ABC00001', '2020-06-10', '2020-06-11')
,('ABC00001', '2020-12-16', '2020-12-30')
,('ABC00002', '2020-01-01', '2020-01-02')
I need to exclude the ranges from the #Gaps table from the #DateRanges table. The expected result should be:
ABC00001 BASIC 2020-06-01 2020-06-08
ABC00001 ROAM 2020-06-09 2020-06-09
ABC00001 BASIC 2020-06-12 2020-12-15
ABC00001 BASIC 2020-12-31 2078-12-31
ABC00002 ENHANCED 2019-06-01 2019-09-30
ABC00002 SPECIAL 2019-10-01 2019-11-30
ABC00002 ENHANCED 2019-12-01 2019-12-31
ABC00002 ENHANCED 2020-01-03 2078-12-31
One "brute force" way to do this would be to expand all the date ranges and use the set operator EXCEPT to remove the (also expanded) gaps. Then the results could be grouped by splitting the calculated dates into contiguous ranges. The query uses a partial CROSS JOIN between the id in #Gap and unique combinations of (id, SubType) from #DateRanges. It takes about 4 to 5 seconds to execute on my test instance so it's definitely not efficiently written. Something like this
drop table if exists #DateRanges;
go
create table #DateRanges ( id char(15), SubType char(15), effdate datetime, termdate datetime )
insert into #DateRanges ( id, SubType, effdate, termdate )
values ('ABC00001', 'BASIC' , '2020-06-01', '2020-06-08')
, ('ABC00001', 'ROAM' , '2020-06-09', '2020-06-11')
, ('ABC00001', 'BASIC' , '2020-06-12', '2078-12-31')
, ('ABC00002', 'ENHANCED' , '2019-06-01', '2019-09-30')
, ('ABC00002', 'SPECIAL' , '2019-10-01', '2019-11-30')
, ('ABC00002', 'ENHANCED' , '2019-12-01', '2078-12-31');
drop table if exists #Gaps;
go
create table #Gaps ( id char(15), effdate datetime, termdate datetime)
insert into #Gaps ( id, effdate, termdate)
values('ABC00001', '2020-06-10', '2020-06-11')
,('ABC00001', '2020-12-16', '2020-12-30')
,('ABC00002', '2020-01-01', '2020-01-02');
with
unq_subtypes_cte(id, SubType) as (
select distinct id, SubType from #DateRanges),
minus_cte as (
/* expand date ranges */
select dr.id, dr.SubType, dt.calc_dt
from #DateRanges dr
cross apply dbo.fnTally(0, datediff(day, dr.effdate, dr.termdate)) fn
cross apply (values (dateadd(day, fn.n, dr.effdate))) dt(calc_dt)
except
/* expand gaps */
select usc.id, usc.SubType, dt.calc_dt
from #Gaps g
join unq_subtypes_cte usc on g.id=usc.id /* partial cross join */
cross apply dbo.fnTally(0, datediff(day, g.effdate, g.termdate)) fn
cross apply (values (dateadd(day, fn.n, g.effdate))) dt(calc_dt)),
split_cte as (
select *, case when datediff(day, calc_dt, lag(calc_dt, 1, 1)
over (partition by id, SubType
order by calc_dt))<>-1
then 1 else 0 end split
from minus_cte),
grp_cte as (
select *, sum(split) over (partition by id, SubType order by calc_dt) grp
from split_cte)
select id, SubType, cast(min(calc_dt) as date) effdate, cast(max(calc_dt) as date) termdate
from grp_cte
group by id, SubType, grp
order by id, effdate;
idSubTypeeffdatetermdate
ABC00001 BASIC 2020-06-012020-06-08
ABC00001 ROAM 2020-06-092020-06-09
ABC00001 BASIC 2020-06-122020-12-15
ABC00001 BASIC 2020-12-312078-12-31
ABC00002 ENHANCED 2019-06-012019-09-30
ABC00002 SPECIAL 2019-10-012019-11-30
ABC00002 ENHANCED 2019-12-012019-12-31
ABC00002 ENHANCED 2020-01-032078-12-31
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
February 23, 2021 at 2:20 pm
I completely forgot about trying the EXCEPT route. The method I was trying is similar to what you have by expanding the date ranges but my method of remerging the date ranges was taking forever. So your inefficient brute force method is still much faster that what I was coming up with.
February 25, 2021 at 1:22 pm
If you can get the logic right windowed functions should be more efficient.
The following works with your test data:
WITH AllRanges
AS
(
SELECT id, SubType, effdate, termdate
FROM #DateRanges
UNION ALL
SELECT id, 'Gap', effdate, termdate
FROM #Gaps
)
,RangeLeadLags
AS
(
SELECT id, SubType, effdate, termdate
,LEAD(effdate, 1, '20781231') OVER (PARTITION BY id ORDER BY effdate) AS NextEffDate
,LAG(termdate) OVER (PARTITION BY id ORDER BY effdate) AS PrevTermDate
,LAG(SubType) OVER (PARTITION BY id ORDER BY effdate) AS PrevSubType
FROM AllRanges
)
SELECT id
,CASE
WHEN SubType = 'Gap'
AND PrevTermDate = '20781231'
THEN PrevSubType
ELSE SubType
END AS SubType
,CASE
WHEN SubType = 'Gap'
AND PrevTermDate = '20781231'
THEN DATEADD(day, 1, termdate)
ELSE effdate
END AS effdate
,CASE
WHEN SubType = 'Gap'
AND PrevTermDate = '20781231'
THEN PrevTermDate
WHEN termdate > DATEADD(day, -1, NextEffDate)
THEN DATEADD(day, -1, NextEffDate)
ELSE termdate
END AS termdate
FROM RangeLeadLags
WHERE SubType <> 'Gap'
OR PrevTermDate = '20781231'
ORDER BY id, effdate;
February 25, 2021 at 4:51 pm
The idea with the above is eyeballing the following and seeing what you need to get.
SELECT id, SubType, effdate, termdate
FROM #DateRanges
UNION ALL
SELECT id, 'Gap', effdate, termdate
FROM #Gaps
ORDER BY id, effdate
So if the real data has, say, multiple gaps at the end of an id so the following can be added to the test data:
INSERT INTO #Gaps ( id, effdate, termdate)
VALUES ('ABC00001', '2021-01-10', '2021-01-20');
Then the query could be adjusted to something like:
WITH AllRanges
AS
(
SELECT id, SubType, effdate, termdate
,CONVERT(varchar(23), termdate, 112) + SubType AS TermDateSubType
FROM #DateRanges
UNION ALL
SELECT id, 'Gap', effdate, termdate
,CONVERT(varchar(23), termdate, 112) + 'Gap'
FROM #Gaps
)
,RangeLeadLags
AS
(
SELECT id, SubType, effdate, termdate
,LEAD(effdate, 1, '20781231') OVER (PARTITION BY id ORDER BY effdate) AS NextEffDate
,MAX(TermDateSubType) OVER (PARTITION BY id ORDER BY effdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MaxTermDateSubType
FROM AllRanges
)
SELECT id
,CASE
WHEN SubType = 'Gap'
AND MaxTermDateSubType LIKE '20781231%'
THEN SUBSTRING(MaxTermDateSubType, 9, 15)
ELSE SubType
END AS SubType
,CASE
WHEN SubType = 'Gap'
AND MaxTermDateSubType LIKE '20781231%'
THEN DATEADD(day, 1, termdate)
ELSE effdate
END AS effdate
,CASE
WHEN SubType = 'Gap'
AND MaxTermDateSubType LIKE '20781231%'
AND NextEffDate = '20781231'
THEN LEFT(MaxTermDateSubType, 8)
WHEN SubType = 'Gap'
AND MaxTermDateSubType LIKE '20781231%'
THEN DATEADD(day, -1, NextEffDate)
WHEN termdate > DATEADD(day, -1, NextEffDate)
THEN DATEADD(day, -1, NextEffDate)
ELSE termdate
END AS termdate
FROM RangeLeadLags
WHERE SubType <> 'Gap'
OR MaxTermDateSubType LIKE '20781231%'
ORDER BY id, effdate;
etc
February 26, 2021 at 10:01 pm
Ken, that is some pretty neat code. I had to tweak it just a bit to support a scenario that I forgot to include in my original post.
The extra scenario is for ID of ABC00003. Where the max term date is not 2078-12-31 and the gap is from 2021-01-01 to 2021-01-28. I have included the code below that works for all of these scenarios.
Also, I have tested this with over 1000 date ranges which in my environment is a normal amount of dates and it runs in milliseconds.
IF OBJECT_ID('tempdb.dbo.#DateRanges') IS NOT NULL DROP TABLE #DateRanges
create table #DateRanges ( id char(15), SubType char(15), effdate datetime, termdate datetime )
insert into #DateRanges ( id, SubType, effdate, termdate )
values
('ABC00001', 'BASIC' , '2020-06-01', '2020-06-08')
, ('ABC00001', 'ROAM' , '2020-06-09', '2020-06-11')
, ('ABC00001', 'BASIC' , '2020-06-12', '2078-12-31')
, ('ABC00002', 'ENHANCED' , '2019-06-01', '2019-09-30')
, ('ABC00002', 'SPECIAL' , '2019-10-01', '2019-11-30')
, ('ABC00002', 'ENHANCED' , '2019-12-01', '2078-12-31')
, ('ABC00003', 'BASIC' , '2018-01-01', '2021-02-28')
IF OBJECT_ID('tempdb.dbo.#Gaps') IS NOT NULL DROP TABLE #Gaps
CREATE TABLE #Gaps ( id char(15), effdate datetime, termdate datetime)
insert into #Gaps ( id, effdate, termdate)
values
('ABC00001', '2020-06-10', '2020-06-11')
,('ABC00001', '2020-12-16', '2020-12-30')
,('ABC00001', '2021-01-10', '2021-01-20')
,('ABC00002', '2020-01-01', '2020-01-02')
,('ABC00003', '2021-01-01', '2021-01-28')
;WITH AllRanges
AS
(
SELECT SubType, id, effdate, termdate
,CONVERT(varchar(23), termdate, 112) + SubType AS TermDateSubType
FROM #DateRanges
UNION ALL
SELECT 'Gap', id, effdate, termdate
,CONVERT(varchar(23), termdate, 112) + 'Gap'
FROM #Gaps
), PreRangeLeadLags AS
(
SELECT SubType, id, effdate, termdate
,LEAD(effdate, 1, '2078-12-31') OVER (PARTITION BY id ORDER BY effdate) AS NextEffDate
,MAX(TermDateSubType) OVER (PARTITION BY id ORDER BY effdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MaxTermDateSubType
,MAX(TermDate) OVER (PARTITION BY id ORDER BY effdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MaxTermDate
FROM AllRanges
), RangeLeadLags AS
(
SELECT SubType, id, effdate, termdate
, CASE WHEN NextEffDate = '2078-12-31' AND MaxTermDate IS NOT NULL THEN MaxTermDate ELSE NextEffDate END AS NextEffDate
, MaxTermDate
, MaxTermdateSubType
FROM PreRangeLeadLags
), PreOut AS
(
SELECT rll.id
,CASE WHEN rll.SubType = 'Gap' AND rll.MaxTermDateSubType IS NOT NULL
THEN SUBSTRING(rll.MaxTermDateSubType, 9, 15)
ELSE rll.SubType
END AS SubType
,CASE WHEN rll.SubType = 'Gap' AND rll.MaxTermDateSubType IS NOT NULL
THEN DATEADD(day, 1, rll.termdate)
ELSE rll.effdate
END AS effdate
,CASE WHEN rll.SubType = 'Gap' AND MaxTermDateSubType IS NOT NULL AND rll.NextEffDate = rll.MaxTermDate
THEN LEFT(rll.MaxTermDateSubType, 8)
WHEN rll.SubType = 'Gap' AND rll.MaxTermDateSubType IS NOT NULL
THEN DATEADD(day, -1, rll.NextEffDate)
WHEN rll.termdate > DATEADD(day, -1, rll.NextEffDate)
THEN DATEADD(day, -1, rll.NextEffDate)
ELSE rll.termdate
END AS termdate
FROM RangeLeadLags AS rll
WHERE (SubType <> 'Gap'
OR MaxTermDateSubType IS NOT NULL)
)
SELECT id, SubType, effdate, termdate
FROM PreOut
WHERE effdate <= TermDate
ORDER BY id, effdate;
Thank you for your time on getting me pointed in the right direction.
February 27, 2021 at 2:18 am
That's some very clever code on the part of both. The only trouble with that code is that it contains date literals which are based on the content of the date range table and must be known in advance.
It's certainly fast enough... if you were to do a single scan for a max TermDate and dump it into a variable, I don't believe that would slow things down much at all and then there would be nothing manual about it at all.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2021 at 5:37 am
In my company's case we use that date of 2078-12-31 as the maximum date because there are fields in some tables that are defined as SMALLDATETIME and since the maximum date of that data type is 2079-06-06 so they went with the largest full year.
So if anybody else wants to use this code you definitely want to be aware of this in the code and change accordingly as you mentioned.
February 27, 2021 at 11:59 am
Glad to be of help.
The problem with the windowed function approach is it being more difficult to anticipate data combinations which could break the rules. A brut force approach of unwrapping the date ranges, excluding the gaps with NOT EXISTS and then wrapping up the resulting ranges is easier to get right and test. I suppose the approach to take depends on what you want to achieve. An overnight batch processes is unlikely to be time or resource critical, so is more suited to the brute force approach, whereas a stored procedure called by an application probably needs to be as efficient as possible so windowed functions are more appropriate.
February 27, 2021 at 8:10 pm
In my company's case we use that date of 2078-12-31 as the maximum date because there are fields in some tables that are defined as SMALLDATETIME and since the maximum date of that data type is 2079-06-06 so they went with the largest full year.
So if anybody else wants to use this code you definitely want to be aware of this in the code and change accordingly as you mentioned.
Ah... great feedback. I certain was wondering. Thanks.
If the reason they did this was to conserve space, consider using the DATE datatype. It only occupies 3 bytes. If you need "time" in your entries, then I'd just use the DATETIME datatype because the DATE, TIME, and DATETIME2() datatypes are relatively crippled (calculation wise) compared to DATETIME (or SMALLDATETIME).
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply