April 22, 2019 at 5:08 pm
I need to look through 200,000 IDs with multiple dates and identify just the ID’s that have a gap. My first step was going to be to pull all the dates together to see gaps, but all I did was get the Min and Max and ignored the gaps. How can I identify something that isn’t there And then somehow show ID and missing date.
I need to look through 200,000 IDs with multiple dates and identify just the ID’s that have a gap. My first step was going to be to pull all the dates together to see gaps, but all I did was get the Min and Max and ignored the gaps. How can I identify something that isn’t there And then somehow show ID and missing date.
QUERY
CREATE TABLE #1A
(id VARCHAR(100),
effdate date,
termdate date)
INSERT INTO #1a
(ID, EffDate,Termdate)
VALUES
('0123456789','2016-01-01','2016-01-31'),
('0123456789','2016-02-01','2016-04-30'),
('0123456789','2016-05-01','2016-12-31'),
('0123456789','2017-01-01','2017-04-30'),
('0123456789','2017-06-01','2017-10-31'),--Missing 2017-05-01-2017-05-31
('1111111111','2016-01-01','2016-01-31'),
('1111111111','2016-02-01','2017-05-31'),
('1111111111','2017-06-01','2017-12-31'),-- no gaps
('2222222222','2016-01-01','2016-01-31'),
('2222222222','2016-03-01','2017-05-31'),--missing 2016-02-01-2016-02-28
('2222222222','2017-06-01','2017-12-31'),
('3333333333','2016-01-01','2016-01-31'),
('3333333333','2016-02-01','2016-04-30'),
('3333333333','2016-05-01','2016-12-31'),
('3333333333','2017-01-01','2017-04-30'),
('3333333333','2017-06-01','2017-10-31'),-- missing 2017-05-01
('4444444444','2016-01-01','2016-01-31'),
('4444444444','2016-02-01','2017-05-31'),
('4444444444','2017-06-01','2017-12-31'),
('4444444444','2018-01-01','2018-12-31'), -- no gaps
('5555555555','2016-01-01','2016-01-31'),
('5555555555','2016-01-01','2016-02-28'),
('5555555555','2016-01-01','2016-03-31'),--Mutilple overlapping
('5555555555','2016-02-01','2017-05-31'),
('5555555555','2017-06-01','2017-12-31'),
('5555555555','2018-01-01','2018-11-30'),
('5555555555','2018-12-01','2018-12-31'),
('5555555555','2019-01-01','2019-01-31'),--missing 2019-02-01
('5555555555','2019-03-01','2019-04-30'),
('5555555555','2019-05-01','2020-01-31')
Select *
into #1b
from #1a
as combine
where effdate < Termdate
order by Effdate;
with combine as
(
select id, effdate, Termdate, -- find the gap and flag it
case when lag(Termdate)
over (partition by ID
order by Termdate) = dateadd(day,-1, effdate) --put dates without gaps together
then 0
else 1
end as flag
from #1A
)
, groups2 as
(
Select id, effdate, Termdate,
-- cumulative sum over 0/1 to assign the same group number for row without gaps
sum(flag)
over (partition by ID
order by effdate) as grp
from combine
)
select id, min(Effdate) as MinEfdate, max(termdate) as MaxTermdate
Into #Combine2
from groups2
group by ID
order by ID, MinEfdate;
Select * from #Combine2
order by ID, MinEfdate;
I was expecting:
idMinEfdateMaxTermdate
01234567892016-01-012017-04-30
01234567892017-06-012017-10-31
But got:
idMinEfdateMaxTermdate
01234567892016-01-012017-10-31
11111111112016-01-012017-12-31
22222222222016-01-012017-12-31
33333333332016-01-012017-10-31
44444444442016-01-012018-12-31
55555555552016-01-012020-01-31
April 22, 2019 at 5:37 pm
Difficult to read when everything is in a code block, however, if you need every date group, look at a calendar table, or similar. Otherwise you can use a Tally Table to generate the dates on the fly.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
April 22, 2019 at 7:00 pm
I wouldn't use a tally table for this problem. It's a variation on the interval packing problem. I believe that this works for the data given, but it might not work for all combinations with overlapping intervals. You can always use the standard interval packing to ensure that there are no overlapping intervals.
WITH ranges AS
(
SELECT *, LAG(termdate, 1, DATEADD(DAY, -1, effdate)) OVER(PARTITION BY ID ORDER BY effdate, termdate) AS prevtermdate
FROM #1A
)
SELECT id, prevtermdate as gapbegin, effdate AS gapend
FROM ranges
WHERE effdate > DATEADD(DAY, 1, prevtermdate)
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply