October 20, 2010 at 6:42 am
Hello there,
I've got the following problem and I'm struggling to come up with a solution that works in all circumstances.
I have 2 tables ActivePeriod and ActiveCategory.
For each date range entered into ActivePeriod, there must be a entry (or entries) in ActiveCategory to cover the date range specified in ActivePeriod. i.e. there can be no gaps.
There can also be multiple entires in ActivePeriod for each client, but these entries will not have date overlaps.
I'm trying to identify any gaps in ActiveCategory for each entry in ActivePeriod
I've tried a few different approaches, but I can't seem to find a working solution.
Here is some SQL to create the two tables and a couple of sample records
CREATE TABLE ActivePeriod (PeriodID int identity, PersonID int, StartDate datetime, enddate datetime)
go
INSERT INTO ActivePeriod ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Jan 01 2008', 'Jan 01 2010')
INSERT INTO ActivePeriod ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Feb 01 2008', 'Feb 01 2009')
INSERT INTO ActivePeriod ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Sep 01 2009', null)
go
CREATE TABLE ActiveCategory (CategoryID int identity, PersonID int, StartDate datetime, enddate datetime)
go
INSERT INTO ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Jan 01 2008', 'Jan 01 2009')
INSERT INTO ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Feb 01 2009', 'May 01 2009')
INSERT INTO ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Jul 01 2009', 'Jan 01 2010')
INSERT INTO ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Mar 01 2008', 'Feb 01 2009')
INSERT INTO ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Dec 01 2009', null)
go
--drop table ActivePeriod
--drop table ActiveCategory
So basically, for PersonID 1 it looks like this
I'm interested in returning the date ranges contained in the it the two GAP periods shown above
So something along the lines of
Does anyone have any ideas?
Thanks for taking the time to look at this.
Paul
October 20, 2010 at 9:30 am
Wow, this one was nasty.
First of all, a couple clarifications as housekeeping.
There are 29 days in Feb in 2008 (affects your expected output)
Based on your expected output, I think the last row of sample data in ActiveCategory is incorrect (should start Dec 1, 2009, right?)
This brings up an important question ... can 2 categories for one person overlap (ie: could it have been Dec 2008?). If so you need to add one more line to calculate gaps correctly.
Anyway, here is my solution:
CREATE TABLE #ActivePeriod (PeriodID int identity, PersonID int, StartDate datetime, enddate datetime)
go
INSERT INTO #ActivePeriod ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Jan 01 2008', 'Jan 01 2010')
INSERT INTO #ActivePeriod ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Feb 01 2008', 'Feb 01 2009')
INSERT INTO #ActivePeriod ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Sep 01 2009', null)
go
CREATE TABLE #ActiveCategory (CategoryID int identity, PersonID int, StartDate datetime, enddate datetime)
go
INSERT INTO #ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Jan 01 2008', 'Jan 01 2009')
INSERT INTO #ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Feb 01 2009', 'May 01 2009')
INSERT INTO #ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (1, 'Jul 01 2009', 'Jan 01 2010')
INSERT INTO #ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Mar 01 2008', 'Feb 01 2009')
INSERT INTO #ActiveCategory ([PersonID], [StartDate], [EndDate]) VALUES (2, 'Dec 01 2009', null)
go
;
WITH Category (CategoryID,PersonID,StartDate,EndDate)
AS
(
SELECT ROW_NUMBER () OVER (PARTITION BY PersonID ORDER BY StartDate),PersonID,COALESCE(StartDate,'Jan 1, 1900'),COALESCE(EndDate, 'Jan 1, 2100')
FROM #ActiveCategory
),
Cat2 (PersonID,StartDate,EndDate)
AS
(SELECT COALESCE(c1.PersonID,c2.PersonID), DATEADD(day,1,COALESCE(c1.EndDate,'Dec 31, 1899')),DATEADD(day,-1,COALESCE(c2.StartDate,'Jan 2, 2100'))
FROM Category c1
FULL OUTER JOIN Category c2 ON c1.PersonId=C2.PersonId and c1.CategoryId = c2.CategoryId-1
WHERE COALESCE(c1.EndDate,'Jan 1, 2010') < 'Jan 1, 2100')
SELECT ap.PersonID,ap.StartDate,ap.EndDate,CASE WHEN ac.StartDate > ap.StartDate THEN ac.StartDate ELSE ap.StartDate END AS GapFrom, CASE WHEN ac.EndDate < COALESCE(ap.EndDate,'Jan 1, 2100') THEN ac.EndDate ELSE ap.EndDate END AS GapTo
FROM #ActivePeriod ap
INNER JOIN Cat2 ac ON ap.PersonID = ac.PersonID
WHERE ac.StartDate BETWEEN ap.StartDate AND COALESCE(ap.EndDate,'Jan 1, 2100')
OR ac.EndDate BETWEEN ap.StartDate AND COALESCE(ap.EndDate,'Jan 1, 2100')
Select * from #activecategory
drop table #activecategory
drop table #activeperiod
I'm sure someone else can find you a neater, more efficient way of doing this, as I'm using many performance no-nos like ORs and functions on columns in a where clause, as well as some boundary dates to replace nulls (please dont use this stored procedure for 100 years)
In case you need clarification, here is what this is doing:
Self join category to itself with a full join, essentially creating a CTE of category "GAPS"
Join that CTE to activeperiod to see which of those gaps intersect with which periods.
It gets messy partly because it needs to manage NULLS in both originating data, and in the full outer join (which creates the start and end gaps).
I have an idea on how to make this slightly better, so I may post an update soon. But this returns your expected results.
October 20, 2010 at 9:36 am
Thanks for that mate, I'm just working through your solution but you are correct about the sample data being incorrect. I've updated the original post to reflect this.
I'm 60% through my own solution, but I think yours looks a little more promising.
Just to clarify - the categories can not overlap.
Thanks again - I'll post an update and let you know how I get on.
Paul
October 20, 2010 at 9:48 am
NOTE: Actually I have a small logical error in the above. If the "gap" in category is for the entire activeperiod I won't find it.
Adding the following at the end should correct this:
OR (ac.StartDate <ap.StartDate AND ac.EndDate > ap.EndDate)
October 20, 2010 at 8:01 pm
Edit: Didn't read the entire post by OP.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 22, 2010 at 9:53 am
Hello Nevyn,
I've worked through your solution now and it works in every situation but one (it's far better than what I came up with, I didn't think of using a full outer join).
If there is no entry in ActiveCategory for a person but there is a valid ActivePeriod, the result is missing from the result set.
I'm just trying to work out how to include these cases.
October 22, 2010 at 10:18 am
I think this should do it
;
WITH Category (CategoryID,PersonID,StartDate,EndDate)
AS
(
SELECT ROW_NUMBER () OVER (PARTITION BY PersonID ORDER BY StartDate),PersonID,COALESCE(StartDate,'Jan 1, 1900'),COALESCE(EndDate, 'Jan 1, 2100')
FROM ActiveCategory
),
Cat2 (PersonID,StartDate,EndDate)
AS
(SELECT COALESCE(c1.PersonID,c2.PersonID), DATEADD(day,1,COALESCE(c1.EndDate,'Dec 31, 1899')),DATEADD(day,-1,COALESCE(c2.StartDate,'Jan 2, 2100'))
FROM Category c1
FULL OUTER JOIN Category c2 ON c1.PersonId=C2.PersonId and c1.CategoryId = c2.CategoryId-1
WHERE COALESCE(c1.EndDate,'Jan 1, 2010') < 'Jan 1, 2100')
SELECT ap.PersonID,ap.StartDate,ap.EndDate,CASE WHEN ac.StartDate > ap.StartDate THEN ac.StartDate ELSE ap.StartDate END AS GapFrom, CASE WHEN ac.EndDate < COALESCE(ap.EndDate,'Jan 1, 2100') THEN ac.EndDate ELSE ap.EndDate END AS GapTo
FROM ActivePeriod ap
LEFT OUTER JOIN Cat2 ac ON ap.PersonID = ac.PersonID
WHERE COALESCE(ac.StartDate, 'Jan 1, 1900') BETWEEN ap.StartDate AND COALESCE(ap.EndDate,'Jan 1, 2100')
OR COALESCE(ac.EndDate,'Jan 1, 2100') BETWEEN ap.StartDate AND COALESCE(ap.EndDate,'Jan 1, 2100')
OR (COALESCE(ac.StartDate, 'Jan 1, 1900') <ap.StartDate AND COALESCE(ac.EndDate,'Jan 1, 2100') > COALESCE(ap.EndDate,'Jan 1, 2100'))
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply