April 8, 2010 at 12:41 pm
Hi all,
I need some help in finding the overlaps in the date range using sql. below is the sample data.
ID StartDate EndDate
id1 20060731 20071022
id1 20070730 20080530
id1 20071102 20071221
id1 20080107 20090105
id1 20090106 20090529
from the above data the row 2 is a overlap for the row 1 i need to delete row 2 from the results set.
once i delete the row 2 then row 3,4,5 are valid. is this possible in a set base operation.
April 8, 2010 at 1:30 pm
I am hoping you have some unique key that you use to clearly differentiate some of the records. To make things a little easier to read I added a RowId Identity column to your example. You can do the same thing without the key, your where clauses just get a lot uglier.
Declare @Table Table
(
RowId Int Identity(1, 1) Not Null,
Id NChar(3) Not Null,
StartDate NVarChar(8) Not Null,
EndDate NVarChar(8) Not Null
);
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20060731', '20071022' Union All
Select 'id1', '20070730', '20080530' Union All
Select 'id1', '20071102', '20071221' Union All
Select 'id1', '20080107', '20090105' Union All
Select 'id1', '20090106', '20090529';
Select *
From @Table;
-- Find all possible cases where the dates could overlap.
With Overlaps (OverlapRowId, BaseRowId)
As
(
Select Overlap.RowId, Base.RowId
From @Table As Base
Inner Join @Table As Overlap On Overlap.Id = Base.Id
Where (Overlap.StartDate > Base.StartDate)
And (Overlap.StartDate < Base.EndDate)
)
-- Remove records that were found to cause overlap issues.
Delete T
From @Table As T
Inner Join
(
Select O.OverlapRowId
From Overlaps As O
-- Remove "false positives"
Left Join Overlaps As Fp On Fp.OverlapRowId = O.BaseRowId
Where (Fp.OverlapRowId Is Null)
) As SubQuery On SubQuery.OverlapRowId = T.RowId;
-- Select the valid options.
Select RowId, Id, StartDate, EndDate
From @Table;
Go
April 8, 2010 at 3:46 pm
Thank you very much. it work fine i have a problem when the StartDate=StartDate or EndDate=EndDate
like the data sample below
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20081016', '20090531' Union All
Select 'id1', '20090601', '20090605' Union All
Select 'id1', '20090601', '20090606' Union All
Select 'id1', '20090607', '20090729' Union All
Select 'id1', '20090730', '20090805';
can you please help
Thank you.
April 8, 2010 at 4:23 pm
i have a problem when the StartDate=StartDate or EndDate=EndDate
like the data sample below
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20081016', '20090531' Union All
Select 'id1', '20090601', '20090605' Union All
Select 'id1', '20090601', '20090606' Union All
Select 'id1', '20090607', '20090729' Union All
Select 'id1', '20090730', '20090805';
In the case where you have equal start dates, which of these two would you consider to be the overlapping time frame? The one that ends earlier, the one that ends later, or the record that was inserted last?
April 8, 2010 at 4:34 pm
Thank you for the quick response.
if the StartDate are same then we need to pick the earliest EndDate
and If the EndDates are same then we need to pick the Earluest StartDate
April 8, 2010 at 5:00 pm
if the StartDate are same then we need to pick the earliest EndDate
Declare @Table Table
(
RowId Int Identity(1, 1) Not Null,
Id NChar(3) Not Null,
StartDate NVarChar(8) Not Null,
EndDate NVarChar(8) Not Null
);
/*
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20060731', '20071022' Union All
Select 'id1', '20070730', '20080530' Union All
Select 'id1', '20071102', '20071221' Union All
Select 'id1', '20080107', '20090105' Union All
Select 'id1', '20090106', '20090529';
*/
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20081016', '20090531' Union All
Select 'id1', '20090601', '20090605' Union All
Select 'id1', '20090601', '20090606' Union All
Select 'id1', '20090607', '20090729' Union All
Select 'id1', '20090730', '20090805';
Select *
From @Table;
-- Find all possible cases where the dates could overlap.
With Overlaps (OverlapRowId, BaseRowId, OStart, OEnd, BStart, BEnd)
As
(
Select Overlap.RowId, Base.RowId, Overlap.StartDate, Overlap.EndDate, Base.StartDate, Base.EndDate
From @Table As Base
Inner Join @Table As Overlap On Overlap.Id = Base.Id
Where (((Overlap.StartDate > Base.StartDate) And (Overlap.StartDate < Base.EndDate))
-- If they start at the same time, the one that ends later is the overlap.
Or ((Overlap.StartDate = Base.StartDate) And (Overlap.EndDate > Base.EndDate)))
-- Now that equal start times are possible, need to eliminate self-joins.
And (Base.RowId != Overlap.RowId)
)
-- Remove records that were found to cause overlap issues.
Delete T
From @Table As T
Inner Join
(
Select O.OverlapRowId
From Overlaps As O
-- Remove "false positives"
Left Join Overlaps As Fp On Fp.OverlapRowId = O.BaseRowId
Where (Fp.OverlapRowId Is Null)
) As SubQuery On SubQuery.OverlapRowId = T.RowId;
-- Select the valid options.
Select RowId, Id, StartDate, EndDate
From @Table;
Go
and If the EndDates are same then we need to pick the Earluest StartDate
This case should have been handled by the original where clause.
Do you have to worry about the situation where identical records are possible (i.e. multiple records with the same start and end dates for the same Id)?
April 8, 2010 at 5:13 pm
Thank you so much , really appreciate your help with this.
I have not come across the scenario where there are multiple records with the same start and enddates. that would be really help full if the data changes of the ID.
Thank you!!!
April 8, 2010 at 5:19 pm
Just in case you ever do see complete dupes... Figured I'd throw it in for the sake of completeness.
Declare @Table Table
(
RowId Int Identity(1, 1) Not Null,
Id NChar(3) Not Null,
StartDate NVarChar(8) Not Null,
EndDate NVarChar(8) Not Null
);
/*
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20060731', '20071022' Union All
Select 'id1', '20070730', '20080530' Union All
Select 'id1', '20071102', '20071221' Union All
Select 'id1', '20080107', '20090105' Union All
Select 'id1', '20090106', '20090529';
*/
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20081016', '20090531' Union All
Select 'id1', '20090601', '20090605' Union All
Select 'id1', '20090601', '20090606' Union All
Select 'id1', '20090607', '20090729' Union All
Select 'id1', '20090730', '20090805' Union All
-- Added some duplicates.
Select 'id1', '20100101', '20100131' Union All
Select 'id1', '20100101', '20100131' Union All
Select 'id1', '20100101', '20100131';
Select *
From @Table;
-- Find all possible cases where the dates could overlap.
With Overlaps (OverlapRowId, BaseRowId)
As
(
Select Overlap.RowId, Base.RowId
From @Table As Base
Inner Join @Table As Overlap On Overlap.Id = Base.Id
Where (((Overlap.StartDate > Base.StartDate) And (Overlap.StartDate < Base.EndDate))
-- If they start at the same time, take the one the ends first.
Or ((Overlap.StartDate = Base.StartDate) And (Overlap.EndDate > Base.EndDate)))
-- Now that equal start times are possible, need to eliminate self-joins.
And (Base.RowId != Overlap.RowId)
Union
-- Find duplicate rows (i.e. matching start and end dates for an id).
-- The tiebreaker in this case will be the one with the lowest id.
-- This query will produce duplicate records, but the union will clean those up for us.
Select Case -- Always pick the higher id value as the overlap.
When (Overlap.RowId > Base.RowId) Then Overlap.RowId
Else Base.RowId
End, Case -- Always pick the lower id value as the base.
When (Overlap.RowId < Base.RowId) Then Overlap.RowId
Else Base.RowId
End
From @Table As Base
Inner Join @Table As Overlap On Overlap.Id = Base.Id
Where (Base.RowId != Overlap.RowId) And (Base.StartDate = Overlap.StartDate)
And (Base.EndDate = Overlap.EndDate)
)
-- Remove records that were found to cause overlap issues.
Delete T
From @Table As T
Inner Join
(
Select O.OverlapRowId
From Overlaps As O
-- Remove "false positives"
Left Join Overlaps As Fp On Fp.OverlapRowId = O.BaseRowId
Where (Fp.OverlapRowId Is Null)
) As SubQuery On SubQuery.OverlapRowId = T.RowId;
-- Select the valid options.
Select RowId, Id, StartDate, EndDate
From @Table;
Go
April 8, 2010 at 5:26 pm
Thank you Somuch!!!!!!
April 8, 2010 at 6:51 pm
Insert Into @Table (Id, StartDate, EndDate)
Select 'id1', '20081016', '20090531' Union All
Select 'id1', '20090601', '20090605' Union All
Select 'id1', '20090601', '20090606' Union All
Select 'id1', '20090607', '20090729' Union All
Select 'id1', '20090729', '20090805' Union All
-- Added some duplicates.
Select 'id1', '20100101', '20100131' Union All
Select 'id1', '20100101', '20100131' Union All
Select 'id1', '20100101', '20100131';
1 - Select 'id1', '20090607', '20090729' Union All
2 - Select 'id1', '20090729', '20090805' Union All
The delete does not consider the overlaps with record 1 - Endate ('20090729' ) and 2- StartDate ('20090729').
Thank you..
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply