February 10, 2016 at 10:11 am
Hi All,
I have a table with circa 720,000 unique member numbers, I have a second table with multiple member numbers which has spans for each member. Member spans are defined by a date range (a start member date and an end member date). Each member can have any number of spans. These spans cannot overlap i.e. the same date cannot be in the 2 or more spans. There is no time involved in these dates, its on full day. Teh combination of members and spans results in around 3.2 million records.
My question is I would like to identify records that has overlapping data.
Below are two examples of a scenario - the first example shows that the data is OK, no overlapping, the second example shows that there is an overlap - 2 end dates of "2001-09-30".
Also, the data is not sequential, a member may go for days or months without membership, so there are gaps.
Example 1
MemberNoYMDStart YMDEnd
123 1997-06-01 1997-09-30
123 1997-10-01 1997-10-31
123 1997-11-01 1997-12-31
123 1998-01-01 1998-12-31
123 1999-01-01 1999-06-30
Example 2
MemberNoYMDStart YMDEnd
417 2001-06-01 2001-09-30
417 2001-10-01 2001-09-30
417 2001-11-01 2001-12-31
I have had a search online and am rather confused at to where to begin, in theory it didn't seem too complicated, but when starting to try for a solution, it seems it's pretty complicated.
If someone could start me in the correct direction, it will be appreciated.
Thanks
Michael
February 10, 2016 at 10:19 am
join the date-range table to itself and have a WHERE clause that checks for a.start >= b.start and a.end between b.start and b.end. Or something similar to that. Play with the permutations on a sample data set (which you should give us if you want us to write a query for you). 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 10, 2016 at 10:20 am
If you have two date ranges, Start1 -> End1 and Start2 -> End2, see if they overlap with this:
Start1 <= End2 AND Start2 <= End1
February 10, 2016 at 10:24 am
First: To help us help you, please always post a full repro script: CREATE TABLE (including constraints and indexes), INSERT statements with a limited but illustrative set of sample data, and expected results. Posting sample in tabular form as you did is far less helpful, because most of the people who help here do not have the time to manually create the table and type in all that data.
Second: Add a CHECK constraint to your table to ensure that EndDate can't be before StartDate. That alone is enough to kick out the incorrect row in your sample data. (It is not enough to prevent overlaps, though - more on that below). Whether or not they can be equal depends on your business requirement, and on whether the StartDate and/or the EndDate themselves are included in the interval or not. (The technical terms for this are closed interval, open interval, and half-open interval; when working with dates without time the closed interval where both start and end date are included is the most common, though half-open is often used as well. when working with time as well it is often more convenient to use a half-open interval where the StartTime is included in the interval but the EndTime is not).
Third: Without a full repro script I cannot give as much help as I would want to, but I can tell you the most important ingredient of your solution, which is how to test if two intervals overlap or not. The simplest test for this is "a.StartTime < b.EndTime AND b.StartTime < a.EndTime". Depending on the type of interval (open, closed, half-open), you may need to change the < to <=. Instead of wrecking my brains to think about this, I always simple add all the possible edge cases to my testset and then tweak until it works as expected.
February 10, 2016 at 10:25 am
WOW. I spent five minutes typing that reply, and then two others already replied during that time. Talk about overlapping intervals... 😉
February 10, 2016 at 10:49 am
Hi All,
Thank yo so much for your input.
Below is script for sample data. For the output I need to identify the member number of overlapping records so that the relevant department can correct them.
The end and start dates cannot be equal, that is the business requirement.
I don't know where to start, I don't know if I will have to loop through records to be able to identify the blocks by member?
Thank you all for your help - if there is anything else that I need to supply, please let me know.
CREATE TABLE [dbo].[zzz_Overlapping](
[MemNo] [varchar](500) NULL,
[YMDStart] [datetime] NULL,
[YMDEnd] [datetime] NULL
) ON [PRIMARY]
GO
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 123,'1997-06-01','1997-09-30')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 123,'1997-10-01','1997-10-31')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 123,'1997-11-01','1997-12-31')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 123,'1998-01-01','1998-12-31')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 123,'1999-01-01','1999-06-30')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 417,'2001-06-01','2001-09-30')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 417,'2001-10-01','2001-09-30')
insert into zzz_Overlapping (MemNo,YMDStart,YMDEnd)
values ( 417,'2001-11-01','2001-12-31')
Michael
February 10, 2016 at 11:32 am
This will give you what you want.
SELECT A.MemNo
FROM zzz_Overlapping A
JOIN zzz_Overlapping B
ON B.MemNo = A.MemNo
AND ((A.YMDStart BETWEEN B.YMDStart AND B.YMDEnd)
OR (A.YMDEnd BETWEEN B.YMDStart AND B.YMDEnd))
WHERE B.YMDStart != A.YMDStart OR B.YMDEnd != A.YMDEnd;
The query could be simplified, if it had a primary key. I'd suggest PRIMARY KEY ( MemNo, YMDStart, YMDEnd ) and also a check constraint to insure YMDStart < YMDEnd.
CREATE TABLE [dbo].[zzz_Overlapping]
(
[MemNo] [varchar](500) NOT NULL,
[YMDStart] [datetime] NOT NULL,
[YMDEnd] [datetime] NOT NULL,
PRIMARY KEY ( MemNo, YMDStart, YMDEnd ),
CHECK ( YMDStart < YMDEnd )
);
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
February 10, 2016 at 12:06 pm
Expanding a bit on Eric's reply:
1. For the primary key, I would just use MemNo, YMDStart. Then I would also add a UNIQUE constraint on MemNo, YMDEnd.
2. Slightly simpler (and probably more efficient) version of his code:
SELECT A.MemNo, A.YMDStart, A.YMDEnd, B.YMDStart, B.YMDEnd
FROM zzz_Overlapping A
INNER JOIN zzz_Overlapping B
ON B.MemNo = A.MemNo
AND B.YMDStart < A.YMDEnd
AND A.YMDStart < B.YMDEnd
AND A.YMDStart < B.YMDStart;
(Note that this code, as well as Eric's code, does rely on EndDate in any row being larger than StartDate. Rows with EndDate before StartDate do not really define a proper interval anyway, and you should not expect to get relevant results from them. And yes, you do have such a "weird interval" row in your sample data).
February 10, 2016 at 1:06 pm
Erik and Hugo,
Thank you so much for your help, the queries are brilliant and have helped me identify erroneous records.
Unfortunately I only have read only access to the tables in question and cannot make any changes to them. Also unfortunately, I have no input (excuse the pun) on the front end where the data gets captured. The sad thing is, looking at the results, there are very new (recent) records, that tells me the validation is poor. (i.e. I was hoping it was maybe only old data).
I report on the data and, as can be imagined, this causes duplicate data in the reports.
Thank you so much, you have helped me immensely.
Michael
February 10, 2016 at 1:21 pm
Hugo,
My script returns the duplicate listed below, but your's appears to return none.
MemNoYMDStart YMDEndYMDStart YMDEnd
417 2001-10-01 2001-09-30 2001-06-01 2001-09-30
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
February 10, 2016 at 1:31 pm
Eric M Russell (2/10/2016)
Hugo,My script returns the duplicate listed below, but your's appears to return none.
MemNoYMDStart YMDEndYMDStart YMDEnd
417 2001-10-01 2001-09-30 2001-06-01 2001-09-30
True. And one might debate whether or not an interval that ends before it even starts is able to overlap with other intervals or not.
Either way, I already mentioned in my post that these negative-length intervals will probably cause weird results. They should be filtered out before even starting to look for overlapping intervals.
February 10, 2016 at 3:29 pm
Hugo Kornelis (2/10/2016)
True. And one might debate whether or not an interval that ends before it even starts is able to overlap with other intervals or not.
Don't you just love some of the statements that come out of working with dates and times? And the cool part is that the statement is completely valid. One of the interesting aspects of working in our industry that I particularly enjoy.
February 10, 2016 at 7:50 pm
Hugo Kornelis (2/10/2016)
True. And one might debate whether or not an interval that ends before it even starts is able to overlap with other intervals or not.
I think the answer depends on whether you (or the business analyst setting the requirement) thinks a point-in-time has mass.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 11, 2016 at 7:06 am
Orlando Colamatteo (2/10/2016)
Hugo Kornelis (2/10/2016)
True. And one might debate whether or not an interval that ends before it even starts is able to overlap with other intervals or not.I think the answer depends on whether you (or the business analyst setting the requirement) thinks a point-in-time has mass.
That's a discussion best left to another forum.
https://www.physicsforums.com/threads/does-spacetime-have-mass.618519/
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
February 11, 2016 at 8:30 pm
Posting back to a similar problem/topic:
http://www.sqlservercentral.com/Forums/Topic1760478-3412-1.aspx
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply