May 13, 2008 at 10:04 am
This problem is really foxing me and I'm struggling to think of a strategy to solve it, let alone write the code! I belive it's a fairly tricky problem, so any help you guys can give would be much appreciated.
Two tables are linked by a person ID. They are Address and SportsClub. A person can be in a single club whilst living at mulitple addresses, or at a single address whilst at multiple clubs - or a combination of both. The first table below holds records of the clubs someone is a member of over time. The second table holds their postal adresses as they change over time.
PersonClub[ClubFromDate][ClubToDate]
14CB101/01/200101/02/2001
14MK401/02/200101/03/2001
14LD501/03/200101/08/2001
14AA701/08/200101/02/2002
PersonAddress[AddFromDate][AddToDate]
14Cambs01/01/199501/04/2001
14Bristol01/04/200101/12/2001
14London01/12/200101/12/2007
I need to build a single table containing the records for both, such as:
PersonClubAddressClubFromDateClubToDateAddFromDateAddToDate
14CB1Cambs01/01/200101/02/200101/01/199501/04/2001
14MK4Cambs01/02/200101/03/200101/01/199501/04/2001
14LD5Cambs01/03/200101/08/200101/01/199501/04/2001
14LD5Bristol01/03/200101/08/200101/04/200101/12/2001
14AA7Bristol01/08/200101/02/200201/04/200101/12/2001
14AA7London01/08/200101/02/200201/12/200101/12/2007
I think the solution in plain English is as follows:
> Start with the first Club.
If the ClubFromDate falls between AddFromDate and AddToDate, then display the row joined with that Address.
Next, if the ClubToDate falls between AddFromDate and AddToDate, then display another row joined with that Address.
> Loop through each Club.
I can't see how to do this in SQL and any solution I come up with will be very clunky and probably use a cursor - however these tables have millions of rows and performance is a bit of an issue. I look forward to hearing your creative ideas and thanks again in advance!
May 13, 2008 at 10:21 am
Something like this?
select s.Person,
s.Club,
a.Address,
s.ClubFromDate,s.ClubToDate,
a.AddFromDate,a.AddToDate
from SportsClub s
inner join Address a on a.Person=s.Person
and a.AddToDate > s.ClubFromDate
and a.AddFromDate < s.ClubToDate
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 13, 2008 at 10:34 am
It does not look too difficult. Maybe I am missing something. Check this out.
Create table #Person (person int, Club varchar(5), ClubFromDate datetime, ClubToDate datetime)
GO
Create table #Address (person int, Address varchar(25), AddFromDate datetime, AddToDate datetime)
GO
insert into #Person
Select 14, 'CB1', '01/01/2001', '01/02/2001'
UNION ALL
Select 14, 'MK4', '01/02/2001', '01/03/2001'
UNION ALL
Select 14, 'LD5', '01/03/2001', '01/08/2001'
UNION ALL
Select 14, 'AA7', '01/08/2001', '01/02/2002'
insert into #Address
Select 14, 'Cambs', '01/01/1995', '01/04/2001'
UNION ALL
Select 14, 'Bristol', '01/04/2001', '01/12/2001'
UNION ALL
Select 14, 'London', '01/12/2001', '01/12/2007'
Select a.Person, Club, Address, ClubFromDate, ClubToDate, AddfromDate, AddtoDate from #person as a
JOIN #Address as b
ON a.Person = b.Person and (a.ClubfromDate between b.Addfromdate and b.AddToDate or a.ClubToDate between b.Addfromdate and b.AddToDate)
The result looks like what you wanted.
-Roy
May 13, 2008 at 1:28 pm
Mark's solution is the one I would use. Should do exactly what you need.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 13, 2008 at 2:41 pm
There are 5 overlap conditions between a Club interval and and Address interval which means that they occur during the same interval. This is easiest shown graphically with an interval for the Club and then identify the Address intervals that overlap:
Club Interval|----------|
1 Address Equal|----------|
2 Address within |-----|
3 Address Before |------|
4 Address after |-----|
5 Address complete |------------------|
The SQL for these overlappig conditions is:
SELECTAddress.PersonId
,Club.ClubId
,Address.AddressDescr
,Club.FromDate , Address.FromDate, Club.ToDate , Address.ToDate
--Greater of the From Dates
,CASE WHEN Address.FromDate < Club.FromDate then Club.FromDate
else Address.FromDate
end as FromDate
--Lesser of the To Dates
,CASE WHEN Address.ToDate > Club.ToDate then Club.ToDate
else Address.ToDate
end as ToDAte
FROM Club
joinAddress
on Address.PersonId= Club.PersonId
WHERE(( Address.FromDate = Club.FromDate AND Address.ToDate = Club.ToDate ) -- 1 Address Equal CLub
or( Address.FromDate between Club.FromDate AND Club.ToDate
AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 2 Address within CLub
or( Address.FromDate < Club.FromDate
AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 3 Address starts Before Club and ends within
or( Address.FromDate between Club.FromDate AND Club.ToDate
AND Address.ToDate >= Club.ToDate ) -- 4 Address starts within Club and ends after
or(Address.FromDate Club.ToDate ) -- 5
)
SQL = Scarcely Qualifies as a Language
May 13, 2008 at 2:47 pm
All five of those are covered by the conditions in Mark's query. I've used that method before, and it works.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 13, 2008 at 2:57 pm
SQL Code for condition 5 got messed up and the Greater than and Less Than signs are making the code disappear - just replace GT and LT with the approriate symbols.
or(Address.FromDate LT Club.FromDate AND Address.ToDate GT Club.ToDate ) -- 5
[/]
SQL = Scarcely Qualifies as a Language
May 13, 2008 at 11:58 pm
Carl Federl (5/13/2008)
There are 5 overlap conditions between a Club interval and and Address interval which means that they occur during the same interval. This is easiest shown graphically with an interval for the Club and then identify the Address intervals that overlap:
Club Interval|----------|
1 Address Equal|----------|
2 Address within |-----|
3 Address Before |------|
4 Address after |-----|
5 Address complete |------------------|
The SQL for these overlappig conditions is:
SELECTAddress.PersonId
,Club.ClubId
,Address.AddressDescr
,Club.FromDate , Address.FromDate, Club.ToDate , Address.ToDate
--Greater of the From Dates
,CASE WHEN Address.FromDate < Club.FromDate then Club.FromDate
else Address.FromDate
end as FromDate
--Lesser of the To Dates
,CASE WHEN Address.ToDate > Club.ToDate then Club.ToDate
else Address.ToDate
end as ToDAte
FROM Club
joinAddress
on Address.PersonId= Club.PersonId
WHERE(( Address.FromDate = Club.FromDate AND Address.ToDate = Club.ToDate ) -- 1 Address Equal CLub
or( Address.FromDate between Club.FromDate AND Club.ToDate
AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 2 Address within CLub
or( Address.FromDate < Club.FromDate
AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 3 Address starts Before Club and ends within
or( Address.FromDate between Club.FromDate AND Club.ToDate
AND Address.ToDate >= Club.ToDate ) -- 4 Address starts within Club and ends after
or(Address.FromDate Club.ToDate ) -- 5
)
There are 2 scenarios where Club and Address do *not* overlap
Club Interval |----------|
1 Address Before |------|
2 Address After |-----|
Scenario 1 is where address end date is earlier than club start date, in other words
Address.AddToDate < SportsClub.ClubFromDate
Scenario 2 is where address start date is after than club end date, in other words
Address.AddFromDate > SportsClub.ClubToDate
For no overlap, either one of these conditions must be true
Address.AddToDate < SportsClub.ClubFromDate OR
Address.AddFromDate > SportsClub.ClubToDate
Therefore the condition for overlap is the inverse of the above case. Here I'm
ignoring the equality case ( > vs. >= ) as this hasn't been specified by the OP
as to whether this counts as overlap or not.
NOT (Address.AddToDate < SportsClub.ClubFromDate OR
Address.AddFromDate > SportsClub.ClubToDate)
which is equivalent to
Address.AddToDate > SportsClub.ClubFromDate AND
Address.AddFromDate < SportsClub.ClubToDate
This gives the solution I posted.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 14, 2008 at 7:01 am
Thanks a lot for that guys, much appreciated. I'm going to have to sit down and understand the code, but it seems okay at first glance.
May 14, 2008 at 8:03 am
On another thread I posted my X-paradigm for easily remembering how to code if two dates overlap. Thus two intervals S1-F1 and S2-F2 overlap if S1 < F2 and S2 < F1.
S1 F1
X
S2 F2
May 16, 2008 at 5:31 am
There are 2 temporal designs for how to define a period's end:
1. The end is one time unit less than "next" begin. This is the "closed-closed" design.
2. The end is equal to the same "next" begin. This is the "closed-open" design.
Reference "Developing Time-Oriented Database Applications in SQL" at
http://www.cs.arizona.edu/people/rts/tdbbook.pdf
The time unit depends on the datatype chosen:
For datetime, 3 milliseconds
For smalldatetime, 1 hour
For date, 1 day
For datetime2, variable with a maximum precision of seven decimal places for seconds. ( 0.0000001 )
In Tom Clark's example data, the ToDate is the same as the "next" FromDate:
Person Address [AddFromDate] [AddToDate]
14 Cambs 01/01/1995 01/04/2001
14 Bristol 01/04/2001 01/12/2001
14 London 01/12/2001 01/12/2007
[\code]
Hence, this is the "closed-open" design.
Mark 's solution works with a "closed-closed" design but does not return correct results under a "closed-open" design. My solution works only with "closed-open" design.
Comparing the two solutions, the "closed-closed" design has significantly less complicated SQL but can is more confusing for personnel entering the period end units.
SQL = Scarcely Qualifies as a Language
May 16, 2008 at 9:19 am
Mark's solution does work with the "closed-open" design, it's just a question of whether you use "<=" or "<" in the join.
If you need it, I can provide the mathematical proof for this. It's very simple, 9th/10th grade Geometry in a one-dimensional space.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 18, 2008 at 4:02 pm
Finally had time to examine Mark's "not excluded" solution in detail and it produces exactly the same results as the more complicated included solution.
Reference: "Propositional Login" http://en.wikipedia.org/wiki/Propositional_logic[\url]
Given Constraints:
1)PersonAddressFromTs < PersonAddressToTs
2)PersonClubFrom< PersonClubToTs
For closed-closed intervals:
AND NOT ( PersonAddressToTs < PersonClubFromTs
OR PersonAddressFromTs > PersonClubToTs
)
Applying De Morgan's Theorem (2)
AND NOT ( PersonAddressToTs < PersonClubFromTs )
AND NOT ( PersonAddressFromTs > PersonClubToTs)
Changing "Not Comparison" to "Reverse comparison", e.g.
"NOT Lesser Than" to "Greater than or Equal"
"NOT Greater Than" to "Less than or Equal"
AND PersonAddressToTs >= PersonClubFromTs
AND PersonAddressFromTs <= PersonClubToTs
For closed-open intervals:
AND NOT ( PersonAddressToTs <= PersonClubFromTs
OR PersonAddressFromTs => PersonClubToTs
)
Can be transformed to :
AND PersonAddressToTs > PersonClubFromTs
AND PersonAddressFromTs < PersonClubToTs
I have looked at about 50 temporal join solution and all use the complex include solution instead of the much simpler "not excluded" solution.
For a schema containing 1000 Person, 3 Clubs for each person and 1 to 5 Addresses per Person, the "not included" and the transformed version, under SQL Server 2008, all produce identical execution plans. Seems the optimizer does know De Morgan's Theorems
SQL = Scarcely Qualifies as a Language
May 18, 2008 at 5:18 pm
Wow, that reminded me of some long forgotten university lectures. Nice work!
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply