September 12, 2016 at 12:04 pm
How do I find overlapping records in an evolved dimension table with a begin and end date for each record.
For example, assume my dtbl_employee_evolved table has the following:
emp_id, evolv_key, lastname, begin_date, end_date
1 1000 Roberts 01/01/2016 01/15/2016
1 1001 Smith 01/16/2016 01/31/2016
2 1002 Edwards 07/01/2016 09/04/2016
2 1003 James 07/06/2016 08/11/2016
2 1004 Flanagan 08/12/2016 08/31/2016
Each change in lastname should record the day the old name changed and the day the new name began.
There should not be an overlap as an employee cannot have different last names on the same day.
Employee 2 has overlapping records, if you try to find his last name on July 7 2016, you will get two records.
2 1002 Edwards 07/01/2016 09/04/2016
2 1003 James 07/06/2016 08/11/2016
I need a query that will return all overlapping records. I've tried recursive table expression and cross join unsuccessfully.
September 12, 2016 at 2:10 pm
Something like the following should give you a start. You didn't say what you wanted to do once you found the overlaps.
SELECT *
FROM dtbl_employee_evolved AS a
INNER JOIN dtbl_employee_evolved AS b
ON a.emp_id = b.emp_id
AND a.begin_date < b.end_date
AND a.end_date > b.begin_date
AND a.evolv_key < b.evolv_key
Also, Hungarian Notation is frowned upon in modern programming, but it's probably way too late to drop the prefix "dtbl_" from your tables.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 12, 2016 at 3:06 pm
Are you storing your begin_dates and end_dates as char or varchar columns?
I ask because at some point your example data apparently switches from DD/MM/YYYY format to MM/DD/YYYY.
Unless your data is consistent you have a problem.
IF your begin_date and end_date columns were date or datetime datatypes, you could do something like this:
select t1.*
from sometable t1
join sometable t2 on t1.emp_id = t2.emp_id
and t1.evolv_key <> t2.evolv_key -- assumes this column is unique
and ( t1.end_date between t2.begin_date and t2.end_date
OR t1.begin_date between t2.begin_date and t2.end_date )
If it is stored as character data, you could use the SUBSTRING() function to rearrange the data into YYYYMMDD format and use the resulting strings for the BETWEEN compares.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 13, 2016 at 11:53 pm
If you already have entries for each day that should work, but if you don't the overhead is significant, and if that query is used often, if will affect performance.
If the data is in this format, you can detect overlaps using simple date arithmetic, because an overlap is simply one interval starting after a given interval, but before the given is finished, something like
select dr1.* from date_ranges dr1
inner join date_ranges dr2
on dr2.start > dr1.start -- start after dr1 is started
and dr2.start < dr1.end -- start before dr1 is finished
If you need special handling for interval that are wholly within another interval, or you need to merge intervals, i.e.
PKey Start End Type
==== ===== === ====
01 01/01/2010 20/01/2010 S
02 15/01/2010 31/01/2010 S
yielding
Start End Type
===== === ====
01/01/2010 31/01/2010 S
you will need more complex calculation.
In my experience with this kind of problems, once you get how to do the calculation by hand, it's easy to transfer it into SQL 🙂
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply