June 12, 2013 at 3:42 am
I am looking for help with splitting a row into multiple rows based on dates overlapping.
As an example, I have a table with the following data:
Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 31/12/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
4, John Doe, Barclays, 01/01/2014, 31/01/2014, 50
5, John Doe, Santander, 06/01/2014, 25/01/2014, 50
6, John Doe, Vacation, 13/01/2014, 17/01/2014, 100
I am looking to split the banking rows where they overlap with a vacation. So for example, the final result should be:
Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 16/06/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, HSBC, 22/06/2013, 18/08/2013, 100
4, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
5, John Doe, HSBC, 24/08/2013, 31/12/2013, 100
6, John Doe, Barclays, 01/01/2014, 12/01/2014, 50
7, John Doe, Barclays, 18/01/2014, 31/01/2014, 50
8, John Doe, Santander, 06/01/2014, 12/01/2014, 50
9, John Doe, Santander, 18/01/2014, 25/01/2014, 50
10, John Doe, Vacation, 13/01/2014, 17/01/2014, 100
New rows after split are in bold. The Row ID should be unique although it doesn't need to be sequential.
Any help or guidance would be appreciated.
June 12, 2013 at 9:42 am
hi,
I tried to understand your example but I didn't.
I have no idea how you get to the date values for the added rows. And what about the first row in your example, the date values have changed there too.
Please try to explain it again, maybe more detailled.
By the way, please read the following article and provide your example data a little bit more comfortable π
http://www.sqlservercentral.com/articles/Best+Practices/61537/
June 12, 2013 at 10:05 am
I understand your requirements. This is a bit challenging. Since you are brand new around here I went ahead and put together some ddl and sample data in a consumable format so you can see how you should post this type of thing.
Here it is:
set dateformat dmy
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
Workload int
)
insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
) x(a,b,c,d,e,f)
select *
from #Something
So now we all have something to work with. I will see if I can come up with something that will work. Meantime others can work on this too now that they have details to work with.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 β Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
June 12, 2013 at 10:37 am
Hi WolfgangE and Sean Lange,
Thank you for your responses.
WolfgangE, sorry should have been more clear in my requirements.
We have a system where an employee can be booked on a job for long periods, so in my crude example, John Doe is booked to HSBC for all of 2013. However, in between, employees can be on vacation too so John Doe has 2 vacations during the time he is booked on HSBC.
What we are looking to achieve is to split the job either side of vacation bookings so it all slots in as opposed to being stacked. I attach a screenshot as an example.
Sean Lange, thanks for the DDL. I should have provided that in the first place.
I look forward to your guidance and assistance.
June 12, 2013 at 11:47 am
Tricky.
Could you clarify your first sample output from above? With HSBC you intersperse the vacation with the new rows of HSBC. With the other banks you show the vacation AFTER the banking records.
I am pretty close to a solution (its not very pretty), but assuming the output order matters I need to understand the sorting rules.
June 12, 2013 at 11:59 am
thank goodness someone else tries, I dont' find a solution yet...
June 12, 2013 at 12:34 pm
Here is what I have. Sort order is not the same as the original post suggested, so we could still use a clairification on those rules.
There is likely a cleaner way to do this, but this is my first pass. Note that it can get tripped up if a job and a vacation start on the same date, so if that is an issue let me know.
set dateformat dmy
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
Workload int
)
insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
) x(a,b,c,d,e,f)
;
WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
WHERE Job = 'Vacation'
),
NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job
SELECT a.ID,
a.Employee,
a.Job,
DATEADD (d,1,b.EndDate) AS StartDate,
a.EndDate,
a.Workload
FROM #Something a
INNER JOIN VacationCTE b
ON a.StartDate <= b.StartDate
AND a.EndDate > b.StartDate
AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after
),
UnionCTE AS ( -- This CTE merges the new rows with the existing ones
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
UNION ALL
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM NewRowsCTE
),
FixEndDateCTE as (
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate
FROM UnionCTE c
LEFT OUTER JOIN VacationCTE d
ON c.StartDate < d.StartDate
AND c.EndDate >= d.StartDate
WHERE c.Job <> 'Vacation'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)
)
SELECT row_number() OVER (ORDER BY e.Startdate),
e.Employee,
e.Job,
e.StartDate,
CASE WHEN f.StartDate IS NULL
THEN e.EndDate
ELSE DATEADD (d,-1,f.StartDate)
END,
e.Workload
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID
ORDER BY e.StartDate
It is tricky because you need to separate vacations from other rows ( treated differently), create new rows post vacation, and update both the existing and new rows with corrected end dates. Probably far easier to do within a stored procedure where you can use a temp table and take multiple steps, but I tried to do it within a query.
June 13, 2013 at 8:39 am
Hi SSC Veteran,
Thank you very much for your response.
Your DDL appears to be working. I am testing further.
The IDs for existing rows can remain the same with new IDs allocated to new rows (after split). No particular order is required.
I really appreciate your suggestion.
June 18, 2013 at 2:49 am
Hi All,
Thanks to SSC Veteran's assistance, I am close to the correct result set.
During testing, I found that when 2 rows overlap, duplicate rows appear. An example is:
John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 14/02/2014, 28/02/2014
The expected result should be the Natwest job start date is the next day after vacation:
John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 26/02/2014, 28/02/2014
However, I see the following:
John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 14/02/2014, 28/02/2014
John Smith, Natwest, 26/02/2014, 28/02/2014
Any pointers would be appreciated. Here is the DDL.
set dateformat dmy
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
Workload int
)
insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
,(7, 'John Doe', 'Natwest', '14/02/2014', '28/02/2014', 100)
,(8, 'John Doe', 'Vacation', '12/02/2014', '25/02/2014', 100)
) x(a,b,c,d,e,f)
;
WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
WHERE Job = 'Vacation'
),
NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job
SELECT a.ID,
a.Employee,
a.Job,
DATEADD (d,1,b.EndDate) AS StartDate,
a.EndDate,
a.Workload
FROM #Something a
INNER JOIN VacationCTE b
ON a.StartDate <= b.EndDate
AND a.EndDate > b.StartDate
AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after
),
UnionCTE AS ( -- This CTE merges the new rows with the existing ones
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
UNION ALL
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM NewRowsCTE
),
FixEndDateCTE as (
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate
FROM UnionCTE c
LEFT OUTER JOIN VacationCTE d
ON c.StartDate < d.StartDate
AND c.EndDate >= d.StartDate
WHERE c.Job <> 'Vacation'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)
)
SELECT row_number() OVER (ORDER BY e.Startdate),
e.Employee,
e.Job,
e.StartDate,
CASE WHEN f.StartDate IS NULL
THEN e.EndDate
ELSE DATEADD (d,-1,f.StartDate)
END,
e.Workload
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID
ORDER BY e.StartDate
June 18, 2013 at 6:17 am
This appears to work:
SELECT
ID = ROW_NUMBER() OVER(ORDER BY MIN(DateRange)),
Employee, Job, [Workload],
StartDate = MIN(DateRange),
EndDate = MAX(DateRange)
FROM (
SELECT ID, Employee, Job, [Workload], DateRange,
Grouper = DENSE_RANK() over (order by Employee, daterange)
- DENSE_RANK() over (order by ID, Employee, daterange)
FROM (
SELECT
so.ID,
so.Employee,
Job = so.Job,
so.[Workload],
c.DateRange,
rn = DENSE_RANK() OVER(PARTITION BY c.DateRange ORDER BY CASE WHEN so.Job = 'Vacation' THEN 0 ELSE 1 END)
FROM #Something so
CROSS APPLY (SELECT RangeStart = MIN(StartDate), RangeEnd = MAX(EndDate) FROM #Something si) r
CROSS APPLY dbo.IF_Calendar (r.RangeStart,r.RangeEnd,'monday') c
WHERE c.DateRange BETWEEN so.StartDate AND so.EndDate
) d
WHERE rn = 1
) q
GROUP BY ID, Employee, Job, [Workload], Grouper
Here's the function definition:
CREATE FUNCTION [dbo].[IF_Calendar]
(
@StartDate DATE,
@EndDate DATE,
@FirstWeekDay VARCHAR(10)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
(
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows
E3(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b CROSS JOIN E1 c), --1M rows max
iTally AS ( -- generate sufficient rows to cover startdate to enddate inclusive
SELECT TOP(1+DATEDIFF(DAY,@StartDate,@EndDate))
rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
FROM E3
)
-- Do some date arithmetic
SELECT
a.DateRange,
c.[Year],
c.[Month],
c.[DayOfMonth],
c.AbsWeekno,
c.[DayName],
d.Holiday
FROM iTally
CROSS APPLY (SELECT DateRange = DATEADD(day,rn,@StartDate)) a
CROSS APPLY (VALUES ('Tuesday',1),('Wednesday',2),('Thursday',3),('Friday',4),('Saturday',5),('Sunday',6),('Monday',7)
) b (FirstWeekDay, FirstWeekdayOffset)
CROSS APPLY (
SELECT
[Year] = YEAR(a.DateRange),
[Month] = MONTH(a.DateRange),
[DayOfMonth] = DAY(a.DateRange),
AbsWeekno= DATEDIFF(day,FirstWeekdayOffset,a.DateRange)/7,
[DayName]= DATENAME(weekday,a.DateRange)
) c
CROSS APPLY (
SELECT Holiday = CASE
WHEN [Month] = 1 AND [DayOfMonth] = 1 THEN 'New Year'
WHEN [Month] = 5 AND [DayOfMonth] >= 25 AND [DayName] = 'Monday' THEN 'Memorial Day'
WHEN [Month] = 7 AND [DayOfMonth] = 4 THEN 'Independence Day'
WHEN [Month] = 9 AND [DayOfMonth] <= 7 AND [DayName] = 'Monday' THEN 'Labor Day'
WHEN [Month] = 11 AND [DayOfMonth] BETWEEN 22 AND 28 AND [DayName] = 'Thursday' THEN 'Thanksgiving Day'
WHEN [Month] = 12 AND [DayOfMonth] = 25 THEN 'Christmas Day'
ELSE NULL END
) d
WHERE b.FirstWeekDay = @FirstWeekDay
AND @EndDate IS NOT NULL
)
GO
Which you could easily replace with an inline tally table.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 18, 2013 at 11:02 am
Hi Chris,
Thanks for your response. I will test this further.
How would I include time along with a date in the function as I completely forgot to mentioned the StartDate and EndDate are datetime fields?
June 18, 2013 at 4:42 pm
A slightly related question. If I have the table as per the DDL below, how do I remove rows where the status is 'temp' for overlapping rows by date? In the example below, row with ID 5 will be removed as it overlaps with ID 4 as they are for the same person and job.
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
GO
set dateformat dmy
Go
create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
[Workload] int,
[Status] varchar(30)
)
insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100, 'temp')
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100, 'confirmed')
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100, 'temp')
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50, 'confirmed')
,(5, 'John Doe', 'Barclays', '06/01/2014', '31/01/2014', 50, 'temp')
,(6, 'John Doe', 'Vacation', '13/01/2014', '25/01/2014', 100, 'confirmed')
) x(a,b,c,d,e,f,g)
select *
from #Something
July 3, 2013 at 10:49 am
Just wanted to add that the queries from SSC Veteran and ChrisM@Work are the solutions I went with. Thank you.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply