SQL to split row by date (split into multiple rows)

Question

Post reply

SQL to split row by date (split into multiple rows)

SQLStalker

Mr or Mrs. 500

Points: 557
More actions
June 12, 2013 at 3:42 am

#278419

I am looking for help with splitting a row into multiple rows based on dates overlapping.
As an example, I have a table with the following data:
Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 31/12/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
4, John Doe, Barclays, 01/01/2014, 31/01/2014, 50
5, John Doe, Santander, 06/01/2014, 25/01/2014, 50
6, John Doe, Vacation, 13/01/2014, 17/01/2014, 100
I am looking to split the banking rows where they overlap with a vacation. So for example, the final result should be:
Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 16/06/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, HSBC, 22/06/2013, 18/08/2013, 100
4, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
5, John Doe, HSBC, 24/08/2013, 31/12/2013, 100
6, John Doe, Barclays, 01/01/2014, 12/01/2014, 50
7, John Doe, Barclays, 18/01/2014, 31/01/2014, 50
8, John Doe, Santander, 06/01/2014, 12/01/2014, 50
9, John Doe, Santander, 18/01/2014, 25/01/2014, 50
10, John Doe, Vacation, 13/01/2014, 17/01/2014, 100
New rows after split are in bold. The Row ID should be unique although it doesn't need to be sequential.
Any help or guidance would be appreciated.

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

WolfgangE SSCertifiable Points: 6328 More actions · Answer 1

hi,

I tried to understand your example but I didn't.

I have no idea how you get to the date values for the added rows. And what about the first row in your example, the date values have changed there too.

Please try to explain it again, maybe more detailled.

By the way, please read the following article and provide your example data a little bit more comfortable 😉

http://www.sqlservercentral.com/articles/Best+Practices/61537/

Sean Lange SSC Guru Points: 286609 More actions · Answer 2

I understand your requirements. This is a bit challenging. Since you are brand new around here I went ahead and put together some ddl and sample data in a consumable format so you can see how you should post this type of thing.

Here it is:

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null

drop table #Something

create table #Something

(

ID int,

Employee varchar(20),

Job varchar(20),

StartDate datetime,

EndDate datetime,

Workload int

)

insert #Something

select *

from (Values

(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)

,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)

,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)

,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)

,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)

,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)

) x(a,b,c,d,e,f)

select *

from #Something

So now we all have something to work with. I will see if I can come up with something that will work. Meantime others can work on this too now that they have details to work with.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

SQLStalker Mr or Mrs. 500 Points: 557 More actions · Answer 3

Hi WolfgangE and Sean Lange,

Thank you for your responses.

WolfgangE, sorry should have been more clear in my requirements.

We have a system where an employee can be booked on a job for long periods, so in my crude example, John Doe is booked to HSBC for all of 2013. However, in between, employees can be on vacation too so John Doe has 2 vacations during the time he is booked on HSBC.

What we are looking to achieve is to split the job either side of vacation bookings so it all slots in as opposed to being stacked. I attach a screenshot as an example.

Sean Lange, thanks for the DDL. I should have provided that in the first place.

I look forward to your guidance and assistance.

Nevyn SSChampion Points: 14210 More actions · Answer 4

Tricky.

Could you clarify your first sample output from above? With HSBC you intersperse the vacation with the new rows of HSBC. With the other banks you show the vacation AFTER the banking records.

I am pretty close to a solution (its not very pretty), but assuming the output order matters I need to understand the sorting rules.

WolfgangE SSCertifiable Points: 6328 More actions · Answer 5

thank goodness someone else tries, I dont' find a solution yet...

Nevyn SSChampion Points: 14210 More actions · Answer 6

Here is what I have. Sort order is not the same as the original post suggested, so we could still use a clairification on those rules.

There is likely a cleaner way to do this, but this is my first pass. Note that it can get tripped up if a job and a vacation start on the same date, so if that is an issue let me know.

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null

drop table #Something

create table #Something

(

ID int,

Employee varchar(20),

Job varchar(20),

StartDate datetime,

EndDate datetime,

Workload int

)

insert #Something

select *

from (Values

(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)

,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)

,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)

,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)

,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)

,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)

) x(a,b,c,d,e,f)

;

WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them

SELECT ID,

Employee,

Job,

StartDate,

EndDate,

Workload

FROM #Something

WHERE Job = 'Vacation'

),

NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job

SELECT a.ID,

a.Employee,

a.Job,

DATEADD (d,1,b.EndDate) AS StartDate,

a.EndDate,

a.Workload

FROM #Something a

INNER JOIN VacationCTE b

ON a.StartDate <= b.StartDate

AND a.EndDate > b.StartDate

AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after

),

UnionCTE AS ( -- This CTE merges the new rows with the existing ones

SELECT ID,

Employee,

Job,

StartDate,

EndDate,

Workload

FROM #Something

UNION ALL

SELECT ID,

Employee,

Job,

StartDate,

EndDate,

Workload

FROM NewRowsCTE

),

FixEndDateCTE as (

SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate

FROM UnionCTE c

LEFT OUTER JOIN VacationCTE d

ON c.StartDate < d.StartDate

AND c.EndDate >= d.StartDate

WHERE c.Job <> 'Vacation'

GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)

)

SELECT row_number() OVER (ORDER BY e.Startdate),

e.Employee,

e.Job,

e.StartDate,

CASE WHEN f.StartDate IS NULL

THEN e.EndDate

ELSE DATEADD (d,-1,f.StartDate)

END,

e.Workload

FROM UnionCTE e

LEFT OUTER JOIN FixEndDateCTE f

ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID

ORDER BY e.StartDate

It is tricky because you need to separate vacations from other rows ( treated differently), create new rows post vacation, and update both the existing and new rows with corrected end dates. Probably far easier to do within a stored procedure where you can use a temp table and take multiple steps, but I tried to do it within a query.

SQLStalker Mr or Mrs. 500 Points: 557 More actions · Answer 7

Hi SSC Veteran,

Thank you very much for your response.

Your DDL appears to be working. I am testing further.

The IDs for existing rows can remain the same with new IDs allocated to new rows (after split). No particular order is required.

I really appreciate your suggestion.

SQLStalker Mr or Mrs. 500 Points: 557 More actions · Answer 8

Hi All,

Thanks to SSC Veteran's assistance, I am close to the correct result set.

During testing, I found that when 2 rows overlap, duplicate rows appear. An example is:

John Smith, Vacation, 12/02/2014, 25/02/2014

John Smith, Natwest, 14/02/2014, 28/02/2014

The expected result should be the Natwest job start date is the next day after vacation:

John Smith, Vacation, 12/02/2014, 25/02/2014

John Smith, Natwest, 26/02/2014, 28/02/2014

However, I see the following:

John Smith, Vacation, 12/02/2014, 25/02/2014

John Smith, Natwest, 14/02/2014, 28/02/2014

John Smith, Natwest, 26/02/2014, 28/02/2014

Any pointers would be appreciated. Here is the DDL.

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null

drop table #Something

create table #Something

(

ID int,

Employee varchar(20),

Job varchar(20),

StartDate datetime,

EndDate datetime,

Workload int

)

insert #Something

select *

from (Values

(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)

,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)

,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)

,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)

,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)

,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)

,(7, 'John Doe', 'Natwest', '14/02/2014', '28/02/2014', 100)

,(8, 'John Doe', 'Vacation', '12/02/2014', '25/02/2014', 100)

) x(a,b,c,d,e,f)

;

WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them

SELECT ID,

Employee,

Job,

StartDate,

EndDate,

Workload

FROM #Something

WHERE Job = 'Vacation'

),

NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job

SELECT a.ID,

a.Employee,

a.Job,

DATEADD (d,1,b.EndDate) AS StartDate,

a.EndDate,

a.Workload

FROM #Something a

INNER JOIN VacationCTE b

ON a.StartDate <= b.EndDate

AND a.EndDate > b.StartDate

AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after

),

UnionCTE AS ( -- This CTE merges the new rows with the existing ones

SELECT ID,

Employee,

Job,

StartDate,

EndDate,

Workload

FROM #Something

UNION ALL

SELECT ID,

Employee,

Job,

StartDate,

EndDate,

Workload

FROM NewRowsCTE

),

FixEndDateCTE as (

SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate

FROM UnionCTE c

LEFT OUTER JOIN VacationCTE d

ON c.StartDate < d.StartDate

AND c.EndDate >= d.StartDate

WHERE c.Job <> 'Vacation'

GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)

)

SELECT row_number() OVER (ORDER BY e.Startdate),

e.Employee,

e.Job,

e.StartDate,

CASE WHEN f.StartDate IS NULL

THEN e.EndDate

ELSE DATEADD (d,-1,f.StartDate)

END,

e.Workload

FROM UnionCTE e

LEFT OUTER JOIN FixEndDateCTE f

ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID

ORDER BY e.StartDate

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

This appears to work:

SELECT

ID = ROW_NUMBER() OVER(ORDER BY MIN(DateRange)),

Employee, Job, [Workload],

StartDate = MIN(DateRange),

EndDate = MAX(DateRange)

FROM (

SELECT ID, Employee, Job, [Workload], DateRange,

Grouper = DENSE_RANK() over (order by Employee, daterange)

- DENSE_RANK() over (order by ID, Employee, daterange)

FROM (

SELECT

so.ID,

so.Employee,

Job = so.Job,

so.[Workload],

c.DateRange,

rn = DENSE_RANK() OVER(PARTITION BY c.DateRange ORDER BY CASE WHEN so.Job = 'Vacation' THEN 0 ELSE 1 END)

FROM #Something so

CROSS APPLY (SELECT RangeStart = MIN(StartDate), RangeEnd = MAX(EndDate) FROM #Something si) r

CROSS APPLY dbo.IF_Calendar (r.RangeStart,r.RangeEnd,'monday') c

WHERE c.DateRange BETWEEN so.StartDate AND so.EndDate

) d

WHERE rn = 1

) q

GROUP BY ID, Employee, Job, [Workload], Grouper

Here's the function definition:

CREATE FUNCTION [dbo].[IF_Calendar]

(

@StartDate DATE,

@EndDate DATE,

@FirstWeekDay VARCHAR(10)

)

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

(

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows

E3(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b CROSS JOIN E1 c), --1M rows max

iTally AS ( -- generate sufficient rows to cover startdate to enddate inclusive

SELECT TOP(1+DATEDIFF(DAY,@StartDate,@EndDate))

rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1

FROM E3

)

-- Do some date arithmetic

SELECT

a.DateRange,

c.[Year],

c.[Month],

c.[DayOfMonth],

c.AbsWeekno,

c.[DayName],

d.Holiday

FROM iTally

CROSS APPLY (SELECT DateRange = DATEADD(day,rn,@StartDate)) a

CROSS APPLY (VALUES ('Tuesday',1),('Wednesday',2),('Thursday',3),('Friday',4),('Saturday',5),('Sunday',6),('Monday',7)

) b (FirstWeekDay, FirstWeekdayOffset)

CROSS APPLY (

SELECT

[Year] = YEAR(a.DateRange),

[Month] = MONTH(a.DateRange),

[DayOfMonth] = DAY(a.DateRange),

AbsWeekno= DATEDIFF(day,FirstWeekdayOffset,a.DateRange)/7,

[DayName]= DATENAME(weekday,a.DateRange)

) c

CROSS APPLY (

SELECT Holiday = CASE

WHEN [Month] = 1 AND [DayOfMonth] = 1 THEN 'New Year'

WHEN [Month] = 5 AND [DayOfMonth] >= 25 AND [DayName] = 'Monday' THEN 'Memorial Day'

WHEN [Month] = 7 AND [DayOfMonth] = 4 THEN 'Independence Day'

WHEN [Month] = 9 AND [DayOfMonth] <= 7 AND [DayName] = 'Monday' THEN 'Labor Day'

WHEN [Month] = 11 AND [DayOfMonth] BETWEEN 22 AND 28 AND [DayName] = 'Thursday' THEN 'Thanksgiving Day'

WHEN [Month] = 12 AND [DayOfMonth] = 25 THEN 'Christmas Day'

ELSE NULL END

) d

WHERE b.FirstWeekDay = @FirstWeekDay

AND @EndDate IS NOT NULL

)

GO

Which you could easily replace with an inline tally table.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

SQLStalker Mr or Mrs. 500 Points: 557 More actions · Answer 10

Hi Chris,

Thanks for your response. I will test this further.

How would I include time along with a date in the function as I completely forgot to mentioned the StartDate and EndDate are datetime fields?

SQLStalker Mr or Mrs. 500 Points: 557 More actions · Answer 11

A slightly related question. If I have the table as per the DDL below, how do I remove rows where the status is 'temp' for overlapping rows by date? In the example below, row with ID 5 will be removed as it overlaps with ID 4 as they are for the same person and job.

if OBJECT_ID('tempdb..#Something') is not null

drop table #Something

GO

set dateformat dmy

Go

create table #Something

(

ID int,

Employee varchar(20),

Job varchar(20),

StartDate datetime,

EndDate datetime,

[Workload] int,

[Status] varchar(30)

)

insert #Something

select *

from (Values

(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100, 'temp')

,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100, 'confirmed')

,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100, 'temp')

,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50, 'confirmed')

,(5, 'John Doe', 'Barclays', '06/01/2014', '31/01/2014', 50, 'temp')

,(6, 'John Doe', 'Vacation', '13/01/2014', '25/01/2014', 100, 'confirmed')

) x(a,b,c,d,e,f,g)

select *

from #Something

SQLStalker Mr or Mrs. 500 Points: 557 More actions · Answer 12

Just wanted to add that the queries from SSC Veteran and ChrisM@Work are the solutions I went with. Thank you.