I have a following table (Contains script to generate the sample table with data):
CREATE TABLE [dbo].[Test](
[Schedule_Start_Date] [varchar](8) NULL,
[UserID] [int] NULL,
[Code] [varchar](10) NULL,
[Start_Time] [time](7) NULL,
[Duration_Min] [int] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211030', 1562, N'Shift', CAST(N'10:00:00' AS Time), 720)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211030', 1562, N'Break', CAST(N'16:30:00' AS Time), 30)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211030', 1562, N'Lunch', CAST(N'21:00:00' AS Time), 60)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211030', 1172, N'Shift', CAST(N'06:00:00' AS Time), 720)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211030', 1172, N'Break', CAST(N'12:00:00' AS Time), 30)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211030', 1172, N'Lunch', CAST(N'15:00:00' AS Time), 60)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211101', 1066, N'Shift', CAST(N'06:00:00' AS Time), 720)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211101', 1066, N'Break', CAST(N'12:00:00' AS Time), 30)
GO
INSERT [dbo].[Test] ([Schedule_Start_Date], [UserID], [Code], [Start_Time], [Duration_Min]) VALUES (N'20211101', 1066, N'Lunch', CAST(N'15:00:00' AS Time), 60)
GO
Schedule_Start_Date UserID Code Start_Time Duration_min
20211030 1562 Shift 10:00 720
20211030 1562 Break 16:30 30
20211030 1562 Lunch 21:00 60
20211030 1172 Shift 6:00 720
20211030 1172 Break 12:00 30
20211030 1172 Lunch 15:00 60
20211101 1066 Shift 6:00 720
20211101 1066 Break 12:00 30
20211101 1066 Lunch 15:00 60
Expected Result:
Schedule_Start_Date UserID Code Start_Time End_Time
20211030 1562 Shift 10:00 16:30
20211030 1562 Break 16:30 17:00
20211030 1562 Shift 17:00 21:00
20211030 1562 Lunch 21:00 22:00
20211030 1172 Shift 6:00 12:00
20211030 1172 Break 12:00 12:30
20211030 1172 Shift 12:30 15:00
20211030 1172 Lunch 15:00 16:00
20211030 1172 Shift 16:00 18:00
20211101 1066 Shift 6:00 12:00
20211101 1066 Break 12:00 12:30
20211101 1066 Shift 12:30 15:00
20211101 1066 Lunch 15:00 16:00
20211101 1066 Shift 16:00 18:00
Much appreciated if anyone can give me a hand on coming up with a T-SQL statement that can generate the above result from the Test table.
Many thanks!
December 3, 2021 at 2:14 am
Are you just adding [Duration] minutes to [CodeStartTime]? If so, then use DATEADD
select dateadd(minute,12,'6/1/2020 11:15')
December 3, 2021 at 4:38 am
>>
I have a following table ..<<
No this is not a table at all. By definition, a table must have a key, yet all your columns can be NOT NULL, so there's no way this thing can ever have a key. It's basically at a deck of 1960s punchcards written using SQL badly . Your names and datatypes also violate ISO and other standards. We have a date data type, and yet you put a date and variable character columns. First, let's try and get this mess cleaned up a bit. Okay?
We have a DATETIME2(n) data type now, so there's no reason to commit the design flaw called attribute splitting by putting a date and a time in separate columns. You also do't seem to understand there's no such thing as a generic magic universal "code"; it has to be the code of something in particular, such as a postal address. Properly design codes are almost never variable length. Because identifiers are on the nominal scale, by their very nature, they cannot be numeric datatypes. While it is dull and boring, you might want to actually read a good book on basic schema design and data modeling.
I doubt that you actually measure your events down to the nano second. I also believe that in the event could cross a day. Therefore, the use of theTIME(n) data type is a really big mistake.
CREATE TABLE User_Schedules
(event_start_timestamp DATETIME2(0) NOT NULL,
event_end_timestamp DATETIME2(0) NOT NULL,
CHECK (event_start_timestamp <= event_end_timestamp)
user_id CHAR(5) NOT NULL
REFERENCES Users
ON DELETE CASCADE,
event_code VARCHAR(10) NOT NULL
CHECK (event_code IN ('Shift', 'Break', ..),
PRIMARY KEY (event_start_timestamp, event_end_timestamp. user_id));
Please put using the old Sybase syntax for the insertion statement. It will not port and it means you have to put your data in one row at a time. If you use a values clause table constructor, you can insert an entire constructed table and the Optimizer can do something with those insertions because it's in one statement. Also the correct syntax is INSERT INTO and the old Sybase short hands.
INSERT INTO User_Schedules
VALUES ('2021-10-30 10:00:00', '2021-10-30 22:00:00', '01562', 'Shift'),
... ;
Basically, your "Expected Result" is what you should be putting in the table in the first -place. If you need to you can write a procedure that will do the temporal math in T SQL for you. Brent has a good rule of thumb that you ought to "store data the way you use it and use the data the way it is stored."
Please post DDL and follow ANSI/ISO standards when asking for help.
December 3, 2021 at 5:45 am
I really do appreciate your comment and spending the time to explain about proper database design and industry best practices.
Maybe I should have spent more time explaining the background of the problem.
The table that I provided is a simplified version of a table from a scheduling application database. This scheduling application is something that our company bought so I have no control on what the column names and data types should be. The original table contains a lot more columns than I have provided in my example and it does contain primary keys. I just did not include it as it is not needed for what I am asking here since we are only dealing with 9 rows of data.
The challenge that I am facing is, we have a need to export this data in a format that contains end time for another process to import this data to another system. However, end time is not being stored in this scheduling application database. So I need to come up with a query to calculate / derive the end time using start time and duration for each codes.
Example:
Schedule_Start_Date UserID Code Start_Time Duration_min
20211030 1562 Shift 10:00 720
20211030 1562 Break 16:30 30
20211030 1562 Lunch 21:00 60
On 2021-10-30, UserID 1562 has a regular shift starts at 10:00 and the shift is 720minutes in total (12hours). This user took a 30 mins break at 16:30 and lunch for 1 hour at 21:00.
I need the output for this user in the following format:
Schedule_Start_Date UserID Code Start_Time End_Time
20211030 1562 Shift 10:00 16:30
20211030 1562 Break 16:30 17:00
20211030 1562 Shift 17:00 21:00
20211030 1562 Lunch 21:00 22:00
I really hope someone can come up with a way that this can be achieved with a T-SQL as I am trying to avoid using SSIS package to do this data transformation.
Many thanks.
December 3, 2021 at 6:12 am
So what part of DATEADD didn't work for you?
If you're adding values from the previous record, you need to look at the windowing function LAG().
And if you need to do date math (for times that cross midnight), you may have to convert your ScheduleStartDate to a proper date so you can use DATEDIFF...
December 3, 2021 at 6:21 am
Sorry, I am really new to SQL. If you can give me a TSQL that can produce the output format I need from my provided sample data, I would be much appreciated.
December 3, 2021 at 6:46 am
Hate to break it to you, but the way you get better at SQL is the same way you get better at anything else. By trying, maybe failing, and learning along the way. Show what you tried and people will help you If you don't, it's gonna be a hard road. Post your attempt. If you want to get the previous record's value, use LAG(). Look it up in the documentation/Books OnLine (BOL).
So I'll repeat myself. What did you try? Post your code.
December 3, 2021 at 6:47 pm
I think this will work. I have NOT yet adjusted it for performance (or tuned it at all yet, really), I was just trying to make it work. Let me know (1) if this gives you the results you need (2) if it performs well enough for you to be able to use.
;WITH Cte_Test_With_Row_Nums AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY UserID, Schedule_Start_Date ORDER BY Start_Time) AS row_num
FROM dbo.Test
)
SELECT ca1.*
FROM Cte_Test_With_Row_Nums CT
LEFT OUTER JOIN Cte_Test_With_Row_Nums CT_Next ON CT_Next.UserID = CT.UserID AND CT_Next.Schedule_Start_Date = CT.Schedule_Start_Date AND CT_Next.row_num = CT.row_num + 1
CROSS APPLY (
SELECT CT.Schedule_Start_Date, CT.UserID, CT.Code,
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT.Start_Time), CAST(CT.Schedule_Start_Date AS datetime)) AS Start_Time,
CASE WHEN CT.Code = 'Shift'
THEN DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT_Next.Start_Time), CAST(CT.Schedule_Start_Date AS datetime))
ELSE DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT.Start_Time) + CT.Duration_Min, CAST(CT.Schedule_Start_Date AS datetime)) END AS End_Time
UNION ALL
SELECT CT.Schedule_Start_Date, CT.UserID, 'Shift',
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT.Start_Time) + CT.Duration_Min, CAST(CT.Schedule_Start_Date AS datetime)) AS Start_Time,
CASE WHEN CT_Next.Start_Time IS NULL
THEN (SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT_Shift.Start_Time) + CT_Shift.Duration_Min, CAST(CT_Shift.Schedule_Start_Date AS datetime))
FROM Cte_Test_With_Row_Nums CT_Shift WHERE CT_Shift.UserID = CT.UserID AND CT_Shift.Schedule_Start_Date = CT.Schedule_Start_Date AND CT_Shift.row_num = 1)
ELSE DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT_Next.Start_Time), CAST(CT.Schedule_Start_Date AS datetime)) END AS End_Time
WHERE CT.Code <> 'Shift' AND DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT.Start_Time) + CT.Duration_Min, CAST(CT.Schedule_Start_Date AS datetime)) <
(SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CT_Shift.Start_Time) + CT_Shift.Duration_Min,
CAST(CT_Shift.Schedule_Start_Date AS datetime)) FROM Cte_Test_With_Row_Nums CT_Shift WHERE CT_Shift.UserID = CT.UserID AND
CT_Shift.Schedule_Start_Date = CT.Schedule_Start_Date AND CT_Shift.row_num = 1)
) AS ca1
ORDER BY CT.UserID, CT.Schedule_Start_Date, CT.Start_Time
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 4, 2021 at 1:04 am
Thank you ScottPletcher. You are awesome! I will need to spend some time figuring out how you do this.
Performance does not matter in my case since I am only dealing with a few thousand records per day.
Once again, thank you very much and for everyone who has helped!
December 4, 2021 at 4:43 am
This was an interesting problem because for 'Shift' rows, you need to output only one row. But for non-'Shift' rows, you (mostly) need to output two rows: a row for the current code, and a generated 'Shift' row to complete the time from the current end time to the next activity.
Traditional joins don't do that at all (rather, they combine rows). But CROSS APPLY (or OUTER APPLY) is great at that, since APPLYs are designed to add more rows to the output. That explains my use of CROSS APPLY here.
Now, to the actual mechanics of the CROSS APPLY. Note that the first SELECT within the CROSS APPLY always returns a row (no WHERE condition). This will be the current row from the table with the End_Time added. That's relatively straightforward, although some calc is needed to determine the correct End_Time.
The cool, tricky part is the UNION ALL and the second SELECT. Note that the second SELECT only returns a row if the code <> 'Shift". That's needed because a new 'Shift' row needs to be generated after a non-Shift row if there is time left in the Shift.
For example, based on your sample results:
Schedule_Start_Date UserID Code Start_Time End_Time
20211030 1562 Shift 10:00 16:30
20211030 1562 Break 16:30 17:00
20211030 1562 Shift 17:00 21:00 --<<--
20211030 1562 Lunch 21:00 22:00
20211030 1172 Shift 6:00 12:00
20211030 1172 Break 12:00 12:30
20211030 1172 Shift 12:30 15:00 --<<--
20211030 1172 Lunch 15:00 16:00
20211030 1172 Shift 16:00 18:00 --<<--
20211101 1066 Shift 6:00 12:00
20211101 1066 Break 12:00 12:30
20211101 1066 Shift 12:30 15:00 --<<--
20211101 1066 Lunch 15:00 16:00
20211101 1066 Shift 16:00 18:00 --<<--
The rows marked with --<<-- must be generated from scratch: they do not exist anywhere in the data set. And that's what the second SELECT does, it generates those rows.
Finally, notice the last line, 'Lunch'. It ends at 22:00, and so does the Shift. In order to prevent a row from being added after Lunch, the code must check to see if there is time left in the Shift. And that's what the condition after the AND is for. It prevents a row from being generated if there is no time left in the Shift (otherwise there would be a generated row after:
20211030 1562 Lunch 21:00 22:00
).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 5, 2021 at 5:18 pm
This is so wrong. Your company is trying to have you fix the wrong problem. The real problem is the source of the data and THAT's what actually needs to be fixed. For example, if you look at the first two rows, 10:00 + 720 minutes (which is exactly 6 hours) cannot lead to a break start time of 10:30 on the next line.
That means that no matter how you swing it, this system is producing incorrect information. Since it's a 3rd party system, I'd ask them what's going on an how to fix it. Once that's done, conversion of starttime and duration is a cake walk. But, again, the scheduling system is what is wrong and it shouldn't be "fixed" by creating artificial output that isn't supportable according to the source data. And, just think of what people are getting in the scheduling system.
The bottom line is that the scheduling system is seriously broken and that's what needs to be fixed. Can we "warp" the output using T-SQL? Yes. But it's absolutely the wrong thing to do in more ways than 1. The 3rd party should be brought to task so that the scheduled times in the 3rd party app are actually correct! Or, perhaps, there's a serious training issue that has been missed by the folks that are entering data BUT the app should pick up on those errors and not allow a commit until the data is correct. Seriously! A BREAK and then a LUNCH with no entry available between? That's not right and people using the app for other things are getting the wrong answers, as well!
Get the 3rd party to fix the app.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2021 at 5:16 am
This is so wrong. Your company is trying to have you fix the wrong problem. The real problem is the source of the data and THAT's what actually needs to be fixed. For example, if you look at the first two rows, 10:00 + 720 minutes (which is exactly 6 hours) cannot lead to a break start time of 10:30 on the next line.
Huh? I think it's clear that the first row is the entire shift duration, and the next row is simply the first break that occurred in the shift.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 6, 2021 at 3:04 pm
It might be more efficient to avoid joins:
WITH Grps
AS
(
--Use ShiftEnd as the group as the value will be needed anyway.
SELECT T.UserID, S.StartTime, T.Code, E.EndTime
,MAX(CASE WHEN T.Code = 'Shift' THEN E.EndTime ELSE '1900' END) OVER (PARTITION BY UserID ORDER BY S.StartTime) AS ShiftEnd
FROM dbo.Test T
CROSS APPLY
(
VALUES
(
CAST(T.Schedule_Start_date AS datetime) + CAST(T.Start_Time AS datetime)
)
) S (StartTime)
CROSS APPLY
(
VALUES
(
DATEADD(minute, T.Duration_Min, S.StartTime)
)
) E (EndTime)
)
,NextStarts
AS
(
SELECT UserID, Code, StartTime, EndTime, ShiftEnd
,LEAD(StartTime) OVER (PARTITION BY UserID, ShiftEnd ORDER BY StartTime) AS NextStartTime
FROM Grps
)
,Results
AS
(
SELECT N.UserID
,CASE WHEN R.RowType = 'Dup' THEN 'Shift' ELSE N.Code END AS Code
,CASE
WHEN R.RowType = 'Org'
THEN StartTime
WHEN ISNULL(NextStartTime, ShiftEnd) < EndTime
THEN ISNULL(NextStartTime, ShiftEnd)
ELSE EndTime
END AS StartTime
,CASE
WHEN R.RowType = 'Org' AND ISNULL(NextStartTime, ShiftEnd) < EndTime
THEN ISNULL(NextStartTime, ShiftEnd)
WHEN R.RowType = 'Org'
THEN EndTime
ELSE ISNULL(NextStartTime, ShiftEnd)
END AS EndTime
FROM NextStarts N
CROSS APPLY
(
VALUES ('Org'), ('Dup')
) R (RowType)
)
SELECT CONVERT(varchar(8), StartTime, 112) AS Schedule_Start_Date
,UserID
,Code
,CAST(StartTime AS time) AS Start_Time
,CAST(EndTime AS time) AS End_Time
FROM Results
WHERE NOT (Code = 'Shift' AND StartTime = EndTime);
December 6, 2021 at 4:18 pm
Jeff Moden wrote:This is so wrong. Your company is trying to have you fix the wrong problem. The real problem is the source of the data and THAT's what actually needs to be fixed. For example, if you look at the first two rows, 10:00 + 720 minutes (which is exactly 6 hours) cannot lead to a break start time of 10:30 on the next line.
Huh? I think it's clear that the first row is the entire shift duration, and the next row is simply the first break that occurred in the shift.
Huh, hell. 😀 10:00 +720 minutes is NOT 10:30 like the second line says. That system is broken.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2021 at 4:28 pm
ScottPletcher wrote:Jeff Moden wrote:This is so wrong. Your company is trying to have you fix the wrong problem. The real problem is the source of the data and THAT's what actually needs to be fixed. For example, if you look at the first two rows, 10:00 + 720 minutes (which is exactly 6 hours) cannot lead to a break start time of 10:30 on the next line.
Huh? I think it's clear that the first row is the entire shift duration, and the next row is simply the first break that occurred in the shift.
Huh, hell. 😀 10:00 +720 minutes is NOT 10:30 like the second line says. That system is broken.
Again, the first entry is the ENTIRE shift time, NOT the time until the first break.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply