November 27, 2013 at 6:38 am
We had to solve the same problem in our business, and the first solution we adopted was as shown in the article, but the strain on the db and the sheer amount of data proved to be too much for us (too many rows in the table, like millions and the query took minutes to run)
Then we considered the fact that a row A does not overlap another row B if the former ends before the latter starts or vice versa (and one row does not overlap with itself by definition)
This leads to the idea of using a self join on the table on the Person ID field and then discarding the rows that do not overlap .
To do this we needed a primary key on the table (in fact we already had it)
So we had something like this:
CREATE TABLE #Schedule
(
ID int identity(1,1) --primary key
, PersonId INT NOT NULL
, Activity VARCHAR(150) NOT NULL
, StartDate DATETIME NOT NULL
, EndDate DATETIME NOT NULL
, CHECK ( EndDate > StartDate )
)
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Piano Concert', '2013-11-20 18:00', '2013-11-20 18:20' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Weeding Anniversary Dinner', '2013-12-09 19:00',
'2013-12-09 23:30' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Work Meeting', '2013-12-09 18:00', '2013-12-09 19:15' )
-- Oooops! it overlaps with the weeding anniversary
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 2, 'Football Night', '2013-11-20 17:00', '2013-11-20 20:30' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 2, 'Reading Club', '2013-12-09 19:00', '2013-12-09 20:30' )
select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem
from #Schedule as A
inner join #Schedule as B
on A.PersonId=B.PersonId
where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)
and A.ID<>B.id
This gives us 2 rows: the wedding anniversary overlaps with
Weeding Anniversary Dinner OVERLAPS Work Meeting for Person: 1
Work Meeting OVERLAPS Weeding Anniversary Dinner for Person: 1
we can get one row easily as below
select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem
from #Schedule as A
inner join #Schedule as B
on A.PersonId=B.PersonId
where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)
and B.id>a.id--HERE I GET ONLY THE LAST EVENT ADDED THAT OVERLAPPED WITH THE PREVIOUS ONE
We found that this performs lots better for us.
I am interested if somebody has some considerations on the matter (I fear that my solution could have hidden pitfalls I didn't see) [Smile]
November 27, 2013 at 9:23 am
Nadrek (11/26/2013)
This is an interesting take on temporal constraints, but have you tested it with large sets, i.e. perhaps millions of rows of data where the time ranges are measured in years or decades?
Hi, thank you. Doing some math i figured out that 3 million rows of 1 decade of time span with minutes granularity will need 15 768 000 million rows consuming about 314000 GB of space, and 2 million rows of 5 hours interval needs 286 GB .It seems that this approach is not suitable for millions of rows or long time spans with short granularity. In my field of work we might use about 400K rows of 1 day span with minutes precision, we would need 11 GB, which is little for some, but much for others.
I will dome some performance tests on the last and get back with the information.
Thanks for you comments.
November 27, 2013 at 9:30 am
jjturner (11/26/2013)
This is a decent workaround for something SQL Server doesn't seem to adequately support (afaik): temporal awarenessNot to plug another rdbms, but the one starting with "p" has gone a long ways towards implementing duration constraints for overlaps and other types of non-/partial-/extra-equi comparisons (or "Exclusion Constraints", as they're referred to).
Not only that, the PERIOD data type has been generalized to include distance or most any other type of continuous range (cf. "range types").
Using a Start and End column for a timespan has never made much sense to me. It's one datum, not two disjunct attributes relatable only by their row index.
It would be nice to see SQL Server deal seriously with this type of data.
My 2 cantakerous cents :satisfied:
John
Hi there,
Thanks for commenting. What is that other rdbms? i could not guess it he he he, I agree that it will be very good if sql server provided that functionality too. About the Start and End column , looks like it would be better if used a Duration field instead of End, the Start will be still be needed , but the (Start,Duration) pair is more cohesive.
Thanks again.
November 27, 2013 at 11:13 am
lubork (11/26/2013)
Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value andUPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'
will work while
UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'
correctly fails).
The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...
Hi there
You are right, the view checks every change on data, also the index size might become problematic for many many rows. I haven't read Itzik's solutions but I'm planning too , I imagine they are great. My goal was to enforce the business rule with an indexed view, and this is the only way i could make it work, because self joins, correlated queries, windowing functions and other things are not allowed in a materialized view. I believe a trigger that checks only the affected rows of a insert , update or delete is a good option.
Something like this:
ALTER TABLE Schedule ADD ScheduleId INT NOT NULL IDENTITY(1,1)
GO
CREATE INDEX IX_ScheduleId ON dbo.Schedule(ScheduleId)
GO
CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]
AFTER INSERT,UPDATE,DELETE
NOT FOR REPLICATION
AS
BEGIN
IF EXISTS ( SELECT s.PersonId ,
MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,
COUNT(*)
FROM dbo.Schedule s
INNER JOIN dbo.Numbers n
ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )
WHERE s.ScheduleId IN (SELECT ScheduleId FROM INSERTED
UNION
SELECT ScheduleId FROM DELETED)
GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)
HAVING COUNT(*) > 1
)
BEGIN
RAISERROR ('Overlapping',14,1)
ROLLBACK TRANSACTION
END
END
The code is to express and idea, i have not tested it, but I'm pretty sure it works =D.
November 27, 2013 at 11:15 am
lubork (11/26/2013)
Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value andUPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'
will work while
UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'
correctly fails).
The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...
I forgot to say, Thanks for commenting =D.
November 27, 2013 at 11:27 am
adrian.facio (11/27/2013)
lubork (11/26/2013)
Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value andUPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'
will work while
UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'
correctly fails).
The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...
Hi there
You are right, the view checks every change on data, also the index size might become problematic for many many rows. I haven't read Itzik's solutions but I'm planning too , I imagine they are great. My goal was to enforce the business rule with an indexed view, and this is the only way i could make it work, because self joins, correlated queries, windowing functions and other things are not allowed in a materialized view. I believe a trigger that checks only the affected rows of a insert , update or delete is a good option.
Something like this:
ALTER TABLE Schedule ADD ScheduleId INT NOT NULL IDENTITY(1,1)
GO
CREATE INDEX IX_ScheduleId ON dbo.Schedule(ScheduleId)
GO
CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]
AFTER INSERT,UPDATE,DELETE
NOT FOR REPLICATION
AS
BEGIN
IF EXISTS ( SELECT s.PersonId ,
MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,
COUNT(*)
FROM dbo.Schedule s
INNER JOIN dbo.Numbers n
ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )
WHERE s.ScheduleId IN (SELECT ScheduleId FROM INSERTED
UNION
SELECT ScheduleId FROM DELETED)
GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)
HAVING COUNT(*) > 1
)
BEGIN
RAISERROR ('Overlapping',14,1)
ROLLBACK TRANSACTION
END
END
The code is to express and idea, i have not tested it, but I'm pretty sure it works =D.
I had something wrong, the trigger must check all the rows of the affected persons, not only of the affected rows. See the correction.
CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]
AFTER INSERT,UPDATE,DELETE
NOT FOR REPLICATION
AS
BEGIN
IF EXISTS ( SELECT s.PersonId ,
MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,
COUNT(*)
FROM dbo.Schedule s
INNER JOIN dbo.Numbers n
ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )
WHERE s.PersonId IN (SELECT PersonId FROM INSERTED
UNION
SELECT PersonId FROM DELETED)
GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)
HAVING COUNT(*) > 1
)
BEGIN
RAISERROR ('Overlapping',14,1)
ROLLBACK TRANSACTION
END
END
November 27, 2013 at 12:12 pm
archon99 (11/27/2013)
We had to solve the same problem in our business, and the first solution we adopted was as shown in the article, but the strain on the db and the sheer amount of data proved to be too much for us (too many rows in the table, like millions and the query took minutes to run)Then we considered the fact that a row A does not overlap another row B if the former ends before the latter starts or vice versa (and one row does not overlap with itself by definition)
This leads to the idea of using a self join on the table on the Person ID field and then discarding the rows that do not overlap .
To do this we needed a primary key on the table (in fact we already had it)
So we had something like this:
CREATE TABLE #Schedule
(
ID int identity(1,1) --primary key
, PersonId INT NOT NULL
, Activity VARCHAR(150) NOT NULL
, StartDate DATETIME NOT NULL
, EndDate DATETIME NOT NULL
, CHECK ( EndDate > StartDate )
)
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Piano Concert', '2013-11-20 18:00', '2013-11-20 18:20' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Weeding Anniversary Dinner', '2013-12-09 19:00',
'2013-12-09 23:30' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Work Meeting', '2013-12-09 18:00', '2013-12-09 19:15' )
-- Oooops! it overlaps with the weeding anniversary
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 2, 'Football Night', '2013-11-20 17:00', '2013-11-20 20:30' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 2, 'Reading Club', '2013-12-09 19:00', '2013-12-09 20:30' )
select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem
from #Schedule as A
inner join #Schedule as B
on A.PersonId=B.PersonId
where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)
and A.ID<>B.id
This gives us 2 rows: the wedding anniversary overlaps with
Weeding Anniversary Dinner OVERLAPS Work Meeting for Person: 1
Work Meeting OVERLAPS Weeding Anniversary Dinner for Person: 1
we can get one row easily as below
select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem
from #Schedule as A
inner join #Schedule as B
on A.PersonId=B.PersonId
where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)
and B.id>a.id--HERE I GET ONLY THE LAST EVENT ADDED THAT OVERLAPPED WITH THE PREVIOUS ONE
We found that this performs lots better for us.
I am interested if somebody has some considerations on the matter (I fear that my solution could have hidden pitfalls I didn't see) [Smile]
Hi there,
Your solution is pretty good , I can't see any problem.
Thnks for commenting.
November 28, 2013 at 11:32 am
Very clever solution to enforce business rules!
November 28, 2013 at 12:32 pm
mishaluba (11/28/2013)
Very clever solution to enforce business rules!
Thanks Mishaluba =)
December 4, 2013 at 9:11 am
adrian.facio (11/27/2013)
Hi, thank you. Doing some math i figured out that 3 million rows of 1 decade of time span with minutes granularity will need 15 768 000 million rows consuming about 314000 GB of space, and 2 million rows of 5 hours interval needs 286 GB .It seems that this approach is not suitable for millions of rows or long time spans with short granularity. In my field of work we might use about 400K rows of 1 day span with minutes precision, we would need 11 GB, which is little for some, but much for others.
I will dome some performance tests on the last and get back with the information.
Thanks for you comments.
You're welcome, and thank you for both a novel solution, and for doing the math - the major trick I see with the indexed view is the scalability/performance/disk space factor.
I did dig up some old, SQL 2000 vintage function based table constraint code I looked at some years ago - there may be flaws in it, but it'd be interesting to do a performance comparison.
USE tempdb;
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[dbo].[tb_FnChkOverlapTESTING]') AND OBJECTPROPERTY(object_id, N'IsUserTable') = 1)
DROP TABLE [dbo].[tb_FnChkOverlapTESTING]
CREATE TABLE tempdb.dbo.tb_FnChkOverlapTESTING
(
ID1 int -- pretend it's value we're looking for date overlaps on
,ID2 VARCHAR(15) -- pretend it's the rest of the unique index
,startDT DATETIME
,endDT DATETIME
,value VARCHAR(15)
,CONSTRAINT PK_tb_FnChkOverlapTESTING PRIMARY KEY CLUSTERED
(
ID1
,ID2
,startDT -- this column is necessary, but completely insufficient to prevent overlaps in and of itself.
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 92)
ON [PRIMARY]
)
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[dbo].[fn_CK_TMP_ChkOverlapTESTING]') AND type IN (N'FN', N'IF', N'TF'))
DROP FUNCTION [dbo].[fn_CK_TMP_ChkOverlapTESTING]
GO
CREATE FUNCTION dbo.fn_CK_TMP_ChkOverlapTESTING
(
@ID1 int -- value to prevent date overlaps on
,@ID2 VARCHAR(15) -- remainder of the key, used to prevent a row from "overlapping" itself
,@StartDateTime DATETIME
,@EndDateTime DATETIME
)
RETURNS TINYINT
AS
-- Read http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=97546
-- Purpose: To be used in a check constraint in order to prevent overlapping ranges
-- WORKS on SQL 2000+
-- TODO: for SQL 2005+, there was a more efficient way to do this.
BEGIN
RETURN
CASE
WHEN
(SELECT COUNT(*)
FROM tempdb.dbo.tb_FnChkOverlapTESTING
WHERE ID1 = @ID1
AND ID2 = @ID2
AND startDT <= @EndDateTime
AND endDT >= @StartDateTime
) > 1 -- to allow for the row already entered to be present. The constraint fires after the row is added!
-- We could use a simple WHEN EXISTS if we had a portion that was required to be different, i.e. ID2 <> @ID2
THEN 1
ELSE 0
END
END
GO
-- Add the function as a constraint
ALTER TABLE tempdb.dbo.tb_FnChkOverlapTESTING
WITH CHECK ADD CONSTRAINT CK_TMP_ChkOverlapTESTING
CHECK (dbo.fn_CK_TMP_ChkOverlapTESTING(ID1,ID2,startDT, endDT) = 0)
GO
SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING
--TRUNCATE TABLE tempdb.dbo.tb_FnChkOverlapTESTING
-- Successful inserts
INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING
SELECT 1, 'a', '20110101','20111231','Great'
UNION ALL
SELECT 1, 'b', '20110101','20111231','Good'
UNION ALL
SELECT 1, 'c', '20110101','20111231','Great'
UNION ALL
SELECT 2, 'a', '19600101','20651231','Terrible'
UNION ALL
SELECT 2, 'b', '20110101','20110101','Mistake'
UNION ALL
SELECT 2, 'b', '20110102','20110630','FirstHalf'
UNION ALL
SELECT 2, 'b', '20111001','20111231','LastQuarter'
UNION ALL
SELECT 3, 'a', '20110101','20111231','Great'
SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING
-- Overlap prevented
INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING
SELECT 1, 'a', '20101231','20110101','GreatGreat'
-- Overlap prevented on the entertaining startDT portion of the PK
INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING
SELECT 1, 'a', '20110101','20110102','GreatGreat'
-- Overlap prevented
INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING
SELECT 1, 'a', '20110102','20110102','GreatGreat'
-- Success
INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING
SELECT 2, 'b', '20110701','20110731','3Qtr'
-- Success
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET endDT = '20110831'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110701'
-- Success
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET startDT = '20110901'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20111001'
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET startDT = '20110831'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110901'
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET startDT = '20110830'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110901'
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET startDT = '20101231'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110901'
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET startDT = '20070101'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110901'
SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET endDT = '20110901'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110701'
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET endDT = '20110902'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110701'
-- Overlap prevented
UPDATE tempdb.dbo.tb_FnChkOverlapTESTING
SET endDT = '20781231'
WHERE ID1 = 2
AND ID2 = 'b'
AND startDT = '20110701'
SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING;
DROP TABLE tempdb.dbo.tb_FnChkOverlapTESTING;
DROP FUNCTION [dbo].[fn_CK_TMP_ChkOverlapTESTING];
February 17, 2014 at 12:25 pm
Hi Nadrek,
I'm really sorry for responding so late. I looked your function andI liked it, I'm also interested in the performance comparison, I have the feeling mine will be slower, but let's give it a try. What metrics should be look, just execution time of inserting a number of records?
February 18, 2014 at 8:55 am
adrian.facio (2/17/2014)
Hi Nadrek,I'm really sorry for responding so late. I looked your function andI liked it, I'm also interested in the performance comparison, I have the feeling mine will be slower, but let's give it a try. What metrics should be look, just execution time of inserting a number of records?
No problem - I've been pretty swamped lately, too.
For benchmarking SQL, I generally start with Profiler on the SQL:BatchCompleted event, and look at Reads, Writes, CPU, and Duration for sets of at least three runs at a time (to let the buffers settle out and the procedure cache cache the query plan).
May 16, 2014 at 4:42 pm
Now that i tried to make the performance test, i have found out that i don't have profiler. My servers are 2005 and my laptops is express 2008.
Viewing 13 posts - 16 through 27 (of 27 total)
You must be logged in to reply to this topic. Login to reply