Prevent overlapping of time events with an indexed view

Question

Prevent overlapping of time events with an indexed view

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

archon99 Newbie Points: 6 More actions · Answer 1

We had to solve the same problem in our business, and the first solution we adopted was as shown in the article, but the strain on the db and the sheer amount of data proved to be too much for us (too many rows in the table, like millions and the query took minutes to run)

Then we considered the fact that a row A does not overlap another row B if the former ends before the latter starts or vice versa (and one row does not overlap with itself by definition)

This leads to the idea of using a self join on the table on the Person ID field and then discarding the rows that do not overlap .

To do this we needed a primary key on the table (in fact we already had it)

So we had something like this:

CREATE TABLE #Schedule

(

ID int identity(1,1) --primary key

, PersonId INT NOT NULL

, Activity VARCHAR(150) NOT NULL

, StartDate DATETIME NOT NULL

, EndDate DATETIME NOT NULL

, CHECK ( EndDate > StartDate )

)

INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

VALUES ( 1, 'Piano Concert', '2013-11-20 18:00', '2013-11-20 18:20' )

INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

VALUES ( 1, 'Weeding Anniversary Dinner', '2013-12-09 19:00',

'2013-12-09 23:30' )

INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

VALUES ( 1, 'Work Meeting', '2013-12-09 18:00', '2013-12-09 19:15' )

-- Oooops! it overlaps with the weeding anniversary

INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

VALUES ( 2, 'Football Night', '2013-11-20 17:00', '2013-11-20 20:30' )

INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

VALUES ( 2, 'Reading Club', '2013-12-09 19:00', '2013-12-09 20:30' )

select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem

from #Schedule as A

inner join #Schedule as B

on A.PersonId=B.PersonId

where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)

and A.ID<>B.id

This gives us 2 rows: the wedding anniversary overlaps with

Weeding Anniversary Dinner OVERLAPS Work Meeting for Person: 1

Work Meeting OVERLAPS Weeding Anniversary Dinner for Person: 1

we can get one row easily as below

select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem

from #Schedule as A

inner join #Schedule as B

on A.PersonId=B.PersonId

where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)

and B.id>a.id--HERE I GET ONLY THE LAST EVENT ADDED THAT OVERLAPPED WITH THE PREVIOUS ONE

We found that this performs lots better for us.

I am interested if somebody has some considerations on the matter (I fear that my solution could have hidden pitfalls I didn't see) [Smile]

adrian.facio SSCrazy Points: 2406 More actions · Answer 2

Nadrek (11/26/2013)
This is an interesting take on temporal constraints, but have you tested it with large sets, i.e. perhaps millions of rows of data where the time ranges are measured in years or decades?

Hi, thank you. Doing some math i figured out that 3 million rows of 1 decade of time span with minutes granularity will need 15 768 000 million rows consuming about 314000 GB of space, and 2 million rows of 5 hours interval needs 286 GB .It seems that this approach is not suitable for millions of rows or long time spans with short granularity. In my field of work we might use about 400K rows of 1 day span with minutes precision, we would need 11 GB, which is little for some, but much for others.

I will dome some performance tests on the last and get back with the information.

Thanks for you comments.

adrian.facio SSCrazy Points: 2406 More actions · Answer 3

jjturner (11/26/2013)
This is a decent workaround for something SQL Server doesn't seem to adequately support (afaik): temporal awareness
Not to plug another rdbms, but the one starting with "p" has gone a long ways towards implementing duration constraints for overlaps and other types of non-/partial-/extra-equi comparisons (or "Exclusion Constraints", as they're referred to).
Not only that, the PERIOD data type has been generalized to include distance or most any other type of continuous range (cf. "range types").
Using a Start and End column for a timespan has never made much sense to me. It's one datum, not two disjunct attributes relatable only by their row index.
It would be nice to see SQL Server deal seriously with this type of data.
My 2 cantakerous cents :satisfied:
John

Hi there,

Thanks for commenting. What is that other rdbms? i could not guess it he he he, I agree that it will be very good if sql server provided that functionality too. About the Start and End column , looks like it would be better if used a Duration field instead of End, the Start will be still be needed , but the (Start,Duration) pair is more cohesive.

Thanks again.

adrian.facio SSCrazy Points: 2406 More actions · Answer 4

lubork (11/26/2013)
Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value and
UPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'
will work while
UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'
correctly fails).
The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...

Hi there

You are right, the view checks every change on data, also the index size might become problematic for many many rows. I haven't read Itzik's solutions but I'm planning too , I imagine they are great. My goal was to enforce the business rule with an indexed view, and this is the only way i could make it work, because self joins, correlated queries, windowing functions and other things are not allowed in a materialized view. I believe a trigger that checks only the affected rows of a insert , update or delete is a good option.

Something like this:

ALTER TABLE Schedule ADD ScheduleId INT NOT NULL IDENTITY(1,1)

GO

CREATE INDEX IX_ScheduleId ON dbo.Schedule(ScheduleId)

GO

CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]

AFTER INSERT,UPDATE,DELETE

NOT FOR REPLICATION

AS

BEGIN

IF EXISTS ( SELECT s.PersonId ,

MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,

COUNT(*)

FROM dbo.Schedule s

INNER JOIN dbo.Numbers n

ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )

WHERE s.ScheduleId IN (SELECT ScheduleId FROM INSERTED

UNION

SELECT ScheduleId FROM DELETED)

GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)

HAVING COUNT(*) > 1

)

BEGIN

RAISERROR ('Overlapping',14,1)

ROLLBACK TRANSACTION

END

The code is to express and idea, i have not tested it, but I'm pretty sure it works =D.

adrian.facio SSCrazy Points: 2406 More actions · Answer 5

lubork (11/26/2013)
Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value and
UPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'
will work while
UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'
correctly fails).
The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...

I forgot to say, Thanks for commenting =D.

adrian.facio SSCrazy Points: 2406 More actions · Answer 6

adrian.facio (11/27/2013)
lubork (11/26/2013)
Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value and
UPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'
will work while
UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'
correctly fails).
The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...
Hi there
You are right, the view checks every change on data, also the index size might become problematic for many many rows. I haven't read Itzik's solutions but I'm planning too , I imagine they are great. My goal was to enforce the business rule with an indexed view, and this is the only way i could make it work, because self joins, correlated queries, windowing functions and other things are not allowed in a materialized view. I believe a trigger that checks only the affected rows of a insert , update or delete is a good option.
Something like this:
ALTER TABLE Schedule ADD ScheduleId INT NOT NULL IDENTITY(1,1)
GO
CREATE INDEX IX_ScheduleId ON dbo.Schedule(ScheduleId)
GO

CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]
AFTER INSERT,UPDATE,DELETE
NOT FOR REPLICATION
AS
BEGIN
IF EXISTS ( SELECT s.PersonId ,
MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,
COUNT(*)
FROM dbo.Schedule s
INNER JOIN dbo.Numbers n
ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )
WHERE s.ScheduleId IN (SELECT ScheduleId FROM INSERTED
UNION
SELECT ScheduleId FROM DELETED)
GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)
HAVING COUNT(*) > 1
)
BEGIN
RAISERROR ('Overlapping',14,1)
ROLLBACK TRANSACTION
END
END
The code is to express and idea, i have not tested it, but I'm pretty sure it works =D.

I had something wrong, the trigger must check all the rows of the affected persons, not only of the affected rows. See the correction.

CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]

AFTER INSERT,UPDATE,DELETE

NOT FOR REPLICATION

AS

BEGIN

IF EXISTS ( SELECT s.PersonId ,

MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,

COUNT(*)

FROM dbo.Schedule s

INNER JOIN dbo.Numbers n

ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )

WHERE s.PersonId IN (SELECT PersonId FROM INSERTED

UNION

SELECT PersonId FROM DELETED)

GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)

HAVING COUNT(*) > 1

)

BEGIN

RAISERROR ('Overlapping',14,1)

ROLLBACK TRANSACTION

END

adrian.facio SSCrazy Points: 2406 More actions · Answer 7

archon99 (11/27/2013)
We had to solve the same problem in our business, and the first solution we adopted was as shown in the article, but the strain on the db and the sheer amount of data proved to be too much for us (too many rows in the table, like millions and the query took minutes to run)
Then we considered the fact that a row A does not overlap another row B if the former ends before the latter starts or vice versa (and one row does not overlap with itself by definition)
This leads to the idea of using a self join on the table on the Person ID field and then discarding the rows that do not overlap .
To do this we needed a primary key on the table (in fact we already had it)
So we had something like this:
CREATE TABLE #Schedule
(
ID int identity(1,1) --primary key
, PersonId INT NOT NULL
, Activity VARCHAR(150) NOT NULL
, StartDate DATETIME NOT NULL
, EndDate DATETIME NOT NULL
, CHECK ( EndDate > StartDate )
)
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Piano Concert', '2013-11-20 18:00', '2013-11-20 18:20' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Weeding Anniversary Dinner', '2013-12-09 19:00',
'2013-12-09 23:30' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 1, 'Work Meeting', '2013-12-09 18:00', '2013-12-09 19:15' )
-- Oooops! it overlaps with the weeding anniversary
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 2, 'Football Night', '2013-11-20 17:00', '2013-11-20 20:30' )
INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)
VALUES ( 2, 'Reading Club', '2013-12-09 19:00', '2013-12-09 20:30' )
select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem
from #Schedule as A
inner join #Schedule as B
on A.PersonId=B.PersonId
where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)
and A.ID<>B.id
This gives us 2 rows: the wedding anniversary overlaps with
Weeding Anniversary Dinner OVERLAPS Work Meeting for Person: 1
Work Meeting OVERLAPS Weeding Anniversary Dinner for Person: 1
we can get one row easily as below
select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem
from #Schedule as A
inner join #Schedule as B
on A.PersonId=B.PersonId
where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)
and B.id>a.id--HERE I GET ONLY THE LAST EVENT ADDED THAT OVERLAPPED WITH THE PREVIOUS ONE
We found that this performs lots better for us.
I am interested if somebody has some considerations on the matter (I fear that my solution could have hidden pitfalls I didn't see) [Smile]

Hi there,

Your solution is pretty good , I can't see any problem.

Thnks for commenting.

Misha_SQL SSCertifiable Points: 5407 More actions · Answer 8

Misha_SQL

SSCertifiable

Points: 5407

November 28, 2013 at 11:32 am

#1669715

Very clever solution to enforce business rules!

adrian.facio SSCrazy Points: 2406 More actions · Answer 9

mishaluba (11/28/2013)
Very clever solution to enforce business rules!

Thanks Mishaluba =)

Nadrek SSC-Insane Points: 20039 More actions · Answer 10

adrian.facio (11/27/2013)
Hi, thank you. Doing some math i figured out that 3 million rows of 1 decade of time span with minutes granularity will need 15 768 000 million rows consuming about 314000 GB of space, and 2 million rows of 5 hours interval needs 286 GB .It seems that this approach is not suitable for millions of rows or long time spans with short granularity. In my field of work we might use about 400K rows of 1 day span with minutes precision, we would need 11 GB, which is little for some, but much for others.
I will dome some performance tests on the last and get back with the information.
Thanks for you comments.

You're welcome, and thank you for both a novel solution, and for doing the math - the major trick I see with the indexed view is the scalability/performance/disk space factor.

I did dig up some old, SQL 2000 vintage function based table constraint code I looked at some years ago - there may be flaws in it, but it'd be interesting to do a performance comparison.

USE tempdb;

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[dbo].[tb_FnChkOverlapTESTING]') AND OBJECTPROPERTY(object_id, N'IsUserTable') = 1)

DROP TABLE [dbo].[tb_FnChkOverlapTESTING]

CREATE TABLE tempdb.dbo.tb_FnChkOverlapTESTING

(

ID1 int -- pretend it's value we're looking for date overlaps on

,ID2 VARCHAR(15) -- pretend it's the rest of the unique index

,startDT DATETIME

,endDT DATETIME

,value VARCHAR(15)

,CONSTRAINT PK_tb_FnChkOverlapTESTING PRIMARY KEY CLUSTERED

(

ID1

,ID2

,startDT -- this column is necessary, but completely insufficient to prevent overlaps in and of itself.

) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 92)

ON [PRIMARY]

)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[dbo].[fn_CK_TMP_ChkOverlapTESTING]') AND type IN (N'FN', N'IF', N'TF'))

DROP FUNCTION [dbo].[fn_CK_TMP_ChkOverlapTESTING]

GO

CREATE FUNCTION dbo.fn_CK_TMP_ChkOverlapTESTING

(

@ID1 int -- value to prevent date overlaps on

,@ID2 VARCHAR(15) -- remainder of the key, used to prevent a row from "overlapping" itself

,@StartDateTime DATETIME

,@EndDateTime DATETIME

)

RETURNS TINYINT

AS

-- Read http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=97546

-- Purpose: To be used in a check constraint in order to prevent overlapping ranges

-- WORKS on SQL 2000+

-- TODO: for SQL 2005+, there was a more efficient way to do this.

BEGIN

RETURN

CASE

WHEN

(SELECT COUNT(*)

FROM tempdb.dbo.tb_FnChkOverlapTESTING

WHERE ID1 = @ID1

AND ID2 = @ID2

AND startDT <= @EndDateTime

AND endDT >= @StartDateTime

) > 1 -- to allow for the row already entered to be present. The constraint fires after the row is added!

-- We could use a simple WHEN EXISTS if we had a portion that was required to be different, i.e. ID2 <> @ID2

THEN 1

ELSE 0

END

GO

-- Add the function as a constraint

ALTER TABLE tempdb.dbo.tb_FnChkOverlapTESTING

WITH CHECK ADD CONSTRAINT CK_TMP_ChkOverlapTESTING

CHECK (dbo.fn_CK_TMP_ChkOverlapTESTING(ID1,ID2,startDT, endDT) = 0)

GO

SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING

--TRUNCATE TABLE tempdb.dbo.tb_FnChkOverlapTESTING

-- Successful inserts

INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

SELECT 1, 'a', '20110101','20111231','Great'

UNION ALL

SELECT 1, 'b', '20110101','20111231','Good'

UNION ALL

SELECT 1, 'c', '20110101','20111231','Great'

UNION ALL

SELECT 2, 'a', '19600101','20651231','Terrible'

UNION ALL

SELECT 2, 'b', '20110101','20110101','Mistake'

UNION ALL

SELECT 2, 'b', '20110102','20110630','FirstHalf'

UNION ALL

SELECT 2, 'b', '20111001','20111231','LastQuarter'

UNION ALL

SELECT 3, 'a', '20110101','20111231','Great'

SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING

-- Overlap prevented

INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

SELECT 1, 'a', '20101231','20110101','GreatGreat'

-- Overlap prevented on the entertaining startDT portion of the PK

INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

SELECT 1, 'a', '20110101','20110102','GreatGreat'

-- Overlap prevented

INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

SELECT 1, 'a', '20110102','20110102','GreatGreat'

-- Success

INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

SELECT 2, 'b', '20110701','20110731','3Qtr'

-- Success

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET endDT = '20110831'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110701'

-- Success

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET startDT = '20110901'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20111001'

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET startDT = '20110831'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110901'

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET startDT = '20110830'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110901'

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET startDT = '20101231'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110901'

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET startDT = '20070101'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110901'

SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET endDT = '20110901'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110701'

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET endDT = '20110902'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110701'

-- Overlap prevented

UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

SET endDT = '20781231'

WHERE ID1 = 2

AND ID2 = 'b'

AND startDT = '20110701'

SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING;

DROP TABLE tempdb.dbo.tb_FnChkOverlapTESTING;

DROP FUNCTION [dbo].[fn_CK_TMP_ChkOverlapTESTING];

adrian.facio SSCrazy Points: 2406 More actions · Answer 11

Hi Nadrek,

I'm really sorry for responding so late. I looked your function andI liked it, I'm also interested in the performance comparison, I have the feeling mine will be slower, but let's give it a try. What metrics should be look, just execution time of inserting a number of records?

Nadrek SSC-Insane Points: 20039 More actions · Answer 12

adrian.facio (2/17/2014)
Hi Nadrek,
I'm really sorry for responding so late. I looked your function andI liked it, I'm also interested in the performance comparison, I have the feeling mine will be slower, but let's give it a try. What metrics should be look, just execution time of inserting a number of records?

No problem - I've been pretty swamped lately, too.

For benchmarking SQL, I generally start with Profiler on the SQL:BatchCompleted event, and look at Reads, Writes, CPU, and Duration for sets of at least three runs at a time (to let the buffers settle out and the procedure cache cache the query plan).

adrian.facio SSCrazy Points: 2406 More actions · Answer 13

Now that i tried to make the performance test, i have found out that i don't have profiler. My servers are 2005 and my laptops is express 2008.