June 15, 2011 at 9:39 am
Hi,
I have been facing an annoying issue and not able to overcome it. Need some inputs from experts. Please help.
I have defined a table as below:
CREATE TABLE TEST( SRC varchar(6)not null,
OrgNo varchar(5)not null,
Sdate date Not nUll,
Edate date Not Null)
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1999','12/31/2010')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2011','6/30/2012')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','8/1/2012','12/31/2015')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2016','6/1/2017')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','6/2/2017','12/31/2018')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2019','12/31/2020')
When I populate the table, it looks like:
SRCORGNO Sdate Edate
abc999991/1/199912/31/2010
abc999991/1/20116/30/2012
abc999998/1/201212/31/2015
abc999991/1/20166/1/2017
abc999996/2/201712/31/2018
abc999991/1/201912/31/2020
From above data i want the continuous period in which that organisation was working.
Like the enddate of 1st row is the previous day of start date of 2nd row. But there is difference of more than one day between the enddate of 2nd row and start date of 3rd row . So first period should close at enddate of 2nd row. Next period starts from 3rd row and it should end at 6th row as the dates are continous.
I am running a query on this table on as:
DECLARE @holdTable TABLE
(Rowid int IDENTITY(1,1) Not null ,
SRC varchar(6)not null,
OrgNo varchar(5)not null,
Sdate date Not nUll,
Edate date Not Null)
---insert data into temp table
INSERT INTO @holdTable (SRC ,OrgNo ,Sdate ,Edate )
SELECT SRC ,OrgName,Sdate,Edate from Test WHERE OrgNo = '9999' AND SRC = 'abc'
---- select dates for continuous time period
Select s.Sdate as a,t.Edate as b
FROM @holdTable s
LEFT JOIN @holdTable t ON t.Rowid = s.Rowid +1
AND s.Orgno = t.Orgno
AND s.SRC = t.SRC
WHERE DATEDIFF(D,s.Edate,t.Sdate) = 1
And I am expecting the output of this as :
(Correct output)
SRCORGNO Sdate Edate
abc999991/1/19996/30/2012
abc999998/1/201212/31/2020
But I am getting it as:
(Wrong Output)
SRCORGNO Sdate Edate
abc99999 1/1/19996/30/2012
abc99999 8/1/20126/1/2017
abc99999 6/2/201712/31/2020
Is there anything wrong with the query?
June 15, 2011 at 10:11 am
This should work for you, not particularly efficient though
WITH Sdates AS (
SELECT a.Sdate,a.SRC,a.OrgNo
FROM TEST a
WHERE NOT EXISTS(SELECT * FROM TEST b
WHERE a.SRC=b.SRC
AND a.OrgNo=b.OrgNo
AND a.Sdate IN (b.Edate,DATEADD(day,1,b.Edate)))),
Edates AS (
SELECT a.Edate,a.SRC,a.OrgNo
FROM TEST a
WHERE NOT EXISTS(SELECT * FROM TEST b
WHERE a.SRC=b.SRC
AND a.OrgNo=b.OrgNo
AND a.Edate IN (b.Sdate,DATEADD(day,-1,b.Sdate))))
SELECT s.SRC,s.OrgNo,s.Sdate,
MIN(e.Edate) AS Edate
FROM Sdates s
INNER JOIN Edates e ON e.SRC=s.SRC
AND e.OrgNo=s.OrgNo
AND e.Edate>=s.Sdate
GROUP BY s.Sdate,s.SRC,s.OrgNo;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 15, 2011 at 12:23 pm
You might want to have a look at the following blog for a better performing alternative:
http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/08/27/sql-and-contiguous-data-ranges.aspx
June 15, 2011 at 1:26 pm
How about this?
; with diffenrentiator as
(
SELECT s.Rowid sRowID, s.SRC sSRC , s.OrgNo sOrgNo , s.Sdate sSate
,s.Edate sEdate , t.*,
case when DATEDIFF(dd, s.edate , t.Sdate) <> 1
then 1
else 0
end indicator
FROM @holdTable s
LEFT JOIN @holdTable t
ON t.Rowid = s.Rowid +1
AND s.Orgno = t.Orgno
AND s.SRC = t.SRC
),
sequenced As
(
select sRowid , sSRC, sORgNo , sSate,
coalesce( Edate ,sEdate ) Edate,
rndiff = srowid - ROW_NUMBER() over(partition by sSRC , SoRGNO , indicator order by sRowid)
from diffenrentiator
where indicator <> 1
)
select sSRC, sORgNo , MIN( sSate) StartDate , MAX(Edate) EndDate
from sequenced
GROUP BY
sSRC, sORgNo , rndiff
June 16, 2011 at 10:11 pm
Hi ,
Thanks for the query.. Just a minor change and it worked.:-)
June 18, 2011 at 10:50 am
Mark-101232 (6/15/2011)
This should work for you, not particularly efficient though
WITH Sdates AS (
SELECT a.Sdate,a.SRC,a.OrgNo
FROM TEST a
WHERE NOT EXISTS(SELECT * FROM TEST b
WHERE a.SRC=b.SRC
AND a.OrgNo=b.OrgNo
AND a.Sdate IN (b.Edate,DATEADD(day,1,b.Edate)))),
Edates AS (
SELECT a.Edate,a.SRC,a.OrgNo
FROM TEST a
WHERE NOT EXISTS(SELECT * FROM TEST b
WHERE a.SRC=b.SRC
AND a.OrgNo=b.OrgNo
AND a.Edate IN (b.Sdate,DATEADD(day,-1,b.Sdate))))
SELECT s.SRC,s.OrgNo,s.Sdate,
MIN(e.Edate) AS Edate
FROM Sdates s
INNER JOIN Edates e ON e.SRC=s.SRC
AND e.OrgNo=s.OrgNo
AND e.Edate>=s.Sdate
GROUP BY s.Sdate,s.SRC,s.OrgNo;
I realize there was a follow up but, just to be sure... This particular one doesn't work quite right. Add the following data to the test table and see...
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1900', '1/1/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/2/1900', '1/2/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/3/1900', '1/3/1900')
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2011 at 11:32 am
@Cold Coffee.
There is a ticking time bomb in it in the form of "Hidden RBAR" because of an "accidental Cross Join". Using the following test data and looking at the actual execution plan, there's an arrow coming from one of the two scans on @HoldTable with an actual row-count of 100.
{Edit} I was also able to break the code later in this post.
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1999','12/31/2010')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2011','6/30/2012')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','8/1/2012', '12/31/2015')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2016', '6/1/2017')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','6/2/2017', '12/31/2018')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2019', '12/31/2020')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1900', '1/1/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/2/1900', '1/2/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/3/1900', '1/3/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/5/1900', '1/5/1900')
Ten rows of test data... 100 rows of internally used data. Adding just one more row (11 total) causes the internally used data count on the actual exectution plan to jump to the expected 11*11 or 121 rows.
If you run this code on just 10,000 rows, the internally used row-count of 10,000*10,000 (100,000,000) will really begin to impact performance.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2011 at 2:31 pm
Unfortunately, ColdCoffee's code breaks.
CREATE TABLE TEST( SRC varchar(6)not null,
OrgNo varchar(5)not null,
Sdate datetime Not nUll,
Edate datetime Not Null)
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1999','12/31/1999')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2011','6/30/2012')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2012', '12/31/2012')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2016', '6/1/2017')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','6/2/2017', '12/31/2018')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2019', '12/31/2020')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1900', '1/1/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/2/1900', '1/2/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/3/1900', '1/3/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/5/1900', '1/5/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1901', '12/31/1901')
DECLARE @holdTable TABLE
(Rowid int IDENTITY(1,1) Not null ,
SRC varchar(6)not null,
OrgNo varchar(5)not null,
Sdate datetime Not nUll,
Edate datetime Not Null)
---insert data into temp table
INSERT INTO @holdTable (SRC ,OrgNo ,Sdate ,Edate )
SELECT SRC ,OrgNo,Sdate,Edate from Test WHERE OrgNo = '99999' AND SRC = 'abc'
;
; with diffenrentiator as
(
SELECT s.Rowid sRowID, s.SRC sSRC , s.OrgNo sOrgNo , s.Sdate sSate
,s.Edate sEdate , t.*,
case when DATEDIFF(dd, s.edate , t.Sdate) <> 1
then 1
else 0
end indicator
FROM @holdTable s
LEFT JOIN @holdTable t
ON t.Rowid = s.Rowid +1
AND s.Orgno = t.Orgno
AND s.SRC = t.SRC
)
--select * from diffenrentiator
,
sequenced As
(
select sRowid , sSRC, sORgNo , sSate,
coalesce( Edate ,sEdate ) Edate,
rndiff = srowid - ROW_NUMBER() over(partition by sSRC , sORgNo , indicator order by sRowid)
from diffenrentiator
where indicator <> 1
)
--select * from sequenced
select sSRC, sORgNo , MIN( sSate) StartDate , MAX(Edate) EndDate
from sequenced
GROUP BY
sSRC, sORgNo , rndiff
Notice that the above code only returns 3 ranges. Here's what it returned...
sSRCsORgNoStartDateEndDate
abc999992016-01-01 00:00:00.0002020-12-31 00:00:00.000
abc999991900-01-01 00:00:00.0001900-01-03 00:00:00.000
abc999991901-01-01 00:00:00.0001901-12-31 00:00:00.000
Here's what it should have returned...
SrcOrgNoSDateEDate
abc999991900-01-01 00:00:00.0001900-01-03 00:00:00.000
abc999991900-01-05 00:00:00.0001900-01-05 00:00:00.000
abc999991901-01-01 00:00:00.0001901-12-31 00:00:00.000
abc999991999-01-01 00:00:00.0001999-12-31 00:00:00.000
abc999992011-01-01 00:00:00.0002012-12-31 00:00:00.000
abc999992016-01-01 00:00:00.0002020-12-31 00:00:00.000
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2011 at 2:57 pm
Jeff Moden (6/18/2011)
Actually, I was able to break the Mark's second rendition...
CREATE TABLE TEST( SRC varchar(6)not null,
OrgNo varchar(5)not null,
Sdate datetime Not nUll,
Edate datetime Not Null)
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1999','12/31/1999')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2011','6/30/2012')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2012', '12/31/2012')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2016', '6/1/2017')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','6/2/2017', '12/31/2018')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/2019', '12/31/2020')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1900', '1/1/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/2/1900', '1/2/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/3/1900', '1/3/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/5/1900', '1/5/1900')
INSERT INTO TEST (SRC ,OrgNo ,Sdate ,Edate ) Values ('abc','99999','1/1/1901', '12/31/1901')
DECLARE @holdTable TABLE
(Rowid int IDENTITY(1,1) Not null ,
SRC varchar(6)not null,
OrgNo varchar(5)not null,
Sdate datetime Not nUll,
Edate datetime Not Null)
---insert data into temp table
INSERT INTO @holdTable (SRC ,OrgNo ,Sdate ,Edate )
SELECT SRC ,OrgNo,Sdate,Edate from Test WHERE OrgNo = '99999' AND SRC = 'abc'
;
; with diffenrentiator as
(
SELECT s.Rowid sRowID, s.SRC sSRC , s.OrgNo sOrgNo , s.Sdate sSate
,s.Edate sEdate , t.*,
case when DATEDIFF(dd, s.edate , t.Sdate) <> 1
then 1
else 0
end indicator
FROM @holdTable s
LEFT JOIN @holdTable t
ON t.Rowid = s.Rowid +1
AND s.Orgno = t.Orgno
AND s.SRC = t.SRC
)
--select * from diffenrentiator
,
sequenced As
(
select sRowid , sSRC, sORgNo , sSate,
coalesce( Edate ,sEdate ) Edate,
rndiff = srowid - ROW_NUMBER() over(partition by sSRC , sORgNo , indicator order by sRowid)
from diffenrentiator
where indicator <> 1
)
--select * from sequenced
select sSRC, sORgNo , MIN( sSate) StartDate , MAX(Edate) EndDate
from sequenced
GROUP BY
sSRC, sORgNo , rndiff
Notice that the above code only returns 3 ranges. Here's what it returned...
sSRCsORgNoStartDateEndDate
abc999992016-01-01 00:00:00.0002020-12-31 00:00:00.000
abc999991900-01-01 00:00:00.0001900-01-03 00:00:00.000
abc999991901-01-01 00:00:00.0001901-12-31 00:00:00.000
Here's what it should have returned...
SrcOrgNoSDateEDate
abc999991900-01-01 00:00:00.0001900-01-03 00:00:00.000
abc999991900-01-05 00:00:00.0001900-01-05 00:00:00.000
abc999991901-01-01 00:00:00.0001901-12-31 00:00:00.000
abc999991999-01-01 00:00:00.0001999-12-31 00:00:00.000
abc999992011-01-01 00:00:00.0002012-12-31 00:00:00.000
abc999992016-01-01 00:00:00.0002020-12-31 00:00:00.000
Mark's second rendition???? Nope not mine, ColdCoffee's offering.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 18, 2011 at 3:47 pm
LutzM (6/15/2011)
You might want to have a look at the following blog for a better performing alternative:http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/08/27/sql-and-contiguous-data-ranges.aspx
Hi Lutz,
I strongly recommend that you delete that method from your briefcase. It doesn't correctly calculate overlapping date ranges like the ones in this particular thread (I've cleaned it up a bit getting ready for an article on this subject :-D). Here's the proof...
First, build the Calendar table the author of that blog is so proud of...
--===== Do this testing in a nice, safe place that everyone has
USE tempdb;
GO
--===== Create a super simplified version of a calendar table
IF OBJECT_ID('tempdb.dbo.Calendar','U') IS NOT NULL
DROP TABLE dbo.Calendar;
GO
--===== Create the simplified Calendar table
CREATE TABLE dbo.Calendar
(
dt DATETIME NOT NULL PRIMARY KEY CLUSTERED
);
GO
--===== Populate the Calendar table with 2 centuries of dates
-- using a high-speed pseudo cursor
INSERT INTO dbo.Calendar
(dt)
SELECT TOP (DATEDIFF(DAY, '19000101', '21000101'))
DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,'1900')
FROM sys.all_columns sc1,
sys.all_columns sc2;
GO
Now, let's create the 11 rows of data I've been using on this thread...
--===== Create the test table
CREATE TABLE dbo.Test
(
Src VARCHAR(6)NOT NULL,
OrgNo VARCHAR(5)NOT NULL,
SDate DATETIME NOT NULL,
EDate DATETIME NOT NULL
);
GO
--===== Populate the test table with known data (as opposed to random data)
INSERT INTO dbo.Test
(Src, OrgNo, SDate, EDate)
SELECT 'abc','99999','01/01/1999','12/31/1999' UNION ALL
SELECT 'abc','99999','01/01/2011','06/30/2012' UNION ALL
SELECT 'abc','99999','01/01/2012','12/31/2012' UNION ALL
SELECT 'abc','99999','01/01/2016','06/01/2017' UNION ALL
SELECT 'abc','99999','06/02/2017','12/31/2018' UNION ALL
SELECT 'abc','99999','01/01/2019','12/31/2020' UNION ALL
SELECT 'abc','99999','01/01/1900','01/01/1900' UNION ALL
SELECT 'abc','99999','01/02/1900','01/02/1900' UNION ALL
SELECT 'abc','99999','01/03/1900','01/03/1900' UNION ALL
SELECT 'abc','99999','01/05/1900','01/05/1900' UNION ALL
SELECT 'abc','99999','01/01/1901','12/31/1901'
GO
And, now, finally, let's make a column name correction in the code from that article and run it...
with cteDateList(DateCol,Grouping)
as
(
Select Calendar.Dt,
Calendar.Dt + row_number() over (order by Calendar.Dt desc)
from dbo.Test,
Calendar
where Calendar.Dt between SDate and EDate
)
Select Min(DateCol),Max(DateCol)
from cteDateList
group by Grouping
order by 1
;
GO
I'm sure that it'll run MUCH faster on your machine than my ol' war-horse, but it took a whopping 2 seconds on just 11 rows to come up with the wrong answer not to mention violating a couple of best-practices along the way. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2011 at 3:55 pm
Mark-101232 (6/18/2011)
Mark's second rendition???? Nope not mine, ColdCoffee's offering.
Ack! Sorry. Thanks for the correction, Mark. I'll fix that post. :blush:
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2011 at 5:23 pm
Using the Calendar table I posted in one of the previous posts above, here's what the code at the link that Lutz posted should look like with a couple of very necessary corrections to make it work for this thread and just about any place else where overlapping date ranges need to be solved...
WITH
cteExplodeDateRanges AS
(
SELECT DISTINCT Src, OrgNo,
ExplodedDate = c.Dt
FROM dbo.Test test
CROSS JOIN dbo.Calendar c
WHERE c.Dt BETWEEN test.SDate AND test.EDate
)
,
cteGroupDates AS
(
SELECT Src, OrgNo, ExplodedDate,
DateGroup = ExplodedDate - ROW_NUMBER() OVER (PARTITION BY Src, OrgNo ORDER BY ExplodedDate)
FROM cteExplodeDateRanges
)
SELECT Src, OrgNo, SDate = MIN(ExplodedDate), EDate = MAX(ExplodedDate)
FROM cteGroupDates
GROUP BY Src, OrgNo, DateGroup
ORDER BY Src, OrgNo, DateGroup
;
Using the previous 11 rows already posted several times, it produces the correct answer.
If you want to have some fun with testing for performance, you can run the following code to setup a good number more rows...
WITH
cteGenRandomDates AS
(
SELECT TOP (10000)
SDate = DATEADD(dd,ABS(CHECKSUM(NEWID()))%DATEDIFF(dd,'2000','2050'),CAST('2000' AS DATETIME))
FROM sys.all_columns ac1,
sys.all_columns ac2
)
SELECT Src = 'abc',
OrgNo = '99999',
SDate,
EDate = DATEADD(dd,ABS(CHECKSUM(NEWID()))%15,SDate)
INTO dbo.Test
FROM cteGenRandomDates
;
Still, the corrected code above still takes a whopping 1.2 to 1.5 seconds to return the answer on a lousy 10,000 rows. I think I may know a better way and I'll work on it over the next day or two.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2011 at 8:02 pm
Actually, I couldn't wait. I tried using a "Modified Quirky Update" and it beat the tar out of the Calendar Table method for larger start/end date combinations.
I'm still tweekin' and testin' so I won't post anything yet. For smaller stuff, the Calendar Table method will probably do for now.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2011 at 1:52 am
Mr. Moden,
Is there any chance you might publish a book?
Please?
--SJTerrill--
June 19, 2011 at 10:19 am
SJTerrill (6/19/2011)
Mr. Moden,Is there any chance you might publish a book?
Please?
--SJTerrill--
I started one over a year ago. Got about 3 chapters into it and was overtaken by events. Since you're about the 4th person in the last couple of weeks to ask if I might publish a book, now seems like a good time to pick it back up. 🙂
Thank you for the confidence and the kudo. I'm humbled by your question.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply