September 13, 2013 at 9:00 am
Hi, I have a group of date ranges and wanted to identify all of the date gaps within the ranges, outputting the dates as another date range dataset.
Example dataset SQL below:
CREATE TABLE #test (daterow int identity, obj_id int, datestart DATETIME, dateend DATETIME)
INSERT INTO #test
SELECT 1, '20130428', '20130523'
UNION
SELECT 1, '20130526', '20130823'
UNION
SELECT 1, '20130728', '20130728'
UNION
SELECT 1, '20130826', '20130830'
UNION
SELECT 2, '20130501', '20130515'
UNION
SELECT 2, '20130525', '20130830'
select * from #test
I would expect a dataset to be returned consisting of:
1, 24/05/2013, 25/05/2013
1, 24/08/2013, 25/08/2013
2, 16/05/2013, 24/05/2013
Does anyone have a good way of doing this? I have found a lot of examples of problems where I have just a single date column, and then I find the gaps in between that, but I'm having difficulty finding examples where it works with start and end date columns...
Thanks!
September 13, 2013 at 11:04 am
I'm not sure where your expected data for "2" is coming from ... but you should be able to expand this if necessary to groups
This is quick and dirty but it shows the logic. A recursive CTE was not used to create the date library due to the possibility of large date spans
/* Create a Table of all Possible Dates fro mthe First Start to the last end */
DECLARE @dateToAdd DATETIME
DECLARE @dateLast DATETIME
DECLARE @tDates TABLE (dateDate DATETIME)
SELECT @dateToAdd = MIN(datestart), @dateLast = MAX(dateEnd) FROM #test
WHILE @dateToAdd <= @dateLast
BEGIN
INSERT INTO @tDates SELECT @dateToAdd
SET @dateToadd = @dateToadd + 1
END
/* Find All Dates Not in a range */
WITH CTEMissingDates AS
(SELECT
dateDate
FROM
@tDates t1
WHERE
NOT EXISTS (SELECT daterow FROM #test WHERE t1.dateDate >= datestart AND t1.dateDate <= dateend)
),
/* The start date of the new range will be any without a date before it */
CTEMissingRange1 AS
(
SELECT
d1.dateDate
FROM
CTEMissingDates d1
LEFT JOIN CTEMissingDates d2 ON DATEADD(dd,-1,d1.dateDate) = d2.dateDate
WHERE
d2.dateDate IS NULL)
/* considering all of the mussing dates after the range start, the end of the range is the day before the first next found date */
SELECT
r1.dateDate dtStart,
DATEADD(d,-1,MIN(d.dateDate)) dtEnd
FROM
CTEMissingRange1 r1
INNER JOIN @tDates d ON r1.dateDate < d.dateDate
WHERE
d.dateDate NOT IN (SELECT dateDate FROM CTEMissingDates)
GROUP BY
r1.dateDate
Hope this helps,
Russ Baker
September 13, 2013 at 11:55 am
Sorry, the 2 was to represent that I could potentially have multiple items, and each one would have their own gaps represented, but the updated dataset never left my management studio.
I'll take a look at the example SQL, thanks.
September 13, 2013 at 12:21 pm
I suggest that you read this article by Jeff Moden
http://www.sqlservercentral.com/articles/T-SQL/71550/
For an excellent article including sample data and the code that will do what you seem to asking how to do.
September 13, 2013 at 3:21 pm
Hi,
I read that article before, but from my understanding, that's more like the inverse of what I was doing, as I want to find the gaps in between the date ranges, while the article was referring to highlighting the "islands" of dates. Is there an easy way to convert one problem into the other?
September 13, 2013 at 3:35 pm
Start with my code for finding "CTEMissingDates" ... but add a group ID for your groups (1,2, etc)
At that point you can group them into ranges using the technique from that article.
September 14, 2013 at 1:23 pm
The approach I would use is first to pack overlapping intervals and then find the gaps.
You can read about both methods in the last book from Itzik Ben-Gan about "High Performance T-SQL using Window Functions".
http://shop.oreilly.com/product/0790145323088.do
SET NOCOUNT ON;
USE tempdb;
GO
CREATE TABLE #test (
daterow int IDENTITY,
obj_id int,
datestart datetime,
dateend datetime
);
INSERT INTO #test
SELECT
1,
'20130428',
'20130523'
UNION
SELECT
1,
'20130526',
'20130823'
UNION
SELECT
1,
'20130728',
'20130728'
UNION
SELECT
1,
'20130826',
'20130830'
UNION
SELECT
2,
'20130501',
'20130515'
UNION
SELECT
2,
'20130525',
'20130830';
-- Itzik's solution for packing intervals previous to SS2012
WITH C1 AS (
SELECT
daterow,
obj_id,
datestart AS dt,
1 AS [type],
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY datestart, daterow) AS s
FROM
#test
UNION ALL
SELECT
daterow,
obj_id,
dateend AS dt,
-1 AS [type],
ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dateend, daterow) AS e,
NULL AS s
FROM
#test
)
, C2 AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dt, [type] DESC, daterow) AS se
FROM
C1
)
, C3 AS (
SELECT
*,
((ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dt) - 1) / 2) + 1 AS grpnum
FROM
C2
WHERE
COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT
obj_id,
MIN(dt) AS datestart,
MAX(dt) AS dateend
INTO
#T
FROM
C3
GROUP BY
obj_id,
grpnum;
-- Finding gaps
SELECT
A.obj_id,
DATEADD([day], 1, A.dateend) AS dtstart,
DATEADD([day], -1, B.datestart) AS dtend
FROM
#T AS A
CROSS APPLY
(
SELECT TOP (1)
T.datestart
FROM
#T AS T
WHERE
T.obj_id = A.obj_id AND T.datestart > A.dateend
ORDER BY
T.datestart
) AS B
WHERE
DATEDIFF([day], A.dateend, B.datestart) > 1
ORDER BY
A.obj_id,
dtstart;
GO
DROP TABLE #test, #T;
GO
Result:
/*
obj_iddtstartdtend
12013-05-24 00:00:00.0002013-05-25 00:00:00.000
12013-08-24 00:00:00.0002013-08-25 00:00:00.000
22013-05-16 00:00:00.0002013-05-24 00:00:00.000
*/
September 14, 2013 at 1:24 pm
Sorry for the formatting but I have no idea how to post T-SQL code.
September 15, 2013 at 6:48 pm
Like hunchback, I'll start with Mr. Ben-Gan's interval packing approach but I believe the last part diverges from what he proposes:
WITH C1 AS (
SELECT obj_id, ts, Type
,e=CASE Type WHEN 1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY obj_id, Type ORDER BY dateend) END
,s=CASE Type WHEN -1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY obj_id, Type ORDER BY datestart) END
FROM #test
CROSS APPLY (
VALUES (1, datestart), (-1, dateend)) a(Type, ts)
),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT obj_id, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0),
-- C1, C2, C3, C4 combined remove the overlapping date periods
C4 AS (
SELECT obj_id, datestart=MIN(ts), dateend=MAX(ts)
FROM C3
GROUP BY obj_id, grpnm)
SELECT obj_id, datestart=MIN(newdate), dateend=MAX(newdate)
FROM (
SELECT obj_id, newdate
,rn=ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY newdate) / 2
FROM C4 a
CROSS APPLY (
VALUES (datestart-1),(dateend+1)) b(newdate)
) a
GROUP BY obj_id, rn
HAVING COUNT(*) = 2
ORDER BY obj_id, datestart;
1. The first process (combination of CTEs C1-C4) is a way of removing the overlapping date intervals, which I first saw here: Interval Packing by Itzik Ben-Gan. You end up with "islands" of date ranges where the overlaps have been removed.
2. The final step is what I call the CROSS APPLY VALUES approach to Gaps from Islands, and that is described here: The SQL of Gaps and Islands in Sequences[/url]. While this article applies the approach to sequence numbers, it is equally applicable to DATETIME data. This differs a bit from what hunchback proposes (eliminates the temp table) but also works based on one pass through the C4 table.
Should be pretty fast performing too.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 15, 2013 at 6:54 pm
bitbucket-25253 (9/13/2013)
I suggest that you read this article by Jeff Modenhttp://www.sqlservercentral.com/articles/T-SQL/71550/
For an excellent article including sample data and the code that will do what you seem to asking how to do.
Ron - That was my initial thought also, however Jeff's technique (also described in SQL MVP Deep Dives) doesn't work when there are date duplicates and is predicated on there being a single date column. The overlapping aspect of the intervals nullifies this approach, so it could not be used to generate the islands from which I calculated the gaps.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 15, 2013 at 7:03 pm
hunchback (9/14/2013)
Sorry for the formatting but I have no idea how to post T-SQL code.
When you reply to the post, look on the left at the IFCode Shortcuts. Use the one that says code="sql"
My edits above were to acknowledge the fact the we both used Mr. Ben-Gan's approach to packing intervals, which I didn't notice initially in the code you posted.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 16, 2013 at 5:33 am
Thanks, Dwain!
--
AMB
October 30, 2015 at 8:50 am
To: dwain.c
For this data:
INSERT INTO #test
SELECT
1,
'20130428',
'20130523'
UNION
SELECT
1,
'20130526',
'20130823'
UNION
SELECT
1,
'20130728',
'20130728'
UNION
SELECT
1,
'20130826',
'20130830'
UNION
SELECT
2,
'20130501',
'20130515'
UNION
SELECT
2,
'20130516',
'20130520'
UNION
SELECT
2,
'20130525',
'20130830';
i got wrong result:
12013-05-24 00:00:00.0002013-05-25 00:00:00.000
12013-08-24 00:00:00.0002013-08-25 00:00:00.000
22013-05-15 00:00:00.0002013-05-16 00:00:00.000
22013-05-21 00:00:00.0002013-05-24 00:00:00.000
there should not be gap between 2013-05-15 and 2013-05-16
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply