I want to select non-overlapping date ranges, which share a common value. I have prepared the following to illustrate my problem
DROP TABLE IF EXISTS #SampleRanges;
CREATE TABLE #SampleRanges (
RowNum INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
SampleDate DATETIME NOT NULL,
SampleValue BIGINT NOT NULL);
INSERT INTO #SampleRanges (SampleDate, SampleValue)
SELECT '2018-08-30 05:07:17', 31267 UNION ALL
SELECT '2018-08-30 05:07:18', 31267 UNION ALL
SELECT '2018-08-30 05:12:35', 31267 UNION ALL
SELECT '2018-08-30 07:10:10', 134159490 UNION ALL
SELECT '2018-08-30 07:13:54', 134159490 UNION ALL
SELECT '2018-08-30 07:17:19', 134159490 UNION ALL
SELECT '2018-08-30 07:22:16', 134159490 UNION ALL
SELECT '2018-08-30 07:25:42', 134159490 UNION ALL
SELECT '2018-08-30 07:28:03', 64498 UNION ALL
SELECT '2018-08-30 07:28:07', 64498 UNION ALL
SELECT '2018-08-30 08:16:14', 64498 UNION ALL
SELECT '2018-08-30 08:16:37', 134159490 UNION ALL
SELECT '2018-08-30 08:20:27', 134159490 UNION ALL
SELECT '2018-08-30 08:27:53', 460350737 UNION ALL
SELECT '2018-08-30 08:28:04', 15375002 UNION ALL
SELECT '2018-08-30 08:28:05', 15375002 UNION ALL
SELECT '2018-08-30 08:28:22', 134159490;
-- DESIRED OUTPUT
/*
RangeStartDate, RangeEndDate, SampleValue, ValueCount
'2018-08-30 05:07:17','2018-08-30 05:12:35', 31267, 3
'2018-08-30 07:10:10','2018-08-30 07:25:42', 134159490, 5
'2018-08-30 07:28:03','2018-08-30 08:16:14', 64498, 2
'2018-08-30 08:16:37','2018-08-30 08:20:27', 134159490, 2
'2018-08-30 08:27:53','2018-08-30 08:27:53', 460350737, 1
'2018-08-30 08:28:04','2018-08-30 08:28:05',15375002, 2
'2018-08-30 08:28:22','2018-08-30 08:28:22', 134159490, 1
*/
-- what I have tried
SELECT MIN(S1.SampleDate) AS RangeStartDate, MAX(S1.SampleDate) AS RangeEndDate, S1.SampleValue, COUNT(S1.SampleValue) AS ValueCount
FROM #SampleRanges AS S1
-- LEFT OUTER JOIN #SampleRanges AS S2 ON S1.SampleValue = S2.SampleValue AND S1.RowNum+1 = S2.RowNum
GROUP BY S1.SampleValue
ORDER BY RangeStartDate
-- This is nearly there, but produces a single range for Value 134159490, which overlaps other ranges, instead of 3 non-overlapping ranges for thate Value.
-- I know I have to somehow JOIN the table to itself, but the JOIN I have written makes no difference
This is a gaps and islands problem
WITH Src AS (
SELECT SampleDate, SampleValue,
ROW_NUMBER() OVER(ORDER BY SampleDate) -
ROW_NUMBER() OVER(PARTITION BY SampleValue ORDER BY SampleDate) AS grp
FROM #SampleRanges
)
SELECT MIN(SampleDate) AS RangeStartDate,
MAX(SampleDate) AS RangeEndDate,
SampleValue,
COUNT(*) AS ValueCount
FROM Src
GROUP BY grp,SampleValue
ORDER BY RangeStartDate;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537September 3, 2019 at 3:26 pm
Brilliant. Great solution
I knew I'd come across this before, but "gaps and islands" are terms I'd forgotten , so I can now search properly for this type of problem.
September 4, 2019 at 9:04 am
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply