Problem
Combining overlapping or continuous intervals could be a discouraging task even for advanced programmers. This article presents a new solution based on usage of the geometry data type and an unknown function (if not, maybe less used function) specifically designed for geometry objects.
Computing the overall period when every user (of some website) was connected (logged in) is just one of those examples where this SQL problem must have a performant solution. This problem is also recognized under the following names: Gaps and Islands or Packing Date and Time Intervals (source 1, source 2, source 3, source 4).
While this solution is using INT columns, we could adapt simple to DATE, DATETIME, TIME intervals by converting these values into INT(BIGINT) thus: {ts '2020-01-30 14:30:00'} becomes 20200130143000
Given the following intervals (1, 5), (4, 30), (67, 80), (63, 70), the question is how could we combine them easily into nonoverlapping intervals, thus: (1,30), (63,80)?
Proposed Solution
In order to combine these intervals, first, we are going to transform every interval (Start, End) into a polygon object thus:
POLYGON((Start 0, End 0, End 1, Start 1, Start 0)):
(Start 1)-(End 1)
| |
(Start 0)-(End 0)
Thus, first interval (1, 5) become POLYGON((1 0,5 0,5 1,1 1,1 0)):
Note 1:
The general syntax for a simple polygon is
POLYGON((Point1X Point1Y,Point2X Point2Y,Point3X Point13Y,Point4X Point4Y,Point1X Point1Y)): (Point4X Point4Y)-(Point3X Point3Y) | | (Point1X Point1Y)-(Point2X Point2Y)
Note 2: Polygons are closed objects, this means that first point and last point are the same.
Then, we are going to use the UnionAggregate() spatial function to combine these polygons (intervals). Next polygons
POLYGON((1 0,5 0,5 1,1 1,1 0)) and POLYGON((4 0,30 0,30 1,4 1,4 0)) (interval (1, 5) and (4, 30))
are going to be combined into a single polygon
POLYGON ((1 0, 4 0, 5 0, 30 0, 30 1, 5 1, 4 1, 1 1, 1 0))
SELECT GEOMETRY::UnionAggregate(s.Pol).ToString() FROM ( SELECT Pol = GEOMETRY::STPolyFromText('POLYGON((1 0,5 0,5 1,1 1,1 0))', 0) UNION ALL SELECT Pol = GEOMETRY::STPolyFromText('POLYGON((4 0,30 0,30 1,4 1,4 0))', 0) ) s
Then, all those 4 polygons are going to be combined into non-overlapping polygons:
MULTIPOLYGON ( ((1 0, 4 0, 5 0, 30 0, 30 1, 5 1, 4 1, 1 1, 1 0)), ((63 0, 67 0, 70 0, 80 0, 80 1, 70 1, 67 1, 63 1, 63 0)) )
The problem is that the resulting polygon (the result of calling the UnionAggregate() function) contains all points and not only the extremities points. Next step of this process is to get the bounding box of this polygon using the STEnvelope() spatial geometry method, thus POLYGON(1 0, 4 0, 5 0, 30 0, 30 1, 5 1, 4 1, 1 1, 1 0) becomes POLYGON((1 0,30 0,30 1,1 1,1 0)):
Final step of this process suppose to decompress these polygons back into ranges using the STPointN(index) property used to return a single point from polygon and also using two properties specific to every point STX and STY. These properties are used to get the first point (Start) and the second point of every interval polygon (End) thus:
POLYGON((Point1X, Point1Y), (Point2X, Point2Y), .. POLYGON((Start 0, End 0, .. Start = 1 End = 30
Source Code
The final source code is:
SELECT * INTO Tab FROM ( SELECT POL = GEOMETRY::STPolyFromText('POLYGON((1 0,5 0,5 1,1 1,1 0))', 0) UNION ALL SELECT POL = GEOMETRY::STPolyFromText('POLYGON((4 0,30 0,30 1,4 1,4 0))', 0) UNION ALL SELECT POL = GEOMETRY::STPolyFromText('POLYGON((67 0,80 0,80 1,67 1,67 0))', 0) UNION ALL SELECT POL = GEOMETRY::STPolyFromText('POLYGON((63 0,70 0,70 1,63 1,63 0))', 0) ) s DECLARE @pol GEOMETRY; SELECT @pol = GEOMETRY::UnionAggregate(t.POL) FROM Tab t; SELECT PK = ptl.number, [Start] = @pol.STGeometryN(ptl.number).STEnvelope().STPointN(1).STX, [End] = @pol.STGeometryN(ptl.number).STEnvelope().STPointN(2).STX, Polygon = @pol.STGeometryN(ptl.number).ToString() FROM master.dbo.spt_values ptl WHERE ptl.type = N'P' AND ptl.number >= 1 AND ptl.number <= @pol.STNumGeometries()
Note:
- spt_values.type = ‘P’ returns a list with sequential increasing numbers starting with 1.
- STNumGeometries() returns the number of polygons and STGeometryN(index) returns a single polygon object from a collection (@pol) of polygons.
Conclusion
This article describes a new solution based on the usage of polygon objects and UnionAggregate() function used to combine those objects that have common zones.