June 18, 2013 at 3:17 pm
I am hoping that someone on here can help me out with my problem. I have a single table full of shapes that may or may not intersect each other. My goal is to identify the shapes that intersect and group them together by giving them a suedo "cluster id". Here is a simplified version of the data that I am working with.
IF EXISTS (
SELECT *
FROM sys.tables
WHERE NAME = 'OverlappingShapes'
AND Schema_Name(schema_id) = 'dbo'
AND [type] = 'U'
)
DROP TABLE dbo.OverlappingShapes
GO
CREATE TABLE dbo.OverlappingShapes (ShapeID CHAR(5), Shape GEOMETRY)
GO
INSERT INTO dbo.OverlappingShapes (ShapeID, Shape)
VALUES ('A1B20', geometry::STGeomFromText('POLYGON((01 15, 02 13, 03 11, 05 13, 04 17, 02 17, 01 15))', 0))
, ('D1B19', geometry::STGeomFromText('POLYGON((03 14, 06 11, 06 16, 03 14))', 0))
, ('C5B23', geometry::STGeomFromText('POLYGON((05 04, 09 04, 09 08, 05 08, 05 04))', 0))
, ('A1B11', geometry::STGeomFromText('POLYGON((06 02, 10 06, 06 09, 06 02))', 0))
, ('D1B25', geometry::STGeomFromText('POLYGON((01 04, 02 03, 02 02, 03 02, 04 03, 03 04, 04 06, 02 07, 01 06, 02 05, 01 04))', 0))
, ('E3A22', geometry::STGeomFromText('LINESTRING(00 12, 07 12)', 0))
GO
If you run the following SQL text you should get the results listed below.
SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B
FROM dbo.OverlappingShapes AS A
INNER JOIN dbo.OverlappingShapes AS B
ON A.Shape.STIntersects(B.Shape) = 1
GO
ShapeID_A ShapeID_B
--------- ---------
A1B20 A1B20
D1B19 A1B20
E3A22 A1B20
A1B20 D1B19
D1B19 D1B19
E3A22 D1B19
C5B23 C5B23
A1B11 C5B23
C5B23 A1B11
A1B11 A1B11
D1B25 D1B25
A1B20 E3A22
D1B19 E3A22
E3A22 E3A22
(14 row(s) affected)
What I am trying to accomplish is something like this:
ShapeID_A ShapeID_B ClusterID
--------- --------- ---------
A1B20 A1B20 1
D1B19 A1B20 1
E3A22 A1B20 1
A1B20 D1B19 1
D1B19 D1B19 1
E3A22 D1B19 1
A1B20 E3A22 1
D1B19 E3A22 1
E3A22 E3A22 1
C5B23 C5B23 2
A1B11 C5B23 2
C5B23 A1B11 2
A1B11 A1B11 2
D1B25 D1B25 3
Or to go one step further, something like this:
ShapeID ClusterID
------- ---------
A1B20 1
D1B19 1
E3A22 1
C5B23 2
A1B11 2
D1B25 3
Any help would be greatly appreciated.
Thanks,
Mike
June 18, 2013 at 9:16 pm
This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:
SET NOCOUNT ON
CREATE TABLE dbo.#out (ShapeID char(5) NOT NULL, GroupID tinyint NOT NULL) --size group as needed
DECLARE @GroupID tinyint = 1, @ShapeID char(5), @Shape geometry
DECLARE Csr cursor FAST_FORWARD for
SELECT ShapeID, Shape
FROM dbo.OverlappingShapes
OPEN Csr
FETCH NEXT FROM Csr INTO @ShapeID, @Shape
WHILE (@@fetch_status = 0)
BEGIN
INSERT dbo.#out
SELECT ShapeID, @GroupID
FROM dbo.OverlappingShapes os
WHERE @Shape.STIntersects(os.Shape) = 1
AND NOT EXISTS (SELECT * FROM dbo.#out t WHERE t.ShapeID = os.ShapeID)
IF @@ROWCOUNT <> 0 --hit a new group, so increment
BEGIN
SET @GroupID += 1
END
FETCH NEXT FROM Csr INTO @ShapeID, @Shape
END
CLOSE Csr
DEALLOCATE Csr
SELECT * FROM dbo.#out
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 19, 2013 at 2:54 am
Here's another way using recursive CTEs
WITH Base AS (
SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B
FROM dbo.OverlappingShapes AS A
INNER JOIN dbo.OverlappingShapes AS B
ON A.Shape.STIntersects(B.Shape) = 1
AND A.ShapeID < B.ShapeID),
Recur AS (
SELECT b.ShapeID_A,
b.ShapeID_B
FROM Base b
WHERE NOT EXISTS(SELECT * FROM Base b2 WHERE b2.ShapeID_B=b.ShapeID_A)
UNION ALL
SELECT r.ShapeID_A,
b.ShapeID_B
FROM Base b
INNER JOIN Recur r ON r.ShapeID_B = b.ShapeID_A),
Results AS (
SELECT ShapeID_A,
ShapeID_B
FROM Recur
UNION
SELECT a.ShapeID,
a.ShapeID
FROM dbo.OverlappingShapes a
WHERE NOT EXISTS (SELECT * FROM Recur r WHERE r.ShapeID_B=a.ShapeID))
SELECT ShapeID_B AS ShapeID,
DENSE_RANK() OVER(ORDER BY ShapeID_A) AS ClusterID
FROM Results;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 19, 2013 at 4:43 am
TheSQLGuru (6/18/2013)
This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:
Thanks Kevin. I had something similar to this but just not as elegant. I really appreciate the quick response.
Mike
June 19, 2013 at 4:53 am
Mark-101232 (6/19/2013)
Here's another way using recursive CTEs
Now this is what I was hoping for. Thanks Mark! I am still trying to wrap my head around recursive CTE's and I knew there was a solution but I just couldn't see it.
Thank you very much and I appreciate the quick response.
Mike
June 19, 2013 at 6:09 am
Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?
June 19, 2013 at 6:38 am
tssopa (6/19/2013)
... I had read how efficient and fast recursive CTE's are compared to CURSOR's...
Typically they're not. You could try splitting out a temporary table from the CTE.
IF OBJECT_ID('tempdb..#Base') IS NOT NULL DROP TABLE #Base;
SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B
INTO #Base
FROM dbo.OverlappingShapes AS A
INNER JOIN dbo.OverlappingShapes AS B
ON A.Shape.STIntersects(B.Shape) = 1
AND A.ShapeID < B.ShapeID;
-- May want to add indexes to #Base here .. CREATE INDEX IX ON #Base(ShapeID_A);
WITH
Recur AS (
SELECT b.ShapeID_A,
b.ShapeID_B
FROM #Base b
WHERE NOT EXISTS(SELECT * FROM #Base b2 WHERE b2.ShapeID_B=b.ShapeID_A)
UNION ALL
SELECT r.ShapeID_A,
b.ShapeID_B
FROM #Base b
INNER JOIN Recur r ON r.ShapeID_B = b.ShapeID_A),
Results AS (
SELECT ShapeID_A,
ShapeID_B
FROM Recur
UNION
SELECT a.ShapeID,
a.ShapeID
FROM dbo.OverlappingShapes a
WHERE NOT EXISTS (SELECT * FROM Recur r WHERE r.ShapeID_B=a.ShapeID))
SELECT ShapeID_B AS ShapeID,
DENSE_RANK() OVER(ORDER BY ShapeID_A) AS ClusterID
FROM Results;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 19, 2013 at 7:40 am
tssopa (6/19/2013)
Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?
Recursive CTEs SUCK @SS and should be avoided at almost all costs, IMNSHO. I only use them when there is NO other recourse or testing shows they are optimal for a specific data processing need.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 19, 2013 at 7:43 am
tssopa (6/19/2013)
TheSQLGuru (6/18/2013)
This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:Thanks Kevin. I had something similar to this but just not as elegant. I really appreciate the quick response.
Mike
You are welcome! It was a fun little problem to tackle. Something in the back of my brain is telling me there is a set-based efficient mechanism to do this, and I (quickly) tried a number of approaches but was unsuccessful. Sometimes a cursor really is the best tool for the job (despite how horribly inefficient they are in SQL Server), and I am ALL about using the best tool for the job!! 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 19, 2013 at 8:06 am
If you change the data, replacing
, ('E3A22', geometry::STGeomFromText('LINESTRING(00 12, 07 12)', 0))
with
, ('E3A22', geometry::STGeomFromText('LINESTRING(06 12, 07 12)', 0))
so that A1B20 and D1B19 intersect,
D1B19 and E3A22 intersect
but A1B20 and E3A22 now do not intersect, the cursor and CTE solutions give different results.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 19, 2013 at 10:47 am
Seems like further clarification of the needs is required. Should "chained" intersections be allowed/considered as one group? Or discarded.
I also note that the cursor version for your data returns:
ShapeID GroupID
------- -------
A1B20 1
D1B19 1
E3A22 2
C5B23 3
A1B11 3
D1B25 4
Since D1B19 and E3A22 intersect, it would seem apparent that a modification to the code would be required to make the output actually put BOTH of those in the same group, i.e.
ShapeID GroupID
------- -------
A1B20 1
D1B19 1
D1B19 2
E3A22 2
C5B23 3
A1B11 3
D1B25 4
Again, some logic requirements are needed for muti-grouping (such as happens for D1B19 here).
OP, what is the actual and complete desired effect of this new set of inputs??
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 19, 2013 at 1:16 pm
TheSQLGuru (6/19/2013)
Seems like further clarification of the needs is required. Should "chained" intersections be allowed/considered as one group? Or discarded.
TheSQLGuru (6/19/2013)
Since D1B19 and E3A22 intersect, it would seem apparent that a modification to the code would be required to make the output actually put BOTH of those in the same group, i.e.Again, some logic requirements are needed for muti-grouping (such as happens for D1B19 here).
OP, what is the actual and complete desired effect of this new set of inputs??
Multi-grouping is not allowed. If a series of shapes intersect in a "daisy chain" fashion then ultimately they all belong to the same group. Making the change that Mark describes produces the desire effect when using the recursive CTE. However, when using the CURSOR the line segment E3A22 shows as a separate group. It would be acceptable in this case but not preferred. Furthermore, since E3A22 is not present in more than one group and all shapes are accounted for, it will suffice.
Thank you Kevin and Mark.
Mike
July 4, 2013 at 7:19 pm
TheSQLGuru (6/19/2013)
tssopa (6/19/2013)
Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?Recursive CTEs SUCK @SS and should be avoided at almost all costs, IMNSHO. I only use them when there is NO other recourse or testing shows they are optimal for a specific data processing need.
Very strong anti-rCTE statement there!
I'll protest. They are but a tool. Often times they are misused but for some things they are the best option available.
Jeff Moden once suggested to me that any rCTE can be rewritten as a set-based loop. The loop basically looks something like this (using a Temp table):
INSERT INTO #Temp
-- rCTE anchor leg
WHILE -- terminating criteria
INSERT INTO #Temp
-- rCTE recursive leg
END
This can be faster than the rCTE (I did this in one of my articles if you're interested in the proof).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 5, 2013 at 8:35 am
Protest all you want. But do note that I was NOT unequivocal in my statements and did allow for edge-case usage. 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply