Grouping Related Rows in Same Table

Question

Grouping Related Rows in Same Table

tssopa

SSC-Addicted

Points: 400
More actions
June 18, 2013 at 3:17 pm

#276474

I am hoping that someone on here can help me out with my problem. I have a single table full of shapes that may or may not intersect each other. My goal is to identify the shapes that intersect and group them together by giving them a suedo "cluster id". Here is a simplified version of the data that I am working with.
IF EXISTS (
SELECT *
FROM sys.tables
WHERE NAME = 'OverlappingShapes'
AND Schema_Name(schema_id) = 'dbo'
AND [type] = 'U'
)
DROP TABLE dbo.OverlappingShapes
GO
CREATE TABLE dbo.OverlappingShapes (ShapeID CHAR(5), Shape GEOMETRY)
GO
INSERT INTO dbo.OverlappingShapes (ShapeID, Shape)
VALUES ('A1B20', geometry::STGeomFromText('POLYGON((01 15, 02 13, 03 11, 05 13, 04 17, 02 17, 01 15))', 0))
, ('D1B19', geometry::STGeomFromText('POLYGON((03 14, 06 11, 06 16, 03 14))', 0))
, ('C5B23', geometry::STGeomFromText('POLYGON((05 04, 09 04, 09 08, 05 08, 05 04))', 0))
, ('A1B11', geometry::STGeomFromText('POLYGON((06 02, 10 06, 06 09, 06 02))', 0))
, ('D1B25', geometry::STGeomFromText('POLYGON((01 04, 02 03, 02 02, 03 02, 04 03, 03 04, 04 06, 02 07, 01 06, 02 05, 01 04))', 0))
, ('E3A22', geometry::STGeomFromText('LINESTRING(00 12, 07 12)', 0))
GO
If you run the following SQL text you should get the results listed below.
SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B
FROM dbo.OverlappingShapes AS A
INNER JOIN dbo.OverlappingShapes AS B
ON A.Shape.STIntersects(B.Shape) = 1
GO
ShapeID_A ShapeID_B
--------- ---------
A1B20 A1B20
D1B19 A1B20
E3A22 A1B20
A1B20 D1B19
D1B19 D1B19
E3A22 D1B19
C5B23 C5B23
A1B11 C5B23
C5B23 A1B11
A1B11 A1B11
D1B25 D1B25
A1B20 E3A22
D1B19 E3A22
E3A22 E3A22
(14 row(s) affected)
What I am trying to accomplish is something like this:
ShapeID_A ShapeID_B ClusterID
--------- --------- ---------
A1B20 A1B20 1
D1B19 A1B20 1
E3A22 A1B20 1
A1B20 D1B19 1
D1B19 D1B19 1
E3A22 D1B19 1
A1B20 E3A22 1
D1B19 E3A22 1
E3A22 E3A22 1
C5B23 C5B23 2
A1B11 C5B23 2
C5B23 A1B11 2
A1B11 A1B11 2
D1B25 D1B25 3
Or to go one step further, something like this:
ShapeID ClusterID
------- ---------
A1B20 1
D1B19 1
E3A22 1
C5B23 2
A1B11 2
D1B25 3
Any help would be greatly appreciated.
Thanks,
Mike

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:

SET NOCOUNT ON

CREATE TABLE dbo.#out (ShapeID char(5) NOT NULL, GroupID tinyint NOT NULL) --size group as needed

DECLARE @GroupID tinyint = 1, @ShapeID char(5), @Shape geometry

DECLARE Csr cursor FAST_FORWARD for

SELECT ShapeID, Shape

FROM dbo.OverlappingShapes

OPEN Csr

FETCH NEXT FROM Csr INTO @ShapeID, @Shape

WHILE (@@fetch_status = 0)

BEGIN

INSERT dbo.#out

SELECT ShapeID, @GroupID

FROM dbo.OverlappingShapes os

WHERE @Shape.STIntersects(os.Shape) = 1

AND NOT EXISTS (SELECT * FROM dbo.#out t WHERE t.ShapeID = os.ShapeID)

IF @@ROWCOUNT <> 0 --hit a new group, so increment

BEGIN

SET @GroupID += 1

END

FETCH NEXT FROM Csr INTO @ShapeID, @Shape

END

CLOSE Csr

DEALLOCATE Csr

SELECT * FROM dbo.#out

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Mark Cowne One Orange Chip Points: 26968 More actions · Answer 2

Here's another way using recursive CTEs

WITH Base AS (

SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B

FROM dbo.OverlappingShapes AS A

INNER JOIN dbo.OverlappingShapes AS B

ON A.Shape.STIntersects(B.Shape) = 1

AND A.ShapeID < B.ShapeID),

Recur AS (

SELECT b.ShapeID_A,

b.ShapeID_B

FROM Base b

WHERE NOT EXISTS(SELECT * FROM Base b2 WHERE b2.ShapeID_B=b.ShapeID_A)

UNION ALL

SELECT r.ShapeID_A,

b.ShapeID_B

FROM Base b

INNER JOIN Recur r ON r.ShapeID_B = b.ShapeID_A),

Results AS (

SELECT ShapeID_A,

ShapeID_B

FROM Recur

UNION

SELECT a.ShapeID,

a.ShapeID

FROM dbo.OverlappingShapes a

WHERE NOT EXISTS (SELECT * FROM Recur r WHERE r.ShapeID_B=a.ShapeID))

SELECT ShapeID_B AS ShapeID,

DENSE_RANK() OVER(ORDER BY ShapeID_A) AS ClusterID

FROM Results;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

tssopa SSC-Addicted Points: 400 More actions · Answer 3

TheSQLGuru (6/18/2013)
This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:

Thanks Kevin. I had something similar to this but just not as elegant. I really appreciate the quick response.

Mike

tssopa SSC-Addicted Points: 400 More actions · Answer 4

Mark-101232 (6/19/2013)
Here's another way using recursive CTEs

Now this is what I was hoping for. Thanks Mark! I am still trying to wrap my head around recursive CTE's and I knew there was a solution but I just couldn't see it.

Thank you very much and I appreciate the quick response.

Mike

tssopa SSC-Addicted Points: 400 More actions · Answer 5

Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?

Mark Cowne One Orange Chip Points: 26968 More actions · Answer 6

tssopa (6/19/2013)
... I had read how efficient and fast recursive CTE's are compared to CURSOR's...

Typically they're not. You could try splitting out a temporary table from the CTE.

IF OBJECT_ID('tempdb..#Base') IS NOT NULL DROP TABLE #Base;

SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B

INTO #Base

FROM dbo.OverlappingShapes AS A

INNER JOIN dbo.OverlappingShapes AS B

ON A.Shape.STIntersects(B.Shape) = 1

AND A.ShapeID < B.ShapeID;

-- May want to add indexes to #Base here .. CREATE INDEX IX ON #Base(ShapeID_A);

WITH

Recur AS (

SELECT b.ShapeID_A,

b.ShapeID_B

FROM #Base b

WHERE NOT EXISTS(SELECT * FROM #Base b2 WHERE b2.ShapeID_B=b.ShapeID_A)

UNION ALL

SELECT r.ShapeID_A,

b.ShapeID_B

FROM #Base b

INNER JOIN Recur r ON r.ShapeID_B = b.ShapeID_A),

Results AS (

SELECT ShapeID_A,

ShapeID_B

FROM Recur

UNION

SELECT a.ShapeID,

a.ShapeID

FROM dbo.OverlappingShapes a

WHERE NOT EXISTS (SELECT * FROM Recur r WHERE r.ShapeID_B=a.ShapeID))

SELECT ShapeID_B AS ShapeID,

DENSE_RANK() OVER(ORDER BY ShapeID_A) AS ClusterID

FROM Results;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

tssopa (6/19/2013)
Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?

Recursive CTEs SUCK @SS and should be avoided at almost all costs, IMNSHO. I only use them when there is NO other recourse or testing shows they are optimal for a specific data processing need.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

tssopa (6/19/2013)
TheSQLGuru (6/18/2013)
This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:
Thanks Kevin. I had something similar to this but just not as elegant. I really appreciate the quick response.
Mike

You are welcome! It was a fun little problem to tackle. Something in the back of my brain is telling me there is a set-based efficient mechanism to do this, and I (quickly) tried a number of approaches but was unsuccessful. Sometimes a cursor really is the best tool for the job (despite how horribly inefficient they are in SQL Server), and I am ALL about using the best tool for the job!! 🙂

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Mark Cowne One Orange Chip Points: 26968 More actions · Answer 9

If you change the data, replacing

, ('E3A22', geometry::STGeomFromText('LINESTRING(00 12, 07 12)', 0))

with

, ('E3A22', geometry::STGeomFromText('LINESTRING(06 12, 07 12)', 0))

so that A1B20 and D1B19 intersect,

D1B19 and E3A22 intersect

but A1B20 and E3A22 now do not intersect, the cursor and CTE solutions give different results.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 10

Seems like further clarification of the needs is required. Should "chained" intersections be allowed/considered as one group? Or discarded.

I also note that the cursor version for your data returns:

ShapeID GroupID

------- -------

A1B20 1

D1B19 1

E3A22 2

C5B23 3

A1B11 3

D1B25 4

Since D1B19 and E3A22 intersect, it would seem apparent that a modification to the code would be required to make the output actually put BOTH of those in the same group, i.e.

ShapeID GroupID

------- -------

A1B20 1

D1B19 1

D1B19 2

E3A22 2

C5B23 3

A1B11 3

D1B25 4

Again, some logic requirements are needed for muti-grouping (such as happens for D1B19 here).

OP, what is the actual and complete desired effect of this new set of inputs??

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

tssopa SSC-Addicted Points: 400 More actions · Answer 11

TheSQLGuru (6/19/2013)
Seems like further clarification of the needs is required. Should "chained" intersections be allowed/considered as one group? Or discarded.

TheSQLGuru (6/19/2013)
Since D1B19 and E3A22 intersect, it would seem apparent that a modification to the code would be required to make the output actually put BOTH of those in the same group, i.e.
Again, some logic requirements are needed for muti-grouping (such as happens for D1B19 here).
OP, what is the actual and complete desired effect of this new set of inputs??

Multi-grouping is not allowed. If a series of shapes intersect in a "daisy chain" fashion then ultimately they all belong to the same group. Making the change that Mark describes produces the desire effect when using the recursive CTE. However, when using the CURSOR the line segment E3A22 shows as a separate group. It would be acceptable in this case but not preferred. Furthermore, since E3A22 is not present in more than one group and all shapes are accounted for, it will suffice.

Thank you Kevin and Mark.

Mike

Dwain Camps SSC Guru Points: 86978 More actions · Answer 12

TheSQLGuru (6/19/2013)
tssopa (6/19/2013)
Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?
Recursive CTEs SUCK @SS and should be avoided at almost all costs, IMNSHO. I only use them when there is NO other recourse or testing shows they are optimal for a specific data processing need.

Very strong anti-rCTE statement there!

I'll protest. They are but a tool. Often times they are misused but for some things they are the best option available.

Jeff Moden once suggested to me that any rCTE can be rewritten as a set-based loop. The loop basically looks something like this (using a Temp table):

INSERT INTO #Temp

-- rCTE anchor leg

WHILE -- terminating criteria

INSERT INTO #Temp

-- rCTE recursive leg

END

This can be faster than the rCTE (I did this in one of my articles if you're interested in the proof).

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

Protest all you want. But do note that I was NOT unequivocal in my statements and did allow for edge-case usage. 🙂

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service