identifying many to many relationship between columns

Question

identifying many to many relationship between columns

lmeinke

SSCommitted

Points: 1956
More actions
November 11, 2013 at 1:23 pm

#279015

Hello All,
I am struggling with something that is conceptually simple but I'm at a loss at how to implement it. I have two columns with a many to many relationship between the columns;
ITEM1 ITEM2
0224180 0224181
0224180 0224190
0224181 0224180
0224181 0224190
0224190 0224180
0224190 0224181
0202294 0202295
0202295 0202294
0209250 0209251
0209251 0209250
I need to add a single identifier that will allow these cross referenced parts to be queried. The desired outcome would be something like;
ITEM1 ITEM2 ID
0224180 0224181 1
0224180 0224190 1
0224181 0224180 1
0224181 0224190 1
0224190 0224180 1
0224190 0224181 1
0202294 0202295 2
0202295 0202294 2
0209250 0209251 3
0209251 0209250 3
I hope this is clear and can attach a script or data if necessary.
Thanks in Advance,
Lonnie

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63503 More actions · Answer 1

Not sure you can show a many-to-many relationship without a child table. That's just the standard way to do it... the standard example is

Invoice--(1,M)--LineItem---(M,1)--Product

where the LineItem table breaks the M-M relationship between Invoice and Product into two 1-M relationships.

Dwain Camps SSC Guru Points: 86948 More actions · Answer 2

PL is probably correct in the way this should be handled from a theoretical perspective.

On the other hand, this may work but also may not be particularly performant.

DECLARE @T1 TABLE (ITEM1 INT, ITEM2 INT);

INSERT INTO @T1

SELECT 0224180,0224181

UNION ALL SELECT 0224180,0224190

UNION ALL SELECT 0224181,0224180

UNION ALL SELECT 0224181,0224190

UNION ALL SELECT 0224190,0224180

UNION ALL SELECT 0224190,0224181

UNION ALL SELECT 0202294,0202295

UNION ALL SELECT 0202295,0202294

UNION ALL SELECT 0209250,0209251

UNION ALL SELECT 0209251,0209250

UNION ALL SELECT 1, 2

UNION ALL SELECT 1, 3

UNION ALL SELECT 1, 4

UNION ALL SELECT 2, 3

UNION ALL SELECT 2, 4

UNION ALL SELECT 3, 4

UNION ALL SELECT 2, 1

UNION ALL SELECT 3, 1

UNION ALL SELECT 4, 1

UNION ALL SELECT 3, 2

UNION ALL SELECT 4, 2

UNION ALL SELECT 4, 3

UNION ALL SELECT 7, 8

UNION ALL SELECT 7, 9

UNION ALL SELECT 8, 9

UNION ALL SELECT 8, 7

UNION ALL SELECT 9, 7

UNION ALL SELECT 9, 8

;

WITH rCTE AS

(

SELECT n=0, ITEM

,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

,ITEM1=ITEM

,ITEM2=NULL

FROM

(

SELECT DISTINCT ITEM=MIN(b.ITEM1)

FROM

(

SELECT ITEM1, ITEM2, rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM @T1

) a

CROSS APPLY

(

SELECT ITEM1=CASE WHEN ITEM1 < ITEM2 THEN ITEM1 ELSE ITEM2 END

,ITEM2=CASE WHEN ITEM1 < ITEM2 THEN ITEM2 ELSE ITEM1 END

FROM @T1 b

WHERE a.ITEM1 IN (b.ITEM1, b.ITEM2)

) b

GROUP BY rn

) a

UNION ALL

SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

FROM rCTE a

JOIN @T1 b ON a.ITEM = b.ITEM1

WHERE n <= 1

UNION ALL

SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

FROM rCTE a

JOIN @T1 b ON a.ITEM = b.ITEM2

WHERE n <= 1

)

SELECT DISTINCT ITEM1, ITEM2, ID=rn

FROM rCTE

WHERE n = 2

ORDER BY rn, ITEM1, ITEM2

Note that I added some additional sample data because I wanted to see how it would resolve out with an additional set of triplet product IDs and also a 4-tuple of product IDs.

I'm assuming you don't care which set of grouped products gets which ID. Just that each group is numbered independently.

Edit: Slightly simplified the query. Note that this may only work if all permutations appear in the adjacency list.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

lmeinke SSCommitted Points: 1956 More actions · Answer 3

Thank you Dwain. I'll read through the code and give it a try this morning!

lmeinke SSCommitted Points: 1956 More actions · Answer 4

On a quick read through I noticed your data type is int. This would drop the leading zeros on many of my skus. When I run this with the varchar type I get:

Msg 240, Level 16, State 1, Line 1

Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

I'll keep working through this but thought I'd throw that out there if you have a quick reply.

Thanks,

lmeinke SSCommitted Points: 1956 More actions · Answer 5

I had another suggestion (from a different forum) that provided a solution. It may not be the most eloquent but it ran on 98K rows in 8 minutes.

This is from "Lamprey" on the SQLTeam .com forum:

DECLARE @Foo TABLE (ITEM1 INT, ITEM2 INT, ID INT)

INSERT @Foo (ITEM1, ITEM2) VALUES

(0224180, 0224181),

(0224180, 0224190),

(0224181, 0224180),

(0224181, 0224190),

(0224190, 0224180),

(0224190, 0224181),

(0202294, 0202295),

(0202295, 0202294),

(0209250, 0209251),

(0209251, 0209250)

DECLARE @Val1 INT;

DECLARE @Val2 INT;

DECLARE @Iterator INT = 1;

-- Prime Loop

SELECT TOP 1

@Val1 = Item1,

@Val2 = Item2

FROM

@Foo

ORDER BY

Item1

WHILE @Val1 IS NOT NULL

BEGIN

-- Perform Update

UPDATE

@Foo

SET

ID = @Iterator

WHERE

Item1 IN (@Val1, @Val2)

OR Item2 IN (@Val1, @Val2);

-- Get next item/group

SELECT TOP 1

@Val1 = Item1,

@Val2 = Item2

FROM

@Foo

WHERE

Item1 NOT IN (@Val1, @Val2)

AND Item2 NOT IN (@Val1, @Val2)

AND ID IS NULL

ORDER BY

Item1

IF @@ROWCOUNT = 0

BEGIN

SET @Val1 = NULL;

END

SET @Iterator = @Iterator + 1;

END

SELECT *

FROM @Foo

Dwain Camps SSC Guru Points: 86948 More actions · Answer 6

lmeinke (11/12/2013)
On a quick read through I noticed your data type is int. This would drop the leading zeros on many of my skus. When I run this with the varchar type I get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".
I'll keep working through this but thought I'd throw that out there if you have a quick reply.
Thanks,

That error is pretty simple to resolve by CASTing the NULL assignment to ITEM2 in the rCTE anchor to a VARCHAR type, the same as your item codes.

DECLARE @T1 TABLE (ITEM1 VARCHAR(20), ITEM2 VARCHAR(20));

INSERT INTO @T1

SELECT '0224180','0224181'

UNION ALL SELECT '0224180','0224190'

UNION ALL SELECT '0224181','0224180'

UNION ALL SELECT '0224181','0224190'

UNION ALL SELECT '0224190','0224180'

UNION ALL SELECT '0224190','0224181'

UNION ALL SELECT '0202294','0202295'

UNION ALL SELECT '0202295','0202294'

UNION ALL SELECT '0209250','0209251'

UNION ALL SELECT '0209251','0209250';

WITH rCTE AS

(

SELECT n=0, ITEM

,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

,ITEM1=ITEM

,ITEM2=CAST(NULL AS VARCHAR(20))

FROM

(

SELECT DISTINCT ITEM=MIN(b.ITEM1)

FROM

(

SELECT ITEM1, ITEM2, rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM @T1

) a

CROSS APPLY

(

SELECT ITEM1=CASE WHEN ITEM1 < ITEM2 THEN ITEM1 ELSE ITEM2 END

,ITEM2=CASE WHEN ITEM1 < ITEM2 THEN ITEM2 ELSE ITEM1 END

FROM @T1 b

WHERE a.ITEM1 IN (b.ITEM1, b.ITEM2)

) b

GROUP BY rn

) a

UNION ALL

SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

FROM rCTE a

JOIN @T1 b ON a.ITEM = b.ITEM1

WHERE n <= 1

UNION ALL

SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

FROM rCTE a

JOIN @T1 b ON a.ITEM = b.ITEM2

WHERE n <= 1

)

SELECT DISTINCT ITEM1, ITEM2, ID=rn

FROM rCTE

WHERE n = 2

ORDER BY rn, ITEM1, ITEM2;

I would love to hear how this performs against your 98K rows.

Edit: Minor edit to the sample data.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86948 More actions · Answer 7

BTW. I thought this was a really fun problem. Best I've seen on the forums in weeks.

I have an idea for another way to solve it that might be just a tad faster than my rCTE, which I'm betting will beat the pants off the RBAR UPDATE that other forum suggested. I'm talking about the rCTE I already posted that is.

Setting up a performance test harness might be a bit of a challenge. Since you have piqued my interest, I may continue on it for a bit.

Also, about the note where I indicated the product combinations present must include all permutations. I have a way around that now also, in case you need it.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

lmeinke SSCommitted Points: 1956 More actions · Answer 8

Dwain,

I appreciate your interest and effort on this. I would like to find a more efficient way to handle this but after changing the ITEM1 and ITEM2 to the appropriate column names and changing @T1 to my actual table name I get these errors;

Msg 102, Level 15, State 1, Line 18

Incorrect syntax near '='.

Msg 156, Level 15, State 1, Line 29

Incorrect syntax near the keyword 'WHERE'.

Msg 156, Level 15, State 1, Line 34

Incorrect syntax near the keyword 'WHERE'.

I will try to get back to this this afternoon but thought I owed you a response.

lmeinke SSCommitted Points: 1956 More actions · Answer 9

Correction....those errors were due to my sloppy typing....but back to the original error:

Msg 240, Level 16, State 1, Line 1

Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

autoexcrement SSCertifiable Points: 5956 More actions · Answer 10

Can someone please help me to catch up? I feel I'm not understanding the issue. Using the OP's initial data:

ITEM1 ITEM2 ID

0224180 0224181 1

0224180 0224190 1

0224181 0224180 1

0224181 0224190 1

0224190 0224180 1

0224190 0224181 1

0202294 0202295 2

0202295 0202294 2

0209250 0209251 3

0209251 0209250 3

Why, for example, does the last value pair "0209251, 0209250" get an ID of 3? (This value pair appears only 2 times in the data.) Or why does the value pair "0224190, 0224181" get an ID of 1? (This value pair appears twice in the data.)

Thanks, am hoping to grasp what's going on here so I can learn from it too! 🙂

"If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

Dwain Camps SSC Guru Points: 86948 More actions · Answer 11

lmeinke (11/13/2013)
Correction....those errors were due to my sloppy typing....but back to the original error:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

In my original post, the rCTE started like this:

WITH rCTE AS

(

SELECT n=0, ITEM

,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

,ITEM1=ITEM

,ITEM2=NULL

FROM

In my suggested correction for this error, it looks like this.

WITH rCTE AS

(

SELECT n=0, ITEM

,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

,ITEM1=ITEM

,ITEM2=CAST(NULL AS VARCHAR(20))

FROM

So did you remember to add the CAST around the NULL for ITEM2? The length of the VARCHAR should match whatever the length of your product code is in the table.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86948 More actions · Answer 12

autoexcrement (11/13/2013)
Can someone please help me to catch up? I feel I'm not understanding the issue. Using the OP's initial data:
ITEM1 ITEM2 ID
0224180 0224181 1
0224180 0224190 1
0224181 0224180 1
0224181 0224190 1
0224190 0224180 1
0224190 0224181 1
0202294 0202295 2
0202295 0202294 2
0209250 0209251 3
0209251 0209250 3
Why, for example, does the last value pair "0209251, 0209250" get an ID of 3? (This value pair appears only 2 times in the data.) Or why does the value pair "0224190, 0224181" get an ID of 1? (This value pair appears twice in the data.)
Thanks, am hoping to grasp what's going on here so I can learn from it too! 🙂

I'll try to explain. Look at the last 2 rows containing 0209250 and 0209251. These share a relationship in that either product can be substituted for the other. The same is true for 0202295 and 0202294 (in the rows with ID = 2).

The first 6 rows all contain a pair chosen from any of these 3 products: 0224180, 0224181 and 0224190. Hence they share a relationship that any can be substituted for any other.

Therefore ID = Product Substitution Category (or something like that)

This is an excellent SQL problem to learn from as it is fiendishly diabolical to solve with high performance.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

autoexcrement SSCertifiable Points: 5956 More actions · Answer 13

Wow, I never in a million years would have been able to interpret the data in that way. I still don't exactly understand how you were able to see this pattern there, or how you determined the parameters of it. Hats off, and thanks for the explanation. I can't even begin to fathom the posted SQL solution. 🙁

"If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

Dwain Camps SSC Guru Points: 86948 More actions · Answer 14

autoexcrement (11/13/2013)
Wow, I never in a million years would have been able to interpret the data in that way. I still don't exactly understand how you were able to see this pattern there, or how you determined the parameters of it. Hats off, and thanks for the explanation. I can't even begin to fathom the posted SQL solution. 🙁

The solution is even more fiendish if only limited substitutions are allowed.

Think about it another way. Each product in the two column table represents a node on a graph. Each row of the table represents an edge that connects two nodes in a specified direction. For the case of IDs 2 and 3, the 2 rows for each represent a single bi-directional edge on the graph. In a graph where all substitutions are allowed, there will exist exactly n (number of products) * (n - 1) edges. Limited substitution means there are less than that.

Having a little background in graph theory helps.

Edit: Added parentheses around (n - 1) to override the standard operator precedence for clarity.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St