Parsing paired relationships.

Question

Parsing paired relationships.

Viewing 11 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

richard.gardner 6009 SSC-Addicted Points: 484 More actions · Answer 1

Well my solution ignores the problem of 1:1 entries, as does Peter's, which runs in 4 seconds and looks to be spot on (aside from the face that it results in a comma delimited list, but I figure it's down to me to get my head around exactly what he's doing and sort that one (or else hack it post processing, which is probably what I'll end up doing), thanks Peter! ;)).

It's web developer data, which is to say we should expect it to be imperfect, anything with either field null or both fields the same are valid candidates for deletion pre-processing. There are breaks in the chain, remember, so chaining stuff together isn't necessarily going to give the right answer...

But can I just say I'm really grateful for all this help, it's awesome :w00t:

matzer SSCrazy Eights Points: 9727 More actions · Answer 2

Sorry, I thought the comma separated list was a requirement. If you just want a set of (set_id, product_id) replace

SELECT DISTINCT

set_id,

STUFF((

SELECT

', ' + CAST(Q.product_id AS VARCHAR(10))

FROM

(

SELECT DISTINCT

PS2.product_id

FROM

dbo.ProductSets PS2

WHERE

PS2.set_id = PS1.set_id

) Q

FOR XML PATH('')

), 1, 2, '') product_set

FROM

#ProductSets PS1

ORDER BY

set_id

with

SELECT DISTINCT

set_id,

product_id

FROM

#ProductSets

ORDER BY

set_id, product_id;

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

richard.gardner 6009 (12/22/2016)
Hi Jeff,
Here's a sample dataset, my bad, 89,000 is a different number, it's just 50,000 pairs of internal keys.
@chris-2 - Thanks for the offer, appreciated, I'll have another look at it, I did run it a couple of times and the first time I got an error >100 levels of recursion, second time it ran until I killed it.
Regards
Richard

Quote. Excellent. Thanks, Richard. Now I have something to very real to work with instead of making up something that might not be quite right.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Joe Celko SSCertifiable Points: 5749 More actions · Answer 4

I have to group these records [sic: rows are not records] together - the above data creates two groups :

Group 1 --> ProdA, ProdB, ProdC

Group 2 --> ProdD, ProdE [\quote]

Instead of mimicking assembly language pointer chains with recursive CTE's, why do not you implement your own example?

CREATE TABLE Product_Groups

(product_group_id CHAR(5) NOT NULL,

product_id CHAR(5) NOT NULL,

PRIMARY KEY (product_group_id, product_id));

These products are at one level of aggregation in your data model, and the groups are at a different level.. What I am not sure about is whether a product has to belong to one and only one group, or can be in several groups (for example, a lot of recipes include the ingredient "water"). The next question is can your groups contain groups?

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 5

@richard-2.gardner,

I'm still playing with the data but let me make a suggestion.

The data you currently have is the result of someone creating the "opposite direction" node for every node in a nice, clean Adjacency List (Parent/Child table). That original clean list lives SOMEWHERE. I'd take the time to find out where it is because all of this would become the proverbial cake-walk if we could get our hands on that original Adjacency List.

Because of the bidirectional nature of all the nodes, it takes a comparatively very long time to find the "root" node of any given tree in the data. While it can certainly be accomplished, I have to ask if there's any other column in the table that helps identify either the "root" nodes, or the "leaf" nodes. Perhaps there's a "Level" column or a "leaf node" indication or a "base part number" that might identify the "root" nodes?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 6

@jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.

If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

Jason A. Long (12/23/2016)
@jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.
If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

Such duplication of edge data is used for a lot of different things. A lot of times they're used for the creation of certain types of web structures such as airline routes or mapping functionality such as MapQesst, etc. It really helps on maps because while you might be able to take the route of A,B,C, you might not be able to get back using C,B,A because of one way streets, etc. Even some wiring diagrams are supported by this type of double entry, although it's really not necessary.

And, yes, you could delete 1/2 the rows by deleting all rows where B<A... which would also successfully destroy the hierarchy because not all parents will have values greater than their children or vice versa.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

Hmmm... looking at the original small list of nodes in the original post, there might just be a way to Ferret out the root nodes. I'll give that a try...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

Jason A. Long (12/23/2016)
@jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.
If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

I wasn't bagging what you were raking. After looking at the data provided in the csv file, it's not a hierarchy at all. It's a "multi-web" where there are multiple islands of data and all points on each island are connected to all points on the same island. Now I get it. Thanks, Jason.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 10

Jeff Moden (12/23/2016)
Jason A. Long (12/23/2016)
@jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.
If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.
I wasn't bagging what you were raking. After looking at the data provided in the csv file, it's not a hierarchy at all. It's a "multi-web" where there are multiple islands of data and all points on each island are connected to all points on the same island. Now I get it. Thanks, Jason.

No worries. There was a fair possibility that I was way off base. I just knew that I'd seen data structured in that format when I was asked to work through some patient matching data a while back.

richard.gardner 6009 SSC-Addicted Points: 484 More actions · Answer 11

Happy Christmas guys.

Just for info there is no hierarchy, each item is at the same level, it's a relationship between items which have different PKs but are the same design, just different colours, thus "Islands Of Groups" as Jeff says. We have a Rbar solution anyway from Peter, but I would be interested in how to tackle it from a set based point of view.

I can't get into the product that controls this list, one of the joys of "software as a service" is the utter blindness of the providers when it comes to ODBC access - "You can simply use CURL requests to get the data one record at a time". Right, let's do that then, I love spending my time building web calls in Powershell, plus all the completeness and integrity stuff that comes with it (that's sarcasm, btw).

They make me feel like a dinosaur when I argue a once a day call over ODBC to get the complete recordset might be a better idea, like I'm a complete idiot for thinking a record at a time web request is a shitty way to retrieve a dataset, but I have a strong and abiding suspicion that I'm right.

As the function of the list is to accept a single key and respond with a list of group members I don't think it's particularly dysfunctional, it's just when I'm trying to recreate the groups post processing....