December 22, 2016 at 10:05 am
Well my solution ignores the problem of 1:1 entries, as does Peter's, which runs in 4 seconds and looks to be spot on (aside from the face that it results in a comma delimited list, but I figure it's down to me to get my head around exactly what he's doing and sort that one (or else hack it post processing, which is probably what I'll end up doing), thanks Peter! ;)).
It's web developer data, which is to say we should expect it to be imperfect, anything with either field null or both fields the same are valid candidates for deletion pre-processing. There are breaks in the chain, remember, so chaining stuff together isn't necessarily going to give the right answer...
But can I just say I'm really grateful for all this help, it's awesome :w00t:
December 22, 2016 at 10:32 am
Sorry, I thought the comma separated list was a requirement. If you just want a set of (set_id, product_id) replace
SELECT DISTINCT
set_id,
STUFF((
SELECT
', ' + CAST(Q.product_id AS VARCHAR(10))
FROM
(
SELECT DISTINCT
PS2.product_id
FROM
dbo.ProductSets PS2
WHERE
PS2.set_id = PS1.set_id
) Q
FOR XML PATH('')
), 1, 2, '') product_set
FROM
#ProductSets PS1
ORDER BY
set_id
with
SELECT DISTINCT
set_id,
product_id
FROM
#ProductSets
ORDER BY
set_id, product_id;
December 22, 2016 at 1:56 pm
richard.gardner 6009 (12/22/2016)
Hi Jeff,Here's a sample dataset, my bad, 89,000 is a different number, it's just 50,000 pairs of internal keys.
@chris-2 - Thanks for the offer, appreciated, I'll have another look at it, I did run it a couple of times and the first time I got an error >100 levels of recursion, second time it ran until I killed it.
Regards
Richard
Quote. Excellent. Thanks, Richard. Now I have something to very real to work with instead of making up something that might not be quite right.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 23, 2016 at 8:50 am
I have to group these records [sic: rows are not records] together - the above data creates two groups :
Group 1 --> ProdA, ProdB, ProdC
Group 2 --> ProdD, ProdE [\quote]
Instead of mimicking assembly language pointer chains with recursive CTE's, why do not you implement your own example?
CREATE TABLE Product_Groups
(product_group_id CHAR(5) NOT NULL,
product_id CHAR(5) NOT NULL,
PRIMARY KEY (product_group_id, product_id));
These products are at one level of aggregation in your data model, and the groups are at a different level.. What I am not sure about is whether a product has to belong to one and only one group, or can be in several groups (for example, a lot of recipes include the ingredient "water"). The next question is can your groups contain groups?
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
December 23, 2016 at 1:16 pm
@richard.gardner,
I'm still playing with the data but let me make a suggestion.
The data you currently have is the result of someone creating the "opposite direction" node for every node in a nice, clean Adjacency List (Parent/Child table). That original clean list lives SOMEWHERE. I'd take the time to find out where it is because all of this would become the proverbial cake-walk if we could get our hands on that original Adjacency List.
Because of the bidirectional nature of all the nodes, it takes a comparatively very long time to find the "root" node of any given tree in the data. While it can certainly be accomplished, I have to ask if there's any other column in the table that helps identify either the "root" nodes, or the "leaf" nodes. Perhaps there's a "Level" column or a "leaf node" indication or a "base part number" that might identify the "root" nodes?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 23, 2016 at 3:29 pm
@jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.
If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.
December 23, 2016 at 5:31 pm
Jason A. Long (12/23/2016)
@Jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.
Such duplication of edge data is used for a lot of different things. A lot of times they're used for the creation of certain types of web structures such as airline routes or mapping functionality such as MapQesst, etc. It really helps on maps because while you might be able to take the route of A,B,C, you might not be able to get back using C,B,A because of one way streets, etc. Even some wiring diagrams are supported by this type of double entry, although it's really not necessary.
And, yes, you could delete 1/2 the rows by deleting all rows where B<A... which would also successfully destroy the hierarchy because not all parents will have values greater than their children or vice versa.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 23, 2016 at 5:59 pm
Hmmm... looking at the original small list of nodes in the original post, there might just be a way to Ferret out the root nodes. I'll give that a try...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 23, 2016 at 8:18 pm
Jason A. Long (12/23/2016)
@Jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.
I wasn't bagging what you were raking. After looking at the data provided in the csv file, it's not a hierarchy at all. It's a "multi-web" where there are multiple islands of data and all points on each island are connected to all points on the same island. Now I get it. Thanks, Jason.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2016 at 8:34 am
Jeff Moden (12/23/2016)
Jason A. Long (12/23/2016)
@Jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.
I wasn't bagging what you were raking. After looking at the data provided in the csv file, it's not a hierarchy at all. It's a "multi-web" where there are multiple islands of data and all points on each island are connected to all points on the same island. Now I get it. Thanks, Jason.
No worries. There was a fair possibility that I was way off base. I just knew that I'd seen data structured in that format when I was asked to work through some patient matching data a while back.
December 28, 2016 at 2:34 am
Happy Christmas guys.
Just for info there is no hierarchy, each item is at the same level, it's a relationship between items which have different PKs but are the same design, just different colours, thus "Islands Of Groups" as Jeff says. We have a Rbar solution anyway from Peter, but I would be interested in how to tackle it from a set based point of view.
I can't get into the product that controls this list, one of the joys of "software as a service" is the utter blindness of the providers when it comes to ODBC access - "You can simply use CURL requests to get the data one record at a time". Right, let's do that then, I love spending my time building web calls in Powershell, plus all the completeness and integrity stuff that comes with it (that's sarcasm, btw).
They make me feel like a dinosaur when I argue a once a day call over ODBC to get the complete recordset might be a better idea, like I'm a complete idiot for thinking a record at a time web request is a shitty way to retrieve a dataset, but I have a strong and abiding suspicion that I'm right.
As the function of the list is to accept a single key and respond with a list of group members I don't think it's particularly dysfunctional, it's just when I'm trying to recreate the groups post processing....
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply