Traversing down a tree

Question

Traversing down a tree

mark.worthen

SSC-Addicted

Points: 477
More actions
February 9, 2015 at 2:53 pm

#304010

I'm working with a development bug database where if someone puts in a bug number, they want to get the parent bug. No problem.
SET @cur_parent = (SELECTdupeof_key
FROMqds.dbo.Dimduplicates d
WHEREd.dupe_key = @checkbug
ANDd.end_ts IS NULL)
WHILE @cur_parent IS NOT NULL
BEGIN
SET @parent_bug_id = @cur_parent
SET @cur_parent = (SELECTdupeof_key
FROMqds.dbo.Dimduplicates d
WHEREd.dupe_key = @cur_parent
ANDd.end_ts IS NULL)
END
However, now they want to be able to do the same thing in reverse. Put in a bug and determine how many bugs are duplicates (either directly - children, or indirectly - grandchildren/great-grandchildren, etc.).
The majority of bugs go no more than three generations deep but there are some that go much deeper so I need to have something where I can loop through each potential generation and get the number of bugs associated with it.
Example.
Parent Bug Child Grandchild GGChild
1 2 3 4
1 5
1 28 32
1 40 41 42
1 50
1 60
1 65 70
If I put in bug 1 I should be able to get a count of 13 associated duplicate bugs. Additionally, they want to be able to get a list of the bugs that are associated with that parent.
If there were a set number of levels, self-joining would work to create a generational matrix. However, as there is no limit on how deep the generations can go, I need something dynamic to traverse the tree and generate the list of bugs/count of bugs.
It was easy going up the tree because there was a strict one-to-one relationship between child and parent. However, going back down in a dynamic fashion I'm just not visualizing a solution.
Thanks for any help you can offer.

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Alan Burstein SSC Guru Points: 61152 More actions · Answer 1

I periodically have to do this kind of thing. Take a look at:

Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets[/url] by Jeff Moden. This has helped me a great deal.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

mark.worthen SSC-Addicted Points: 477 More actions · Answer 2

Thanks Alan.

Just opened and it looks quite in-depth. Will make for some good reading. 🙂

Alan Burstein SSC Guru Points: 61152 More actions · Answer 3

Quite frankly, I had to read it a few times to get it. It is extremely well-written but the subject matter is a bit heavy in my opinion. Well worth the read though.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

Assuming your "tree" is an adjacency list, traversing down is nearly the same as traversing up.

So, is this something like what you want?

WITH SampleData (parent, child) AS

(

SELECT 1,2

UNION ALL SELECT 2, 3

UNION ALL SELECT 3, 4

UNION ALL SELECT 1, 5

UNION ALL SELECT 1, 28

UNION ALL SELECT 28, 32

UNION ALL SELECT 1, 40

UNION ALL SELECT 40, 41

UNION ALL SELECT 41, 42

UNION ALL SELECT 1, 50

UNION ALL SELECT 1, 60

UNION ALL SELECT 1, 65

UNION ALL SELECT 65, 70

),

TraverseDown AS

(

SELECT n=1, rn=ROW_NUMBER() OVER (ORDER BY child)

,parent, child, s=CAST(parent AS VARCHAR(8000)) + '>' + CAST(child AS VARCHAR(8000))

FROM SampleData a

WHERE parent = 1

UNION ALL

SELECT n=n+1, rn, a.parent, a.child, s=s + '>' + CAST(a.child AS VARCHAR(8000))

FROM SampleData a

JOIN TraverseDown b ON b.child = a.parent

)

SELECT [Parent Bug Child Grandchild GGChild etc]=MAX(s)

FROM TraverseDown

GROUP BY rn;

Note that if you want the children, grandchildren, etc. in separate columns you'd best start with the above (lose the GROUP BY), put that result into a temporary table and then use the n column to determine max level to construct a dynamic cross tab query.

There are many ways to traverse a hierarchy:

The Performance of Traversing a SQL Hierarchy [/url]

On dynamic crosstab queries:

Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs [/url]

And in case you need the background on the basic crosstab query:

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns [/url]

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

mark.worthen SSC-Addicted Points: 477 More actions · Answer 5

Thanks Dwain,

The data is structure such that children point to their parent but parents do not point to children.

Bug 1 has no idea it has any children but bug 2 has a pointer that it is a child of bug 1. A given bug may have multiple children and some of those may have their own children/grandchildren etc.

Going from the bottom up is easy as it is a one-to-one relationship, going down, is another as there may be a one-to-many relationship on the children (or there may be no children), and there may be multiple generations of one-to-many relationships. Still working through and learning as I read the various materials that have been provided.

Thanks.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

mark.worthen (2/10/2015)
Thanks Dwain,
The data is structure such that children point to their parent but parents do not point to children.
Bug 1 has no idea it has any children but bug 2 has a pointer that it is a child of bug 1. A given bug may have multiple children and some of those may have their own children/grandchildren etc.
Going from the bottom up is easy as it is a one-to-one relationship, going down, is another as there may be a one-to-many relationship on the children (or there may be no children), and there may be multiple generations of one-to-many relationships. Still working through and learning as I read the various materials that have been provided.
Thanks.

Good on you for saying you'll be reading up.

By all means, post back once you have a solution that works for you.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

mark.worthen (2/9/2015)
I'm working with a development bug database where if someone puts in a bug number, they want to get the parent bug. No problem.
SET @cur_parent = (SELECTdupeof_key
FROMqds.dbo.Dimduplicates d
WHEREd.dupe_key = @checkbug
ANDd.end_ts IS NULL)
WHILE @cur_parent IS NOT NULL
BEGIN
SET @parent_bug_id = @cur_parent
SET @cur_parent = (SELECTdupeof_key
FROMqds.dbo.Dimduplicates d
WHEREd.dupe_key = @cur_parent
ANDd.end_ts IS NULL)
END
However, now they want to be able to do the same thing in reverse. Put in a bug and determine how many bugs are duplicates (either directly - children, or indirectly - grandchildren/great-grandchildren, etc.).
The majority of bugs go no more than three generations deep but there are some that go much deeper so I need to have something where I can loop through each potential generation and get the number of bugs associated with it.
Example.
Parent Bug Child Grandchild GGChild
1 2 3 4
1 5
1 28 32
1 40 41 42
1 50
1 60
1 65 70
If I put in bug 1 I should be able to get a count of 13 associated duplicate bugs. Additionally, they want to be able to get a list of the bugs that are associated with that parent.
If there were a set number of levels, self-joining would work to create a generational matrix. However, as there is no limit on how deep the generations can go, I need something dynamic to traverse the tree and generate the list of bugs/count of bugs.
It was easy going up the tree because there was a strict one-to-one relationship between child and parent. However, going back down in a dynamic fashion I'm just not visualizing a solution.
Thanks for any help you can offer.

Is it an absolutely true parent/child relationship where no child can be it's own parent and that each child has one and only one parent?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 8

Jeff Moden (2/10/2015)
Is it an absolutely true parent/child relationship where no child can be it's own parent and that each child has one and only one parent?

Why do I sense an epiphany coming?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

dwain.c (2/10/2015)
Jeff Moden (2/10/2015)
Is it an absolutely true parent/child relationship where no child can be it's own parent and that each child has one and only one parent?
Why do I sense an epiphany coming?

Probably not an epiphany but it seems that the op also wants to know the number of bugs in the "downline" of each of the bugs. Perhaps people shy away from it because of the length of the explanations I included in the first "Hierarchies On Steroids" article but the solution for this is pretty simple... if and only if it's a true adjacency list.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mark.worthen SSC-Addicted Points: 477 More actions · Answer 10

Jeff Moden (2/10/2015)
Is it an absolutely true parent/child relationship where no child can be it's own parent and that each child has one and only one parent?

Yes. No child can be its own parent and each child can only have one parent.

Thanks.

Still working to understand your great articles. A bit over my head - been mostly a BI guy (front-end) and not as involved in the back-end.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 11

mark.worthen (2/11/2015)
Jeff Moden (2/10/2015)
Is it an absolutely true parent/child relationship where no child can be it's own parent and that each child has one and only one parent?
Yes. No child can be its own parent and each child can only have one parent.
Thanks.
Still working to understand your great articles. A bit over my head - been mostly a BI guy (front-end) and not as involved in the back-end.

Understood and no problem. And I love such honesty.

So that I can make some code a bit more customized for your need, can you post the CREATE TABLE statement for the table along with any indexes and constraints you may have? It would also be helpful to know a bit more about the data. For example, behind the scenes, is there a master top level so that all entries make up a single tree or does each "top level" ticket form it's own tree making a "forest" of individual trees? In either case, how is the "top level" identified? Does the "parent" of the "top level" contain a NULL or ???

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mark.worthen SSC-Addicted Points: 477 More actions · Answer 12

Here is the create table:

CREATE TABLE [dbo].[Dimduplicates](

[duplicates_key] [int] IDENTITY(1,1) NOT NULL,

[dupeof_key] [int] NOT NULL,

[dupe_key] [int] NOT NULL,

[start_ts] [datetime] NOT NULL,

[end_ts] [datetime] NULL,

CONSTRAINT [PK_Dimduplicates_key] PRIMARY KEY CLUSTERED

(

[duplicates_key] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

The dupeof_key is the parent =BugID and the dupe_key is the child BugID. This table only contains bugs that are considered duplicates of others. As such a parent bug (one that has no parent), does not have its own record but is one that exists as a dupeof_key but never as a dupe_key. A child will have a record where it is the dupe but may also have records where it appears as the dupeof (i.e., it has children).

With about 15 years of data, we have about 130k bugs and 13k that are duplicates. What I have been asked to do is create two functions: one that tells how many descendants exist for a given bug and one that lists the bugs that are descendants of a given bug. Thus, for all functional purposes, that parameter which would be passed is the ultimate parent.

So, to determine the top level, I simply query where the dupeof_key = @BugInQuestion and traverse down from there.

Does that make sense?

Thanks!

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 13

Yes. It makes sense.

So, just to clarify... if we did a search for all rows in this table where the "parent" ID was NOT contained in the "child" column anywhere in the table, that would correctly identify the "top level" of each "dupe" tree?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mark.worthen SSC-Addicted Points: 477 More actions · Answer 14

Jeff Moden (2/11/2015)
Yes. It makes sense.
So, just to clarify... if we did a search for all rows in this table where the "parent" ID was NOT contained in the "child" column anywhere in the table, that would correctly identify the "top level" of each "dupe" tree?

Correct. I created exactly that function yesterday. Select all the dupeof's EXCEPT those found as dupe's.

Thanks.