How to count left side nodes and right side nodes for managing a chain link(binary tree)application

Question

How to count left side nodes and right side nodes for managing a chain link(binary tree)application

Viewing 15 posts - 16 through 30 (of 61 total)

You must be logged in to reply to this topic. Login to reply

mahbub422 SSC-Addicted Points: 461 More actions · Answer 1

Dear Paul,

Your technique is better. So, I am interested to use your technique. Peter also suggested me to follow your method. I already inserted around 1200 data in database using following way:

insert into user_registration_tbl

(id, user_first_name, parentid, Node)

values

(1, 'john', 0, 0),

(2, 'jack', 1, 0),

(3, 'jam', 1, 1),

(4, 'sam', 2, 0),

(5, 'sat', 2, 1),

(6, 'jay', 3, 0),

(7, 'rai', 3, 1),

(8, 'ram', 4, 0),

(9, 'dan', 4, 1)

But you are suggesting to insert data in this way:

INSERT dbo.Tree(node_id, name, hid) VALUES (1, 'Node 1', hierarchyid::GetRoot());

INSERT dbo.Tree(node_id, name, hid) VALUES (2, 'Node 2', hierarchyid::Parse(N'/1/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (3, 'Node 3', hierarchyid::Parse(N'/2/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (4, 'Node 4', hierarchyid::Parse(N'/1/1/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (5, 'Node 5', hierarchyid::Parse(N'/1/2/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (6, 'Node 6', hierarchyid::Parse(N'/2/1/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (7, 'Node 7', hierarchyid::Parse(N'/2/2/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (8, 'Node 8', hierarchyid::Parse(N'/1/1/1/'));

INSERT dbo.Tree(node_id, name, hid) VALUES (9, 'Node 9', hierarchyid::Parse(N'/1/1/2/'));

What should I do now? Do I need to re-enter all the records in your way. I have used parent id and node (0 for left and 1 for right). you are using hierarchyid::Parse(N'/1/1/2/'). I am not very sure how should I enter this part. I am waiting for your suggestion. Thanks.

- Mahbub

matzer SSCrazy Eights Points: 9727 More actions · Answer 2

Joe Celko (5/12/2010)
Your approach to Binary trees is wrong. Buy a copy of TREES & HIERARCHIES IN SQL and read it.

I suppose your are the author of it, otherwise you could have recommend to go to a library. Looks like SPAM to me.

Paul White SSC Guru Points: 150467 More actions · Answer 3

Joe Celko (5/12/2010)
Paul recognized that the problem is the DDL, but he still had his head stuck in proprietary, non-relational code. He did not go far enough.

No, my intention was to show how it could be done with hierarchyid.

That is all.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Paul White SSC Guru Points: 150467 More actions · Answer 4

mahbub422 (5/12/2010)
Dear Paul. Your technique is better. So, I am interested to use your technique. Peter also suggested me to follow your method.

It is an alternative - whether it is 'better' or not depends on one's perspective.

Peter has very kindly provided you with a great deal of support and some concrete code examples - I'm afraid I do not have the free time right now to do the same 🙁

I already inserted around 1200 data in database...Do I need to re-enter all the records in your way?

If you choose to use that method - yes you would need to convert the existing data.

I can recommend Itzik Ben-Gan's book "Inside Microsoft SQL Server 2008 T-SQL Querying", though as always search engines are a great source of ideas and examples too.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

matzer SSCrazy Eights Points: 9727 More actions · Answer 5

Joe Celko (5/12/2010)
To answer your original question:
SELECT SUM(CASE WHEN binary_place_nbr % 2 = 0
THEN 1 ELSE 0 END AS rgt_cnt,
SUM(CASE WHEN binary_place_nbr % 2 = 1
THEN 1 ELSE 0 END AS lft_cnt,
FROM User_Registrations;
Bad DDL leads to insanely complicated DML. Design first, code later.

The answer is wrong. This piece of code just counts the number of left and right children in the entire tree (where the root is counted as a right child). The original question was to count the number of nodes in the left and right branch of a given node (not necessarily the root) in the tree.

Read first, design later.

Lowell SSC Guru Points: 323518 More actions · Answer 6

there was missing commas and closing parentesis; Here is a syntactically correct verison of the above example, but I did not test ti to see if it worked:

declare @my_root_nbr int

set @my_root_nbr=4

;WITH Subtree(node_nbr)

AS

(SELECT

T1.node_nbr

FROM Binary_Tree AS T1

WHERE T1.node_nbr = @my_root_nbr

UNION

SELECT

T2.node_nbr

FROM Binary_Tree AS T2, Subtree AS S1

WHERE T2.node_nbr IN ((2*S1.node_nbr), (2*S1.node_nbr +1))

)

SELECT

@my_root_nbr,

SUM(CASE

WHEN node_nbr % 2 = 0

THEN 1

ELSE 0

END) AS rgt_cnt,

SUM(CASE

WHEN node_nbr % 2 = 1

THEN 1

ELSE 0

END) AS lft_cnt

FROM Subtree AS S2

WHERE S2.node_nbr <> @my_root_nbr

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

matzer SSCrazy Eights Points: 9727 More actions · Answer 7

matzer

SSCrazy Eights

Points: 9727

May 13, 2010 at 8:09 am

#1166892

Half answer accidentilly posted. Removed

matzer SSCrazy Eights Points: 9727 More actions · Answer 8

Joe Celko (5/13/2010)
To generalize this, we can use a recursive CTE to get the subtree under a given node and use the previous query on it (untested):
WITH Subtree(node_nbr)
AS
(SELECT T1.node_nbr
FROM Binary_Tree AS T1
WHERE T1.node_nbr = @my_root_nbr
UNION
SELECT T2.node_nbr
FROM Binary_Tree AS T2, Subtree AS S1
WHERE T2.node_nbr IN ((2*S1.node_nbr), (2*S1.node_nbr +1))
SELECT @my_root_nbr
SUM(CASE WHEN node_nbr % 2 = 0
THEN 1 ELSE 0 END AS rgt_cnt,
SUM(CASE WHEN node_nbr % 2 = 1
THEN 1 ELSE 0 END AS lft_cnt
FROM Subtree AS S2
WHERE S2.node_nbr > 1;
If you want to exclude the subtree root from the counts, then change the WHERE clause in the base query to “WHERE S2.node_nbr <> @my_root_nbr” instead.
What I am trying to remember is whether there is a closed form for subtrees that avoids recursion. I thik there was, but I cannot find it in my notes.

You entered this thread with some big words about wrong design/wrong solution. Now you come up yourself with a recursive CTE solution which still produces the wrong result! Compare the following solutions.

Test setup

if object_id('dbo.Log2', 'IF') is not null

drop function dbo.Log2

go

create function dbo.Log2(@n int)

returns table

as return

(

select floor(log(@n) / log(2)) n

)

go

if object_id('dbo.User_registrations', 'U') is not null

drop table dbo.User_registrations

go

create table dbo.User_registrations

(

binary_place_nbr int not null,

user_first_name varchar(100) not null,

primary key(binary_place_nbr)

)

go

insert into dbo.User_registrations(binary_place_nbr, user_first_name)

values

(1,'john'),

(2,'jack'),

(3,'jam'),

(4,'sam'),

(5,'sat'),

(6,'jay'),

(7,'rai'),

(8,'ram'),

(9,'dan')

go

Joe's solution (after fixing all the errors and some proper aligning)

declare @my_root_nbr int = 1

;WITH Subtree(node_nbr)

AS

(

SELECT T1.binary_place_nbr FROM dbo.User_registrations AS T1

WHERE T1.binary_place_nbr = @my_root_nbr

UNION ALL

SELECT T2.binary_place_nbr

FROM dbo.User_Registrations AS T2, Subtree AS S1

WHERE T2.binary_place_nbr IN ((2*S1.node_nbr), (2*S1.node_nbr +1))

)

SELECT

@my_root_nbr,

SUM(CASE WHEN node_nbr % 2 = 0 THEN 1 ELSE 0 END) AS rgt_cnt,

SUM(CASE WHEN node_nbr % 2 = 1 THEN 1 ELSE 0 END) AS lft_cnt

FROM

Subtree AS S2

WHERE

S2.node_nbr > @my_root_nbr;

As you said yourself, with some arithmetic it easy to address the required nodes using the heapsort method to label nodes. No need for a recursive CTE solution. Disadvantage of the method is that there is a restriction on the depth of the tree regardless of the number of nodes in it. When using INT the maximum depth is 30.

select

binary_place_nbr, user_first_name,

(

select

count(*)

from

dbo.User_Registrations

cross apply

dbo.Log2(binary_place_nbr) l1

cross apply

dbo.Log2(2 * u.binary_place_nbr) l2

where

binary_place_nbr >= 2 * u.binary_place_nbr

and 2 * u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n)

) LeftChildren,

(

select

count(*)

from

dbo.User_Registrations

cross apply

dbo.Log2(binary_place_nbr) l1

cross apply

dbo.Log2(2 * u.binary_place_nbr + 1) l2

where

binary_place_nbr >= 2 * u.binary_place_nbr + 1

and 2 * u.binary_place_nbr + 1 = binary_place_nbr / power(2, l1.n - l2.n)

) RightChildren

from

dbo.User_Registrations u

Peter

matzer SSCrazy Eights Points: 9727 More actions · Answer 9

Peter Brinkhaus (5/13/2010)
select binary_place_nbr, user_first_name, ( select count(*) from dbo.User_Registrations cross apply dbo.Log2(binary_place_nbr) l1 cross apply dbo.Log2(2 * u.binary_place_nbr) l2 where binary_place_nbr >= 2 * u.binary_place_nbr and 2 * u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n) ) LeftChildren, ( select count(*) from dbo.User_Registrations cross apply dbo.Log2(binary_place_nbr) l1 cross apply dbo.Log2(2 * u.binary_place_nbr + 1) l2 where binary_place_nbr >= 2 * u.binary_place_nbr + 1 and 2 * u.binary_place_nbr + 1 = binary_place_nbr / power(2, l1.n - l2.n) ) RightChildren from dbo.User_Registrations u
Peter

Before anybody starts complaining about RBAR, I realized that. I just wanted to show the outcome for all nodes in the tree, not just for one as the OP asked for. Basically, the two subqueries are the solution to problem.

Below a more optimized version to show all the number of nodes in the left and right branch of any node. However, still a triangular join.

select

u.binary_place_nbr, u.user_first_name,

sum(coalesce(x.InLeftBranch, 0)) LeftChildren,

sum(coalesce(x.InRightBranch, 0)) RightChildren

from

dbo.User_Registrations u

outer apply

(

select

case when 2 * u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n) then 1 else 0 end InLeftBranch,

case when 2 * u.binary_place_nbr + 1 = binary_place_nbr / power(2, l1.n - l2.n) then 1 else 0 end InRightBranch

from

dbo.User_Registrations

cross apply

dbo.Log2(binary_place_nbr) l1

cross apply

dbo.Log2(2 * u.binary_place_nbr) l2

where

binary_place_nbr >= 2 * u.binary_place_nbr

and u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n + 1)

) x

group by

u.binary_place_nbr, u.user_first_name

Edit: moved coalesce inside sum to get rid of a warning

Paul White SSC Guru Points: 150467 More actions · Answer 10

Peter Brinkhaus (5/13/2010)
Below a more optimized version to show all the number of nodes in the left and right branch of any node. However, still a triangular join.

Very cool, Peter 😎

It's not easy to avoid a triangular join here, given the nature of the problem. I'd be interested to hear more from mahbub422 about the real problem that is being solved here. It's a very interesting technical issue, but I can't help wondering if we would not offer different design advice if we knew more details...?

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Paul White SSC Guru Points: 150467 More actions · Answer 11

For completeness, here's one way to return the child node counts and list of names from those nodes using hierarchyid (same set-up code I posted earlier):

SELECT SQ1.node_id,

SQ1.name,

MAX(CASE WHEN SQ1.child_position = 'L' THEN SQ1.child_count ELSE 0 END) AS left_count,

MAX(CASE WHEN SQ1.child_position = 'R' THEN SQ1.child_count ELSE 0 END) AS right_count,

MAX(CASE WHEN SQ1.child_position = 'L' THEN SQ1.child_list ELSE SPACE(0) END) AS left_list,

MAX(CASE WHEN SQ1.child_position = 'R' THEN SQ1.child_list ELSE SPACE(0) END) AS right_list

FROM (

SELECT T1.node_id,

T1.name,

child_position = CASE ROW_NUMBER() OVER (PARTITION BY T1.hid ORDER BY T2.hid) WHEN 1 THEN 'L' ELSE 'R' END,

child_count = (SELECT COUNT(*) FROM dbo.Tree T3 WHERE T3.hid.IsDescendantOf(T2.hid) = 1),

child_list = STUFF((SELECT ',' + T3.name FROM dbo.Tree T3 WHERE T3.hid.IsDescendantOf(T2.hid) = 1 ORDER BY T3.hid FOR XML PATH(''), TYPE).value('./text()[1]', 'VARCHAR(8000)'), 1, 1, SPACE(0))

FROM dbo.Tree T1

LEFT

JOIN dbo.Tree T2

ON T2.hid.GetAncestor(1) = T1.hid

) SQ1

GROUP BY

SQ1.node_id,

SQ1.name;

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

mahbub422 SSC-Addicted Points: 461 More actions · Answer 12

Dear Peter and Paul,

Thank you very much for your great support. Recently I got a client who's business is MLM (Multilevel Marketing). A person can join to the company and can only refer two persons in his left and right side. In every week that person get benefit on the basis of total person joined in his left and right side. My client demanded several reports. Here I have attached two. First report is the weekly statement and second report is the list of left and right side members of a person. I was in deep trouble of finding out the solution. Finally I got first solution from peter. Then Paul also gave the solution. I have used Peters procedure. Initially it took 49 seconds to process data and now using peter's solution it takes only 3 seconds.

I really thankful to both of you. And you will be happy to know our client is really happy to see the report.

Thanks.

Mahbub

Paul White SSC Guru Points: 150467 More actions · Answer 13

Mahbub,

That's really good news - and full credit to Peter for the time he spent on this too.

It's great to see the reports, but there is some pretty sensitive personal information there. I would ask you to obscure those details or remove the attachments if you wouldn't mind...?

Thanks

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 14

Peter Brinkhaus (3/20/2010)
Unfortunately we don't have a realistic testset (and I'm too lazy now to generate one), but I'm almost certain that Paul's solution outperforms my solution by a factor of 'a large number'. Well done, Paul

Don't be too sure... First, as good as it is (well done, Paul), Paul's code didn't convert the Adjacency List to the Hierarchical Path that HierarchyID needs. It was hard coded, instead.

Second, the HierarchicalID method is a lot more difficult to maintain than the almost unusable Adjacency List.

Your code, on the other hand, has the best of both worlds.

I'll also add that there have been reports on the internet of some performance problems with HierarchyID. I can't substantiate nor refute any of those reports because I haven't installed 2k8 on my nice new portable, yet, so do some testing.

The following won't build a pure binary tree (but could be easily modified to do so), but it WILL build a million row random hierarchy in under 22 seconds including the creation of some necessary indexes...

--===== Do this in a nice safe place that everyone has.

USE tempdb

;

drop procedure dbo.BuildTestHierarchy

;

GO

CREATE PROCEDURE dbo.BuildTestHierarchy

/******************************************************************************

Create a randomized "clean" hierarchy. Each EmployeeID (except the first one,

of course) is assigned a random ManagerID number which is always less than the

current EmployeeID. This code runs nasty fast and is great for testing

hierarchical processing code.

Usage: (both examples build a million row Adjacency List Hierarchy)

EXEC dbo.BuildTestHierarchy 1000000

EXEC dbo.BuildTestHierarchy @pRowsToBuild = 1000000

Revision History:

Rev 00 - 28 Apr 2010 - Jeff Moden - Initial creation and test.

Rev 01 - 15 May 2010 - Jeff Moden - Abort if current DB isn't "tempdb" to

protect users that want to "play".

******************************************************************************/

--===== Declare the I/O parameters

@pRowsToBuild INT

AS

--===== Make sure that we're in a safe place to run this...

IF DB_NAME() <> N'tempdb'

BEGIN

RAISERROR('Current DB is NOT tempdb. Run aborted.',11,1);

RETURN;

END

;

--===== Conditionaly drop the test table so we can do reruns more easily

IF OBJECT_ID('TempDB.dbo.Employee','U') IS NOT NULL

DROP TABLE TempDB.dbo.Employee

;

--===== Build the test table and populate it on the fly.

-- Everything except ManagerID is populated here.

SELECT TOP (@pRowsToBuild)

ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) AS EmployeeID,

CAST(0 AS INT) AS ManagerID,

CAST(NEWID() AS VARCHAR(36)) AS EmployeeName,

(ABS(CHECKSUM(NEWID()))%12+1)*1000 AS Sales

INTO TempDB.dbo.Employee

FROM Master.sys.All_Columns ac1

CROSS JOIN Master.sys.All_Columns ac2

;

--===== Update the test table with ManagerID's. The ManagerID is some random

-- value which is always less than the current EmployeeID to keep the

-- hierarchy "clean" and free from "loop backs".

UPDATE TempDB.dbo.Employee

SET ManagerID = CASE

WHEN EmployeeID > 1

THEN ABS(CHECKSUM(NEWID())) % (EmployeeID-1) +1

ELSE NULL

END

;

--===== Add some indexes that most folks would like have on such a table

ALTER TABLE TempDB.dbo.Employee

ADD CONSTRAINT PK_Employee

PRIMARY KEY CLUSTERED (EmployeeID)

;

CREATE INDEX IX_Employee_ManagerID

ON TempDB.dbo.Employee (ManagerID)

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mahbub422 SSC-Addicted Points: 461 More actions · Answer 15

mahbub422

SSC-Addicted

Points: 461

May 15, 2010 at 12:13 pm

#1167949

Thanks Paul for your suggestion. I have removed.