How to count left side nodes and right side nodes for managing a chain link(binary tree)application

  • Dear Paul,

    Your technique is better. So, I am interested to use your technique. Peter also suggested me to follow your method. I already inserted around 1200 data in database using following way:

    insert into user_registration_tbl

    (id, user_first_name, parentid, Node)

    values

    (1, 'john', 0, 0),

    (2, 'jack', 1, 0),

    (3, 'jam', 1, 1),

    (4, 'sam', 2, 0),

    (5, 'sat', 2, 1),

    (6, 'jay', 3, 0),

    (7, 'rai', 3, 1),

    (8, 'ram', 4, 0),

    (9, 'dan', 4, 1)

    But you are suggesting to insert data in this way:

    INSERT dbo.Tree(node_id, name, hid) VALUES (1, 'Node 1', hierarchyid::GetRoot());

    INSERT dbo.Tree(node_id, name, hid) VALUES (2, 'Node 2', hierarchyid::Parse(N'/1/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (3, 'Node 3', hierarchyid::Parse(N'/2/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (4, 'Node 4', hierarchyid::Parse(N'/1/1/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (5, 'Node 5', hierarchyid::Parse(N'/1/2/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (6, 'Node 6', hierarchyid::Parse(N'/2/1/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (7, 'Node 7', hierarchyid::Parse(N'/2/2/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (8, 'Node 8', hierarchyid::Parse(N'/1/1/1/'));

    INSERT dbo.Tree(node_id, name, hid) VALUES (9, 'Node 9', hierarchyid::Parse(N'/1/1/2/'));

    What should I do now? Do I need to re-enter all the records in your way. I have used parent id and node (0 for left and 1 for right). you are using hierarchyid::Parse(N'/1/1/2/'). I am not very sure how should I enter this part. I am waiting for your suggestion. Thanks.

    - Mahbub

  • Joe Celko (5/12/2010)


    Your approach to Binary trees is wrong. Buy a copy of TREES & HIERARCHIES IN SQL and read it.

    I suppose your are the author of it, otherwise you could have recommend to go to a library. Looks like SPAM to me.

  • Joe Celko (5/12/2010)


    Paul recognized that the problem is the DDL, but he still had his head stuck in proprietary, non-relational code. He did not go far enough.

    No, my intention was to show how it could be done with hierarchyid.

    That is all.

  • mahbub422 (5/12/2010)


    Dear Paul. Your technique is better. So, I am interested to use your technique. Peter also suggested me to follow your method.

    It is an alternative - whether it is 'better' or not depends on one's perspective.

    Peter has very kindly provided you with a great deal of support and some concrete code examples - I'm afraid I do not have the free time right now to do the same 🙁

    I already inserted around 1200 data in database...Do I need to re-enter all the records in your way?

    If you choose to use that method - yes you would need to convert the existing data.

    I can recommend Itzik Ben-Gan's book "Inside Microsoft SQL Server 2008 T-SQL Querying", though as always search engines are a great source of ideas and examples too.

    Paul

  • Joe Celko (5/12/2010)


    To answer your original question:

    SELECT SUM(CASE WHEN binary_place_nbr % 2 = 0

    THEN 1 ELSE 0 END AS rgt_cnt,

    SUM(CASE WHEN binary_place_nbr % 2 = 1

    THEN 1 ELSE 0 END AS lft_cnt,

    FROM User_Registrations;

    Bad DDL leads to insanely complicated DML. Design first, code later.

    The answer is wrong. This piece of code just counts the number of left and right children in the entire tree (where the root is counted as a right child). The original question was to count the number of nodes in the left and right branch of a given node (not necessarily the root) in the tree.

    Read first, design later.

  • there was missing commas and closing parentesis; Here is a syntactically correct verison of the above example, but I did not test ti to see if it worked:

    declare @my_root_nbr int

    set @my_root_nbr=4

    ;WITH Subtree(node_nbr)

    AS

    (SELECT

    T1.node_nbr

    FROM Binary_Tree AS T1

    WHERE T1.node_nbr = @my_root_nbr

    UNION

    SELECT

    T2.node_nbr

    FROM Binary_Tree AS T2, Subtree AS S1

    WHERE T2.node_nbr IN ((2*S1.node_nbr), (2*S1.node_nbr +1))

    )

    SELECT

    @my_root_nbr,

    SUM(CASE

    WHEN node_nbr % 2 = 0

    THEN 1

    ELSE 0

    END) AS rgt_cnt,

    SUM(CASE

    WHEN node_nbr % 2 = 1

    THEN 1

    ELSE 0

    END) AS lft_cnt

    FROM Subtree AS S2

    WHERE S2.node_nbr <> @my_root_nbr

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Half answer accidentilly posted. Removed

  • Joe Celko (5/13/2010)


    To generalize this, we can use a recursive CTE to get the subtree under a given node and use the previous query on it (untested):

    WITH Subtree(node_nbr)

    AS

    (SELECT T1.node_nbr

    FROM Binary_Tree AS T1

    WHERE T1.node_nbr = @my_root_nbr

    UNION

    SELECT T2.node_nbr

    FROM Binary_Tree AS T2, Subtree AS S1

    WHERE T2.node_nbr IN ((2*S1.node_nbr), (2*S1.node_nbr +1))

    SELECT @my_root_nbr

    SUM(CASE WHEN node_nbr % 2 = 0

    THEN 1 ELSE 0 END AS rgt_cnt,

    SUM(CASE WHEN node_nbr % 2 = 1

    THEN 1 ELSE 0 END AS lft_cnt

    FROM Subtree AS S2

    WHERE S2.node_nbr > 1;

    If you want to exclude the subtree root from the counts, then change the WHERE clause in the base query to “WHERE S2.node_nbr <> @my_root_nbr” instead.

    What I am trying to remember is whether there is a closed form for subtrees that avoids recursion. I thik there was, but I cannot find it in my notes.

    You entered this thread with some big words about wrong design/wrong solution. Now you come up yourself with a recursive CTE solution which still produces the wrong result! Compare the following solutions.

    Test setup

    if object_id('dbo.Log2', 'IF') is not null

    drop function dbo.Log2

    go

    create function dbo.Log2(@n int)

    returns table

    as return

    (

    select floor(log(@n) / log(2)) n

    )

    go

    if object_id('dbo.User_registrations', 'U') is not null

    drop table dbo.User_registrations

    go

    create table dbo.User_registrations

    (

    binary_place_nbr int not null,

    user_first_name varchar(100) not null,

    primary key(binary_place_nbr)

    )

    go

    insert into dbo.User_registrations(binary_place_nbr, user_first_name)

    values

    (1,'john'),

    (2,'jack'),

    (3,'jam'),

    (4,'sam'),

    (5,'sat'),

    (6,'jay'),

    (7,'rai'),

    (8,'ram'),

    (9,'dan')

    go

    Joe's solution (after fixing all the errors and some proper aligning)

    declare @my_root_nbr int = 1

    ;WITH Subtree(node_nbr)

    AS

    (

    SELECT T1.binary_place_nbr FROM dbo.User_registrations AS T1

    WHERE T1.binary_place_nbr = @my_root_nbr

    UNION ALL

    SELECT T2.binary_place_nbr

    FROM dbo.User_Registrations AS T2, Subtree AS S1

    WHERE T2.binary_place_nbr IN ((2*S1.node_nbr), (2*S1.node_nbr +1))

    )

    SELECT

    @my_root_nbr,

    SUM(CASE WHEN node_nbr % 2 = 0 THEN 1 ELSE 0 END) AS rgt_cnt,

    SUM(CASE WHEN node_nbr % 2 = 1 THEN 1 ELSE 0 END) AS lft_cnt

    FROM

    Subtree AS S2

    WHERE

    S2.node_nbr > @my_root_nbr;

    As you said yourself, with some arithmetic it easy to address the required nodes using the heapsort method to label nodes. No need for a recursive CTE solution. Disadvantage of the method is that there is a restriction on the depth of the tree regardless of the number of nodes in it. When using INT the maximum depth is 30.

    select

    binary_place_nbr, user_first_name,

    (

    select

    count(*)

    from

    dbo.User_Registrations

    cross apply

    dbo.Log2(binary_place_nbr) l1

    cross apply

    dbo.Log2(2 * u.binary_place_nbr) l2

    where

    binary_place_nbr >= 2 * u.binary_place_nbr

    and 2 * u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n)

    ) LeftChildren,

    (

    select

    count(*)

    from

    dbo.User_Registrations

    cross apply

    dbo.Log2(binary_place_nbr) l1

    cross apply

    dbo.Log2(2 * u.binary_place_nbr + 1) l2

    where

    binary_place_nbr >= 2 * u.binary_place_nbr + 1

    and 2 * u.binary_place_nbr + 1 = binary_place_nbr / power(2, l1.n - l2.n)

    ) RightChildren

    from

    dbo.User_Registrations u

    Peter

  • Peter Brinkhaus (5/13/2010)


    select

    binary_place_nbr, user_first_name,

    (

    select

    count(*)

    from

    dbo.User_Registrations

    cross apply

    dbo.Log2(binary_place_nbr) l1

    cross apply

    dbo.Log2(2 * u.binary_place_nbr) l2

    where

    binary_place_nbr >= 2 * u.binary_place_nbr

    and 2 * u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n)

    ) LeftChildren,

    (

    select

    count(*)

    from

    dbo.User_Registrations

    cross apply

    dbo.Log2(binary_place_nbr) l1

    cross apply

    dbo.Log2(2 * u.binary_place_nbr + 1) l2

    where

    binary_place_nbr >= 2 * u.binary_place_nbr + 1

    and 2 * u.binary_place_nbr + 1 = binary_place_nbr / power(2, l1.n - l2.n)

    ) RightChildren

    from

    dbo.User_Registrations u

    Peter

    Before anybody starts complaining about RBAR, I realized that. I just wanted to show the outcome for all nodes in the tree, not just for one as the OP asked for. Basically, the two subqueries are the solution to problem.

    Below a more optimized version to show all the number of nodes in the left and right branch of any node. However, still a triangular join.

    select

    u.binary_place_nbr, u.user_first_name,

    sum(coalesce(x.InLeftBranch, 0)) LeftChildren,

    sum(coalesce(x.InRightBranch, 0)) RightChildren

    from

    dbo.User_Registrations u

    outer apply

    (

    select

    case when 2 * u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n) then 1 else 0 end InLeftBranch,

    case when 2 * u.binary_place_nbr + 1 = binary_place_nbr / power(2, l1.n - l2.n) then 1 else 0 end InRightBranch

    from

    dbo.User_Registrations

    cross apply

    dbo.Log2(binary_place_nbr) l1

    cross apply

    dbo.Log2(2 * u.binary_place_nbr) l2

    where

    binary_place_nbr >= 2 * u.binary_place_nbr

    and u.binary_place_nbr = binary_place_nbr / power(2, l1.n - l2.n + 1)

    ) x

    group by

    u.binary_place_nbr, u.user_first_name

    Edit: moved coalesce inside sum to get rid of a warning

  • Peter Brinkhaus (5/13/2010)


    Below a more optimized version to show all the number of nodes in the left and right branch of any node. However, still a triangular join.

    Very cool, Peter 😎

    It's not easy to avoid a triangular join here, given the nature of the problem. I'd be interested to hear more from mahbub422 about the real problem that is being solved here. It's a very interesting technical issue, but I can't help wondering if we would not offer different design advice if we knew more details...?

    Paul

  • For completeness, here's one way to return the child node counts and list of names from those nodes using hierarchyid (same set-up code I posted earlier):

    SELECT SQ1.node_id,

    SQ1.name,

    MAX(CASE WHEN SQ1.child_position = 'L' THEN SQ1.child_count ELSE 0 END) AS left_count,

    MAX(CASE WHEN SQ1.child_position = 'R' THEN SQ1.child_count ELSE 0 END) AS right_count,

    MAX(CASE WHEN SQ1.child_position = 'L' THEN SQ1.child_list ELSE SPACE(0) END) AS left_list,

    MAX(CASE WHEN SQ1.child_position = 'R' THEN SQ1.child_list ELSE SPACE(0) END) AS right_list

    FROM (

    SELECT T1.node_id,

    T1.name,

    child_position = CASE ROW_NUMBER() OVER (PARTITION BY T1.hid ORDER BY T2.hid) WHEN 1 THEN 'L' ELSE 'R' END,

    child_count = (SELECT COUNT(*) FROM dbo.Tree T3 WHERE T3.hid.IsDescendantOf(T2.hid) = 1),

    child_list = STUFF((SELECT ',' + T3.name FROM dbo.Tree T3 WHERE T3.hid.IsDescendantOf(T2.hid) = 1 ORDER BY T3.hid FOR XML PATH(''), TYPE).value('./text()[1]', 'VARCHAR(8000)'), 1, 1, SPACE(0))

    FROM dbo.Tree T1

    LEFT

    JOIN dbo.Tree T2

    ON T2.hid.GetAncestor(1) = T1.hid

    ) SQ1

    GROUP BY

    SQ1.node_id,

    SQ1.name;

  • Dear Peter and Paul,

    Thank you very much for your great support. Recently I got a client who's business is MLM (Multilevel Marketing). A person can join to the company and can only refer two persons in his left and right side. In every week that person get benefit on the basis of total person joined in his left and right side. My client demanded several reports. Here I have attached two. First report is the weekly statement and second report is the list of left and right side members of a person. I was in deep trouble of finding out the solution. Finally I got first solution from peter. Then Paul also gave the solution. I have used Peters procedure. Initially it took 49 seconds to process data and now using peter's solution it takes only 3 seconds.

    I really thankful to both of you. And you will be happy to know our client is really happy to see the report.

    Thanks.

    Mahbub

  • Mahbub,

    That's really good news - and full credit to Peter for the time he spent on this too.

    It's great to see the reports, but there is some pretty sensitive personal information there. I would ask you to obscure those details or remove the attachments if you wouldn't mind...?

    Thanks

    Paul

  • Peter Brinkhaus (3/20/2010)


    Unfortunately we don't have a realistic testset (and I'm too lazy now to generate one), but I'm almost certain that Paul's solution outperforms my solution by a factor of 'a large number'. Well done, Paul

    Don't be too sure... First, as good as it is (well done, Paul), Paul's code didn't convert the Adjacency List to the Hierarchical Path that HierarchyID needs. It was hard coded, instead.

    Second, the HierarchicalID method is a lot more difficult to maintain than the almost unusable Adjacency List.

    Your code, on the other hand, has the best of both worlds.

    I'll also add that there have been reports on the internet of some performance problems with HierarchyID. I can't substantiate nor refute any of those reports because I haven't installed 2k8 on my nice new portable, yet, so do some testing.

    The following won't build a pure binary tree (but could be easily modified to do so), but it WILL build a million row random hierarchy in under 22 seconds including the creation of some necessary indexes...

    --===== Do this in a nice safe place that everyone has.

    USE tempdb

    ;

    drop procedure dbo.BuildTestHierarchy

    ;

    GO

    CREATE PROCEDURE dbo.BuildTestHierarchy

    /******************************************************************************

    Create a randomized "clean" hierarchy. Each EmployeeID (except the first one,

    of course) is assigned a random ManagerID number which is always less than the

    current EmployeeID. This code runs nasty fast and is great for testing

    hierarchical processing code.

    Usage: (both examples build a million row Adjacency List Hierarchy)

    EXEC dbo.BuildTestHierarchy 1000000

    EXEC dbo.BuildTestHierarchy @pRowsToBuild = 1000000

    Revision History:

    Rev 00 - 28 Apr 2010 - Jeff Moden - Initial creation and test.

    Rev 01 - 15 May 2010 - Jeff Moden - Abort if current DB isn't "tempdb" to

    protect users that want to "play".

    ******************************************************************************/

    --===== Declare the I/O parameters

    @pRowsToBuild INT

    AS

    --===== Make sure that we're in a safe place to run this...

    IF DB_NAME() <> N'tempdb'

    BEGIN

    RAISERROR('Current DB is NOT tempdb. Run aborted.',11,1);

    RETURN;

    END

    ;

    --===== Conditionaly drop the test table so we can do reruns more easily

    IF OBJECT_ID('TempDB.dbo.Employee','U') IS NOT NULL

    DROP TABLE TempDB.dbo.Employee

    ;

    --===== Build the test table and populate it on the fly.

    -- Everything except ManagerID is populated here.

    SELECT TOP (@pRowsToBuild)

    ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) AS EmployeeID,

    CAST(0 AS INT) AS ManagerID,

    CAST(NEWID() AS VARCHAR(36)) AS EmployeeName,

    (ABS(CHECKSUM(NEWID()))%12+1)*1000 AS Sales

    INTO TempDB.dbo.Employee

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    ;

    --===== Update the test table with ManagerID's. The ManagerID is some random

    -- value which is always less than the current EmployeeID to keep the

    -- hierarchy "clean" and free from "loop backs".

    UPDATE TempDB.dbo.Employee

    SET ManagerID = CASE

    WHEN EmployeeID > 1

    THEN ABS(CHECKSUM(NEWID())) % (EmployeeID-1) +1

    ELSE NULL

    END

    ;

    --===== Add some indexes that most folks would like have on such a table

    ALTER TABLE TempDB.dbo.Employee

    ADD CONSTRAINT PK_Employee

    PRIMARY KEY CLUSTERED (EmployeeID)

    ;

    CREATE INDEX IX_Employee_ManagerID

    ON TempDB.dbo.Employee (ManagerID)

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Paul for your suggestion. I have removed.

Viewing 15 posts - 16 through 30 (of 61 total)

You must be logged in to reply to this topic. Login to reply