Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets

  • Just seeing this post pop up again reminded me I needed to thank you Jeff (again) for this.  I've used this several times now at 3 different employers.  Made me look like a rockstar each and every time.  The most recent use case was to determine the lowest common manager between 2 people.  I believe they wanted to use this for employee transfer approval.

    This is how I did it:

    DECLARE @left1 INT, @left2 INT, @right1 INT, @right2 INT

    SELECT @left1 = LeftBower, @right1 = RightBower FROM Forms.Hierarchy WHERE Position = @position1
    SELECT @left2 = LeftBower, @right2 = RightBower FROM Forms.Hierarchy WHERE Position = @position2

    SELECT TOP 1
    h.Position,
    p.DESCR

    FROM
    Forms.Hierarchy h
    JOIN Forms.Position p ON p.Position = h.Position

    WHERE
    h.LeftBower < @left1
    AND h.LeftBower < @left2
    AND h.RightBower > @right1
    AND h.RightBower > @right2

    ORDER BY
    h.NodeNumber DESC


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • s.duvernay 71743 - Thursday, November 29, 2018 11:36 AM

    Hi Jeff,
    Thank you very much for this brilliant article, which brings me some light on how to deal with nodes and edges. However, in my situation 2 nodes can be merged into a single node. That merged node can then be split, and each new branch will have its own "life". Of course, one of the child branches can be merged again with another one.
    In my case, a typical scenario would be goods that are loaded on board a vessel; the goods arrive at their destination and are stored in different warehouses. Finally, a customer comes and takes different types of goods from different warehouses and puts them all in a truck.
    How will you manage that with Nested Sets?
    Thank you very much.
    Sylvain

    I guess my question would be, why do you think you need a hierarchical structure for this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Y.B. - Thursday, November 29, 2018 11:44 AM

    Just seeing this post pop up again reminded me I needed to thank you Jeff (again) for this.  I've used this several times now at 3 different employers.  Made me look like a rockstar each and every time.  The most recent use case was to determine the lowest common manager between 2 people.  I believe they wanted to use this for employee transfer approval.

    This is how I did it:

    DECLARE @left1 INT, @left2 INT, @right1 INT, @right2 INT

    SELECT @left1 = LeftBower, @right1 = RightBower FROM Forms.Hierarchy WHERE Position = @position1
    SELECT @left2 = LeftBower, @right2 = RightBower FROM Forms.Hierarchy WHERE Position = @position2

    SELECT TOP 1
    h.Position,
    p.DESCR

    FROM
    Forms.Hierarchy h
    JOIN Forms.Position p ON p.Position = h.Position

    WHERE
    h.LeftBower < @left1
    AND h.LeftBower < @left2
    AND h.RightBower > @right1
    AND h.RightBower > @right2

    ORDER BY
    h.NodeNumber DESC

    Awesome!  Thanks for the incredible feedback. 

    Just to be sure, though, and like I said in the article, I didn't invent "Nested Sets".  A fellow by the name of Michael J. Kamfonas invented them (link to a copy of his original work below) and they were later made popular by Joe Celko.
    http://kamfonas.com/wordpress/wp-content/uploads/2007/08/recursive-hierarchies-no-columns.htm

    I also didn't come up with the formulas I used to calculate the Left and Right Bowers.  Adam Machanic did that.  The only stumbling block that Adam ran into was that he needed to know the full "downline" count for a node to calculate the Right Bower for the node.  The obvious solution was to use a bit of RBAR for that and, although still faster than the steadfast "Push Stack" method, I thought I could make it faster still by calculating the Hierarchical Path first and then using a special "by 4" Tally Table to split the Hierarchical Path and simply sum the count of the IDs at each level for a given ID.  Fortunately, that worked even better than I hoped.
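
    In case it helps to picture it, here's a minimal sketch of that "by 4" split (this is NOT the article's code, and the hard-coded path value is purely hypothetical): if each ancestor's ID occupies 4 bytes of the Hierarchical Path, a tally that steps by 4 carves the path back into its ancestor IDs entirely set-based, and the downline counts then fall out of a simple SUM per ancestor.

    DECLARE @SortPath VARBINARY(400) = 0x0000001A000000030000000C; --Hypothetical path 26 -> 3 -> 12

    SELECT AncestorID = CAST(SUBSTRING(@SortPath, t.N, 4) AS INT) --One row per ancestor ID in the path
      FROM (SELECT TOP (DATALENGTH(@SortPath)/4)
                   N = (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1) * 4 + 1 --1, 5, 9, ... "by 4"
              FROM sys.all_columns) t
    ;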

    The really cool part is that, since I have all three types of Hierarchical Structures in the same table, you get the advantages of all three... ease of maintenance and DRI in the Adjacency List, a full "Upline" reference in the Hierarchical Path to quickly solve the "Ancestor Problem" (which I didn't go into much but uses the "by 4" Tally table) and, of course, the nasty fast performance of the Nested Sets.

    Hats off to you for having the insight to expand on a basic tool to easily do things like finding the "Lowest Common Manager Between Two People".  I didn't make you look like a rockstar.  YOU made yourself look like a rockstar because you're the one that made the realization and followed through on it. 😀  Well done!

    To that end and with full credit to you (you should PM me with your full name), would it be ok if I shared your technique the next time I give a presentation on all of this?  I've not seen anyone else even try such a problem and here you are posting a solution for it.  Like I said, well done!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, thanks for the article, very good and functional. As I understand it, every time an employee changes ParentId, or a new one is added to the adjacency list, the Hierarchy table should be rebuilt. Do you have an example that updates the hierarchy table when either of these events happens,

    without the need to delete it and rebuild it?

  • An alternative to nested sets is nested json arrays.  Here it is using Jeff's (small) Employee table (renamed to dbo.Test_Employee)

    drop table if exists dbo.Test_Employee;
    go
    CREATE TABLE dbo.Test_Employee
    (
    EmployeeID INT NOT NULL,
    ManagerID INT NULL,
    EmployeeName VARCHAR(10) NOT NULL,
    CONSTRAINT PK_Employee
    PRIMARY KEY CLUSTERED (EmployeeID),
    CONSTRAINT FK_Employee_Employee
    FOREIGN KEY (ManagerID)
    REFERENCES dbo.Test_Employee (EmployeeID)
    ON UPDATE NO ACTION
    ON DELETE NO ACTION
    )
    ;
    --===========================================================================
    RAISERROR('Populate the hierarchy table...',0,1) WITH NOWAIT;
    --===========================================================================
    --===== Populate the test table with test data. Each child ID has a parent ID
    -- adjacent to it on the same row which is why it is called an "Adjacency
    -- List".
    INSERT INTO dbo.Test_Employee
    (EmployeeID, ManagerID, EmployeeName)
    SELECT 26,NULL,'Jim' UNION ALL
    SELECT 2, 26,'Lynne' UNION ALL
    SELECT 3, 26,'Bob' UNION ALL
    SELECT 6, 17,'Eric' UNION ALL
    SELECT 8, 3,'Bill' UNION ALL
    SELECT 7, 3,'Vivian' UNION ALL
    SELECT 12, 8,'Megan' UNION ALL
    SELECT 14, 8,'Kim' UNION ALL
    SELECT 17, 2,'Butch' UNION ALL
    SELECT 18, 39,'Lisa' UNION ALL
    SELECT 20, 3,'Natalie' UNION ALL
    SELECT 21, 39,'Homer' UNION ALL
    SELECT 39, 26,'Ken' UNION ALL
    SELECT 40, 26,'Marge'
    ;
    RAISERROR('There are %u rows in the hierarchy.',0,1,@@ROWCOUNT) WITH NOWAIT;
    --===========================================================================
    RAISERROR('Adding an additional index ...',0,1) WITH NOWAIT;
    --===========================================================================
    --===== Create an additional index to speed things up
    CREATE UNIQUE INDEX By_ManagerID_EmployeeID
    ON dbo.Test_Employee (ManagerID,EmployeeID);


    drop table if exists #Employee_Levels;
    go
    create table #Employee_Levels(
    EmployeeID int unique not null,
    ManagerID int,
    EmployeeName varchar(10),
    HLevel int not null);
    go

    WITH cteBuildPath AS
    ( --=== This is the "anchor" part of the recursive CTE.
    -- The only thing it does is load the Root Node.
    SELECT anchor.EmployeeID,
    anchor.ManagerID,
    anchor.EmployeeName,
    HLevel = 1
    FROM dbo.Test_Employee AS anchor
    WHERE ManagerID IS NULL --Only the Root Node has a NULL ManagerID
    UNION ALL
    --==== This is the "recursive" part of the CTE that adds 1 for each level
    -- and concatenates each level of EmployeeID to the SortPath column.
    SELECT recur.EmployeeID,
    recur.ManagerID,
    recur.EmployeeName,
    HLevel = cte.HLevel + 1)
    FROM dbo.Test_Employee AS recur
    INNER JOIN cteBuildPath AS cte
    ON cte.EmployeeID = recur.ManagerID)
    insert #Employee_Levels(EmployeeID, ManagerID, EmployeeName, HLevel)
    select * from cteBuildPath
    --select * from #Employee_Levels

    drop function if exists dbo.fnDownlines;
    go
    create function dbo.fnDownlines(
    @EmployeeId int)
    returns table as
    return
    with inner_recur_cte(EmployeeId, ManagerId, Hlevel) as(
    select
    EmployeeId,
    ManagerId,
    cast(0 as int)
    from
    dbo.Test_Employee
    where EmployeeId=@EmployeeId
    union all
    select
    cat.EmployeeId, cat.ManagerId, rc.Hlevel+1
    from
    dbo.Test_Employee cat
    join
    inner_recur_cte rc on cat.ManagerId=rc.EmployeeId)
    select
    sum(iif(Hlevel=1,1,0)) lvl_1_count,
    sum(iif(Hlevel=2,1,0)) lvl_2_count,
    sum(iif(Hlevel=3,1,0)) lvl_3_count,
    sum(iif(Hlevel=4,1,0)) lvl_4_count,
    sum(iif(Hlevel=5,1,0)) lvl_5_count,
    count(*)-1 lvl_all_count,
    (select * from inner_recur_cte for json path, root('downlines')) json_downlines
    from
    inner_recur_cte;
    go

    select
    *
    from
    #Employee_Levels el
    cross apply
    dbo.fnDownlines(el.EmployeeID);

    To see 1 employee and all their downlines, here's Bob's:

    select
    el.*, d.*, h.*, te.EmployeeName
    from
    #Employee_Levels el
    cross apply
    dbo.fnDownlines(el.EmployeeID) d
    cross apply
    openjson(d.json_downlines, N'strict $.downlines') with (EmployeeId int, ManagerId int, Hlevel int) h
    join
    dbo.Test_Employee te on h.EmployeeId=te.EmployeeId
    where
    el.EmployeeId=3
    order by
    el.EmployeeId

    To find the lowest common manager between 2 employees (EmployeeId = 3 and EmployeeId = 20), select from the hierarchy where EmployeeId is in (3, 20) and neither employee is the manager of the other, i.e. ManagerId not in (3, 20).

    select top 1
    h.ManagerId
    from
    #Employee_Levels el
    cross apply
    dbo.fnDownlines(el.EmployeeID) d
    cross apply
    openjson(d.json_downlines, N'strict $.downlines') with (EmployeeId int, ManagerId int, Hlevel int) h
    join
    dbo.Test_Employee te on h.EmployeeId=te.EmployeeId
    where
    h.EmployeeId in(3, 20)
    and h.ManagerId not in(3, 20)
    order by
    el.Hlevel desc

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Sorry... I missed that last post.  Interesting idea.  I'll have to test it against the million node adjacency list.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's not going to work against the million node test table because of the following code...

      sum(iif(Hlevel=1,1,0)) lvl_1_count,
    sum(iif(Hlevel=2,1,0)) lvl_2_count,
    sum(iif(Hlevel=3,1,0)) lvl_3_count,
    sum(iif(Hlevel=4,1,0)) lvl_4_count,
    sum(iif(Hlevel=5,1,0)) lvl_5_count,

    The million node hierarchy has 30 levels.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Those columns are not needed, I think.  At the end of the 2nd article (allow me to quote you some Jeff Moden) it says:

    Last but certainly not least, if you maintain the "dbo.Hierarchy" table as Nested Sets, there's no reason why you can't have the best of 3 worlds by keeping your Adjacency List as before, having the Nested Sets in the "Hierarchy" table, and having all the pre-aggregated answers to most hierarchical questions in the "PreAggregatedHierarchy" table.

    Those columns were an attempt to add additional pre-aggregated meta data into the hierarchy tvf.

    To do an apples-to-apples comparison with the final script in article 2 would require creating the same 2 tables:

    Hierarchy (columns: EmployeeId, ManagerId, Sales, SortPath)

    PreaggregatedHierarchy (columns: EmployeeId, HLevel, RLevel, NodeCount, Sales)

    The theory is/was that the nested arrays are a sort path that is query-able using the SQL JSON functions.  So maybe the 2nd table is not required for the JSON solution?

     

     

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Ok, so I commented out the columns we spoke of and ran your code.  I had to make minor repairs because I already had an Employee table with a PK named PK_Employee.  That's one of the standards I have at work... since PKs are constraints and must be unique in a database, our standard says that the PK of a table shall ALWAYS be named and in the PK_tablename format except for Temp Tables and, for those, they must NOT be named (the system needs to name them to avoid contention).

    There was also an extra parenthesis in the  following line that needed to be removed.

            HLevel   =  cte.HLevel + 1)

    Once I got that working, I changed the table name back to "dbo.Employee" so I could more easily use it with the code I've already established.

    I rebuilt the million row hierarchy using the tool from this article and then ran your code using an EmployeeID that has only 8 nodes in the downline, including itself.  Here's the code I ran and the results from a statistics point of view...

    select
    el.*, d.*, h.*,te.EmployeeName
    from
    #Employee_Levels el
    cross apply
    dbo.fnDownlines(el.EmployeeID) d
    cross apply
    openjson(d.json_downlines, N'strict $.downlines') with (EmployeeId int, ManagerId int, Hlevel int) h
    join
    dbo.Employee te on h.EmployeeId=te.EmployeeId
    where
    el.EmployeeId= 1000 --199741
    order by
    el.EmployeeId
    ;
    ------------------------------------------------------------------------------------------------------
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 4 ms.

    (8 rows affected)
    Table 'employee'. Scan count 16, logical reads 83, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 4, logical reads 98, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#Employee_Levels____________________________________________________________________________________________________000000000007'.
    Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    Then I ran the code that does the same thing but using nested sets.  Here's that code and the statistics results...

    WITH cteToFind AS
    (
    SELECT EmployeeID, ManagerID, EmployeeName, HLevel, LeftBower,RightBower,NodeCount
    FROM dbo.Hierarchy
    WHERE EmployeeID = 1000 --199741
    )
    SELECT f.EmployeeID, f.ManagerID, f.EmployeeName, f.HLevel,Lvl_All_Count = f.NodeCount
    ,h.EmployeeId, h.ManagerId, h.EmployeeName, SubNodeHLevel = h.Hlevel-f.HLevel
    FROM dbo.Hierarchy h
    JOIN cteToFind f ON h.LeftBower BETWEEN f.LeftBower AND f.RightBower
    ORDER BY h.LeftBower
    ;
    ----------------------------------------------------------------------------------------------------------
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.

    (8 rows affected)
    Table 'Hierarchy'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    Your method consumed a total of 185 logical reads, which is more than 20 times the 9 logical reads that the Nested Sets method did.  That seems like it's going to be a problem (and, if you look at the execution plans, you'll see the many reasons why).

    So I ran it for an Employee id that has 662 nodes in its downline... here are the run stats for yours...

    SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 3 ms.

    (662 rows affected)
    Table 'employee'. Scan count 1324, logical reads 6018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 4, logical reads 7946, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#Employee_Levels____________________________________________________________________________________________________000000000007'.
    Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 16 ms, elapsed time = 601 ms.

    ... and here are the run stats for the Nested Sets method...

    SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms.

    (662 rows affected)
    Table 'Hierarchy'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 86 ms.

    Your method consumed 13,968 logical reads, 16 ms of CPU, and 601 ms in duration.  The Nested Sets method consumed 19 logical reads, 0 ms of CPU, and 86 ms of duration.  Your code consumed more than 735 times more logical reads than the Nested Sets method and took almost 7 times longer to execute.

    Shifting gears a bit, since the Nested Sets method has most of the goodies precalculated (and it only took 19 seconds to do so on the million rows on this new laptop I'm using and not the 54 in the article), it was easy to find nodes that had 8 nodes and more than 600 nodes in the downline.  I would have had to run your code a million times to get the same info for that.

    The bottom line is that, although your code is clever, the hidden RBAR in the downline function in the form of an rCTE and the overhead of parsing the JSON slow things down quite a bit.

    To answer your question, since it only takes 19 seconds to recreate the dbo.Hierarchy table and it has some things already calculated (especially in the second article), I believe I'd rather WORM it (write once, read many).

    Just to  be sure, that's not intended as a slam.  It's good to test and compare virtually any method that someone posts because you never can tell when someone is going to come up with the proverbial "purple squirrel".  Thank you for taking the time to science out a different method and then to share it.  You're definitely one of the good guys.


    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That's freaking amazing.  It's awesome to see real in depth analysis.  Thank you for doing this.

    Jeff Moden wrote:

    Shifting gears a bit, since the Nested Sets method has most of the goodies precalculated (and it only took 19 seconds to do so on the million rows on this new laptop I'm using and not the 54 in the article), it was easy to find nodes that had 8 nodes and more than 600 nodes in the downline.  I would have had to run your code a million times to get the same info for that.

    I'm not sure I understand what this means.  Which questions couldn't be answered by the downlines tvf?

    The bottom line is that, although your code is clever, the hidden RBAR in the downline function in the form of an rCTE and the overhead of parsing the JSON slow things down quite a bit.

    To answer your question, since it only takes 19 seconds to recreate the dbo.Hierarchy table and it has some things already calculated (especially in the second article), I believe I'd rather WORM it (write once, read many).

    I'm curious about how the length of JSON labels impacts performance.  As a medium of storage, JSON is horrendously inefficient.  A piece of downline JSON looks like this:

    {"downlines":[{"data_id":"1","h_level":0},{"data_id":"11","parent_id":"1","h_level":1},{"data_id":"22","parent_id":"1","h_level":1},{"data_id":"33","parent_id":"1","h_level":1},{"data_id":"44","parent_id":"1","h_level":1},{"data_id":"55","parent_id":"1","h_level":1},{"data_id":"33-2","parent_id":"33","h_level":2},{"data_id":"33-3","parent_id":"33","h_level":2},{"data_id":"33-4","parent_id":"33","h_level":2},{"data_id":"22-1","parent_id":"22","h_level":2},{"data_id":"22-2","parent_id":"22","h_level":2},{"data_id":"22-3","parent_id":"22","h_level":2},{"data_id":"22-4","parent_id":"22","h_level":2},{"data_id":"11-1","parent_id":"11","h_level":2},{"data_id":"11-2","parent_id":"11","h_level":2},{"data_id":"11-3","parent_id":"11","h_level":2},{"data_id":"11-3A","parent_id":"11-3","h_level":3},{"data_id":"11-2A","parent_id":"11-2","h_level":3},{"data_id":"11-2A_1","parent_id":"11-2A","h_level":4},{"data_id":"11-1A","parent_id":"11-1","h_level":3},{"data_id":"11-1A_1","parent_id":"11-1A","h_level":4}]}

    Just by using single-character labels, the overall length could be reduced greatly.  Instead of 'data_id' it could just be 'x'.  It has always occurred to me that some sort of compression would be a good thing.  Maybe SQL Server is optimized so the labels are reference types and it doesn't make any difference.
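
    If anyone wants to quickly gauge the label-length effect, here's a throwaway sketch (using the dbo.Test_Employee table from my earlier post purely as a stand-in) that compares the payload sizes produced by the long column-name labels versus single-character aliases:

    SELECT
     long_labels  = DATALENGTH((SELECT EmployeeID, ManagerID FROM dbo.Test_Employee FOR JSON PATH)),       --labels spelled out
     short_labels = DATALENGTH((SELECT EmployeeID AS x, ManagerID AS p FROM dbo.Test_Employee FOR JSON PATH)) --single-character labels
    ;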

    Json in general has a lot going for it.  To the extent that it comes self-contained as a parse-able data structure which includes the data field labels, it can be very useful.  For example, the inputs to a stored procedure could be completely generic and yet specific parameters could still be passed in if they're "packed" in a nested array.  This is another advantage of JsonAutoService.  It automatically maps the inputs from application requests to stored procedures using a consistent interface, so the SQL handlers are essentially 100% automated.  With 3 pieces of information (1. the name of the stored procedure, 2. a method name, and 3. a request route) it wires up a web API endpoint as well as or better than if you code the ADO.NET manually.  Also, because the interfaces are generic, it makes the C# impervious to de-compilation risk.  That's the elephant in the room nobody talks about.  With JsonAutoService the server code NEVER (unless you want it to) contains ANY information specific to a particular database.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    That's freaking amazing.  It's awesome to see real in depth analysis.  Thank you for doing this.

    Jeff Moden wrote:

    Shifting gears a bit, since the Nested Sets method has most of the goodies precalculated (and it only took 19 seconds to do so on the million rows on this new laptop I'm using and not the 54 in the article), it was easy to find nodes that had 8 nodes and more than 600 nodes in the downline.  I would have had to run your code a million times to get the same info for that.

    I'm not sure I understand what this means.  Which questions couldn't be answered by the downlines tvf?

    What I mean can be demonstrated by me asking you to use your code to do just one thing... produce a list of all nodes in the million row hierarchy that have a down-line of 600 or more nodes each.  Then I'll ask you to find all leaf nodes in the million node hierarchy.
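
    For comparison, and just as a sketch against the dbo.Hierarchy table from the articles (assuming, as in my test above, that NodeCount carries the full downline count including the node itself), both of those are single set-based queries with the Nested Sets:

    SELECT EmployeeID, EmployeeName, NodeCount --nodes with a downline of 600 or more (counting the node itself, as above)
      FROM dbo.Hierarchy
     WHERE NodeCount >= 600
    ;
    SELECT EmployeeID, EmployeeName --leaf nodes have no room between the Bowers
      FROM dbo.Hierarchy
     WHERE RightBower = LeftBower + 1
    ;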

    As for the JSON itself, I'll let you be the one to experiment with that because, and I could be wrong but, ...  I simply don't believe it's necessary here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Steve Collins wrote:

    That's freaking amazing.  It's awesome to see real in depth analysis.  Thank you for doing this.

    Jeff Moden wrote:

    Shifting gears a bit, since the Nested Sets method has most of the goodies precalculated (and it only took 19 seconds to do so on the million rows on this new laptop I'm using and not the 54 in the article), it was easy to find nodes that had 8 nodes and more than 600 nodes in the downline.  I would have had to run your code a million times to get the same info for that.

    I'm not sure I understand what this means.  Which questions couldn't be answered by the downlines tvf?

    What I mean can be demonstrated by me asking you to use your code to do just one thing... produce a list of all nodes in the million row hierarchy that have a down-line of 600 or more nodes each.  Then I'll ask you to find all leaf nodes in the million node hierarchy.

    Ok, it's definitely possible.  Maybe there are tricky questions going the other way; I suspect that's the case.  It may take a bit, but I'll go through the whole thing again and try to draw as many equivalencies as possible.  The outer recursion was shamelessly borrowed.  If the inner recursion is slow, maybe more borrowing is needed 🙂

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • iarellano wrote:

    Hi Jeff, thanks for the article, very good and functional. As I understand it, every time an employee changes ParentId, or a new one is added to the adjacency list, the Hierarchy table should be rebuilt. Do you have an example that updates the hierarchy table when either of these events happens,

    without the need to delete it and rebuild it?

    Oh crud... my apologies.  I somehow missed your post.

    The way to update this without dropping and rebuilding it is to make two hierarchy tables.  Populate one while the other is in use.  Use a synonym to point to the one in use for the outside world and another for the inside world to rebuild.  Once you've done a rebuild, just repoint both synonyms.  Unfortunately, synonyms can't be altered and so they must be dropped and rebuilt, but the total downtime will be measured in very low milliseconds.
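
    Just as a rough sketch of the swap (the table and synonym names here are hypothetical, and the "inside world" synonym gets repointed the same way):

    --===== Rebuild dbo.HierarchyB from the Adjacency List while the dbo.Hierarchy synonym
        -- still points the outside world at dbo.HierarchyA... then repoint it.
    BEGIN TRANSACTION;
        DROP SYNONYM dbo.Hierarchy;
        CREATE SYNONYM dbo.Hierarchy FOR dbo.HierarchyB;
    COMMIT TRANSACTION;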

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One of the problems with the function is this bit of code...

      (select * from inner_recur_cte for json path, root('downlines')) json_downlines
    from
    inner_recur_cte;

    A lot of people don't understand that, with some exceptions, CTEs work the same way as a View... if you reference the CTE twice, it gets executed twice or more, depending on how it is used.

    Then, you do it again in the outer code that uses the function, and so the CTE in the function actually fires 4 times (which explains why the execution plan looks like it does, with 4 seeks to the employee table)...

     cross apply
    dbo.fnDownlines(el.EmployeeID) d
    cross apply
    openjson(d.json_downlines, N'strict $.downlines') with (EmployeeId int, ManagerId int, Hlevel int) h

    That's not the whole problem, though, because even if you divide the number of logical reads I cited by 4, they're still pretty bad.
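
    If anyone wants to see the "CTE works like a View" behavior for themselves, here's a quick, standalone demonstration (using the dbo.Test_Employee table from the earlier post; the CTE and column names are just made up for the demo)... the statistics and the actual execution plan will generally show two separate reads of the table, one per reference:

    SET STATISTICS IO ON;
    WITH cteCount AS (SELECT EmployeeCount = COUNT(*) FROM dbo.Test_Employee)
    SELECT FirstRef  = (SELECT EmployeeCount FROM cteCount), --each reference re-executes the CTE
           SecondRef = (SELECT EmployeeCount FROM cteCount)
    ;
    SET STATISTICS IO OFF;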

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm not quite sure why you need a hierarchy, so I'm going to take a guess. One of the many advantages of a nested set model is that you can separate the nodes into one table (your goods or employees) and keep the hierarchical structure in another table (the packaging or organizational chart). The nice part is that you can rearrange the same nodes into many different hierarchies with a simple REFERENCES clause. Is that what you're trying to do?
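
    Something like this skeleton (the names are hypothetical), where the nodes live in one table and each hierarchy over them is just another small table of (lft, rgt) pairs that REFERENCES the node table:

    CREATE TABLE Personnel
    (emp_id INT NOT NULL PRIMARY KEY,
     emp_name VARCHAR(30) NOT NULL);

    CREATE TABLE OrgChart              --one of possibly many hierarchies over the same nodes
    (emp_id INT NOT NULL PRIMARY KEY
       REFERENCES Personnel (emp_id),
     lft INT NOT NULL UNIQUE CHECK (lft > 0),
     rgt INT NOT NULL UNIQUE CHECK (rgt > 1),
     CHECK (lft < rgt));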

    Please post DDL and follow ANSI/ISO standards when asking for help. 
