May 15, 2007 at 10:30 pm
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/cHatley/3001.asp
June 3, 2007 at 9:36 pm
Ahhh, ye olde adjacency list model. You might get better efficiency from a set-based solution, like the nested sets model or one of its variants. Here's an Q&D sample.
CREATE TABLE #Employees (ID INT PRIMARY KEY,
LastName VARCHAR(30),
FirstName VARCHAR(30),
LeftID INT,
RightID INT)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (1, 'Smith', 'John', 1, 24)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (2, 'Johnson', 'Bob', 2, 15)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (4, 'Fields', 'Sue', 3, 4)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (3, 'Burton', 'Debra', 16, 23)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (5, 'Jacobs', 'Todd', 11, 14)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (6, 'Sanders', 'Mary', 5, 10)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (10, 'Sims', 'Tom', 17, 18)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (11, 'Wright', 'Larry', 19, 20)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (12, 'Morgan', 'Betty', 21, 22)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (7, 'Teeter', 'Sandy', 12, 13)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (8, 'Morris', 'Megan', 6, 7)
INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)
VALUES (9, 'Binks', 'Randy', 8, 9)
GO
CREATE PROCEDURE dbo.GetOrgChart (@startID INT)
AS
BEGIN
SELECT e1.ID, e1.LastName, e1.FirstName, e1.LeftID, e1.RightID, COUNT(e2.ID) AS Level
FROM #Employees e1
INNER JOIN #Employees e2
ON e1.LeftID BETWEEN e2.LeftID AND e2.RightID
WHERE e1.LeftID BETWEEN (SELECT LeftID FROM #Employees WHERE ID = @startid)
AND (SELECT RightID FROM #Employees WHERE ID = @startid)
GROUP BY e1.ID, e1.LastName, e1.FirstName, e1.LeftID, e1.RightID
ORDER BY e1.LeftID
END
GO
EXEC dbo.GetOrgChart 1
June 4, 2007 at 2:54 am
Oracle has "Connect by prior"
select * from person connect by prior id=supervisor_id
.. it makes traversing hierarchies trivial.
I was hoping it would appear in SS2005...
June 4, 2007 at 3:49 am
Hi Mike,
apologies for this, but i am interested in your right-brain set based solution, but I am feeling a bit left-brain this Monday morning, probably lack of coffee, so please could you set me right about what left-id and right-id represent because at the moment I'm feeling a bit left-out.
Thanks
David
If it ain't broke, don't fix it...
June 4, 2007 at 4:05 am
Quite liked that one, though as surprised as you at the efficiency. Will give it a bash sometime.
June 4, 2007 at 7:18 am
Mike's example is actually a nested set model, not an adjecent list model per the article. Basically instead of having a "boss" column in your employee table that's the same as an employee, each employee has containment information. You use two columns which represent the range of containment, a set, an employee has. Joe Celko explains it better, http://www.developersdex.com/gurus/articles/112.asp.
June 4, 2007 at 7:49 am
Hi Tim,
Thanks for that link to Joe Celko's article, I had not come across this solution before. I can see that it is easier to model an org tree in SQL using set theory. In practice, does the data in the set correspond to an organisation's departmental structure, rather than individual relationships, otherwise you could get real problems if an employee switches managers and correspondingly needed to switch containments?
I like the idea that because the containment encapsulates all levels of the hierarchy, there is no break in the management chain if an employee in the middle-tier leaves, the model is effectively self-healing.
Presumably by creating overlapping sets you could also create organisation structures where individuals have multiple lines of management (many DBAs may be familiar with this )
David
If it ain't broke, don't fix it...
June 4, 2007 at 7:52 am
Hi Mike,
Although your solution is a viable option (and we considered it), we chose to identify each employee's manager as a single field on each record because we wanted the ability to easily update this information to accomodate the incredible growth of our organization. With the method we are using, we basically complete this single piece of information (manager's user ID) for each new employee during the account setup process and we are good to go.
June 4, 2007 at 8:14 am
Hi Mike,
found a further article at http://www.intelligententerprise.com/001020/celko1_1.jhtml
It appears that if your organisation structure fequently changes, you would still need an adjacency list to calculate the nested sets though? Joe provides the code to convert one to the other. I guess the practical application of this solution would be to write a trigger which recalculates the nested sets whenever the organisation structure is changed (i.e. you change an employee's manager id), storing the left and right values you calculate and using them as an aid to produce faster organisation structure reports.
If someone left the organisation, you would leave the existing calculated left and right values in place to provide a management hierarchy until a new manager was appointed.
PS: Have had several coffees now
David
If it ain't broke, don't fix it...
June 4, 2007 at 9:04 am
Hi,
I have received a private question asking about how to deal with employees that may report to multiple bosses and I wanted to provide my thoughts on the subject. I think the best approach to address this issue would be to move the 'Manager User ID' field to a separate table containing at least the 'User ID' and 'Manager User ID' columns which could be joined to the 'Employee' table via the 'User ID' column. You could then modify the stored procedure to include the new table via a join such as (Changes Highlighted In Yellow) . . .
CREATE PROCEDURE select_subordinates_by_manager
@user-id int
As
Set NoCount On
Declare @UserCount int
Set @UserCount = 1
Create Table #In (UserID int)
Create Table #Out (UserID int)
Create Table #Result (UserID int)
Insert Into #In (UserID) Values (@UserID)
Insert Into #Result (UserID) Values (@UserID)
While @UserCount > 0
Begin
Insert Into #Out (UserID)
Select Distinct UserID From Test..Employee as e
Join Test..ManagedBy as m on m.UserID = e.UserID
Where m.ManagerUserID In (Select UserID From #In)
Select @UserCount = (Select Count(UserID) From #Out)
If @UserCount > 0
Begin
Insert Into #Result (UserID)
Select UserID From #Out
End
Delete From #In
Insert Into #In
Select UserID From #Out
Delete From #Out
End
Set NoCount Off
Select UserID From #Result
June 4, 2007 at 9:21 am
Hi David and Craig,
The LeftID and RightID define a subset. In the case of John, his LeftID=1 and RightID=24. Anyone with a LeftID and RightID between these two numbers are defined by a subset of John's set. Bob's has the LeftID=2 and RightID=15, so his numbers are a subset of John's set {1..24} (apologies for the "fake" set notation .) Therefore, Bob reports to John. And so on...
Actually you don't need to use the adjacency list to renumber the left and right ID's of your nested sets. There are some other options to improve efficiency. In the example I gave the sets were a "tight" fit, with the left and right id numbers as close together as possible. However, that's not a requirement. You can actually spread them out further to start with, so long as they still properly "nest" within one another. For instance:
John Smith, LeftID=1, RightID=1000
Bob Johnson, LeftID=20, RightID=150
Sue Fields, LeftID=300, RightID=400
In this case, Bob and Sue are still nested within John Smith, but the range between LeftID and RightID has been widened. You can easily insert or delete individuals without recalculating (dependent on business rules). The recalculations come in when Bob's entire range is filled and he gets another subordinate.
Using this method you don't need to recalculate LeftID and RightID as often, which, as David pointed out can hurt performance on large data sets. Renumbering can also be done with just a few statements (albeit slightly more complex ones). This method also has the advantage that you can calculate things like the level of each employee in the hierarchy, total height of the tree, etc., relatively easily.
You can also convert adjacency list to nested sets, and vice versa.
Celko's written a lot on the topic of nested sets versus adjacency list. He also has a pretty good book on the subject - "Trees and Hierarchies in SQL". Although it's not SQL Server specific, his examples are fairly easy to port over.
Also, if you're going to use the adjacency list, you might consider using recursive CTEs (SQL 2K5) - they were defined for this type of thing
June 4, 2007 at 9:31 am
"In practice, does the data in the set correspond to an organisation's departmental structure, rather than individual relationships"
This is just a different model for storing hierarchical data in SQL tables. The origial article presented an adjacent set model, Mike presented a nested sets model, and there's another I've seen used called the maternalized path model. The maternalized path model is basically the same structure as a URL, folders and paths. In that model you store a node's entire ancestry in a column. To get to your question though, with both the nested sets and maternalized path models the information you use to maintain the tree structure can be, and usually is, entirely fictional with no relationship whatsoever with the data it's associated with. Kind of like inventing an ID column for a table instead of using a person's name as a key for instance.
I haven't had to model hierarchical data for a project in a while but I used the nested sets model to do it when I did. It really isn't as difficult to use or maintain as some articles I've read about it indicate -- of course I'll suffix that by saying each model has its advantages and disadvantages. Changing someone's boss, for instance, is essentially moving a node within the tree. In the nested sets model this boils down to updating rows that are between numerical ranges and adding or subtracting numbers. With the maternalized path model you update rows that are LIKE a certain string. In either case you could potentially update every row in the table. The adjacent set model wouldn't update as many rows to do this.
June 4, 2007 at 11:31 am
Here's a short CTE, assuming a table named #Employees with the data shown in the article:
WITH OrgChart (UserID, LastName, FirstName, ManagerUserID)
AS
(
SELECT UserID, LastName, FirstName, ManagerUserID
FROM #Employees
WHERE ManagerUserID = 0
UNION ALL
SELECT e.UserID, e.LastName, e.FirstName, e.ManagerUserID
FROM #Employees e
INNER JOIN OrgChart o
ON e.ManagerUserID = o.UserID
)
SELECT UserID, LastName, FirstName, ManagerUserID
FROM OrgChart;
Please excuse any typos - I had to retype this in here.
June 4, 2007 at 12:25 pm
"I was hoping it would appear in SS2005..."
SQL Server 2005 introduced Common Table Expressions (CTE) which can be recursive.
May 16, 2008 at 8:13 am
I like Craig's original approach because performance is good and it is straightforward. Another important consideration is many, if not most organizations that I am aware of tend to follow the 'one-employee-one-manager' model. I utilize essentially the same algorithm with three differences: 1) I encapsulate all of the code within the stored procedure, which simply outputs an ordered four-column dataset (SortName [Last, First], Employee ID, SupervisorID, Level), 2) I include the Level column (integer), which indicates how many levels down from the specified manager each employee is in the hierarchy, and 3) I have an additional input parameter for Level, which allows me to easily return only the direct reports for the manager (or some other number of levels deep that I may be interested in at the moment). Of course, my method makes certain assumptions about how the names will be used since I am controlling the name format (we do a lot of dropdown lists and reports using the output.)
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply