May 20, 2008 at 9:59 pm
Comments posted to this topic are about the item Two Useful Hierarchy Functions
May 21, 2008 at 2:47 am
very nicely done!
May 21, 2008 at 4:04 am
Hard to read due to poor web presentation. I'm using IE7 and the code segments are not readable unless they're cut and pasted elsewhere. I ran out of time and patience.
May 21, 2008 at 5:43 am
Does the new SQL2008 HeirarchyId pretty much replace the need for doing this?
May 21, 2008 at 6:45 am
Nice example, thanks for simplifying this complex subject. Would it have been better to make the node_id column in the table example an identity column for saving data (after a user selected one item) and reporting? What are the costs/benefits?
Also, I guess the hierarchy could change too, what if there was an org. change in that example, where Alan left and another became the start of the tree? In my application, do we want to save the pick of Alan in your hierarchy (hence that identity column idea above) or just the current starting node for the year, no matter who it is at that point in time? I guess that may depend on your application needs?
Thanks again for the nice article and great subject.
May 21, 2008 at 7:43 am
I've used hierarchies with as many as 7,000 nodes and as many as 50 levels. CTEs work well for those.
But one thing that should be brought up in any discussion of hierarchies is the "nested sets" model.
The type of hierarchy defined in this article is called an "adjacency hierarchy". A nested sets hierarchy works a little different.
Here's a sample table for a nested sets hierarchy:
create table NSHierarchy (
SetStart int not null,
SetEnd int not null,
constraint PK_NSHierarchy primary key (SetStart, SetEnd),
Name varchar(100))
The way it works is that the SetStart column and the SetEnd column define a range of values, and lower levels of the hierarchy are given values between those.
For example:
insert into dbo.NSHierarchy (SetStart, SetEnd, Name)
select 1, 10, 'Mandy' union all
select 2, 5, 'Bob' union all
select 6, 9, 'Sue' union all
select 3, 3, 'Doug' union all
select 4, 4, 'Sam' union all
select 7, 7, 'Terry' union all
select 8, 8, 'Sal'
In this example, Mandy is the top level of the hierarchy. Bob and Sue report directly to Mandy, since their numbers are inside her range (anything between 1 and 10 is in her hierarchy). Doug and Sam report to Bob (their numbers are between 2 and 5, which are Bob's numbers), while Sam, Terry and Sal report to Sue (between 6 and 9).
This is queried as:
create proc HierarchySelect
(@TopNode_in int)
as
declare @SEnd int
select @SEnd = SetEnd
from dbo.NSHierarchy
where SetStart = @TopNode_in
select Name
from dbo.NSHierarchy
where SetStart between @TopNode_in and @SEnd
and SetEnd between @TopNode_in and @SEnd
(Variations on this to deal with the name as an input parameter, or to allow a level to have two parents (SetStart in one range, SetEnd in another), etc., are pretty easy to extrapolate.)
Joe Celko has a much more complete and comprehensive description of this on his site and in his books. I won't try to go into that level of detail, because he's already done a better job of it than I will.
The advantage to this is that it's MUCH faster than an adjacency hierarchy.
For example, that 7,000-node, 50-level hierarchy, takes almost a second to run (on my test machine) using a CTE, exactly as described in the article. Not bad, but it also takes 7,000 scans (IO). That's a LOT of IO. With a nested sets hierarchy, it takes less than a millisecond to run, and takes 2 scans.
The disadvantage of a nested sets hierarchy is that it takes more work to update/insert than an adjacency hierarchy. If the data changes a lot, it becomes much more difficult to manage.
I've solved that for my situation by using a hybrid solution (partially suggested by Jeff Moden). I have tables that looks like this:
create table Hierarchies (
HierarchyID int identity primary key,
Name varchar(100))
go
create table HierarchiesNodes (
HierarchyID int not null references dbo.Hierarchies(HierarchyID),
NodeID int identity primary key,
ParentID int null references dbo.Hierarchy(NodeID),
SetStart int null,
SetEnd int null,
SetLevel int null,
CustomerID int not null references dbo.Customers(CustomerID),
ActiveDate datetime default(getdate()),
InactiveDate datetime null)
The update, insert and delete code for the nodes table sets the SetStart and SetEnd columns to null for any affected rows. The lookup code (select), checks to see if any level that's in the same hierarchy (HierarchyID) has a null SetStart or SetEnd. If so, it uses a CTE to pull the hierarchy. If not, it uses a nested sets query to pull it. Every hour, the table is scanned for null SetStart and SetEnd values, and if any are found, those hierarchies have their set data updated. Again, Joe Celko has an article on how to do that efficiently (it's not hard to figure out if you know how to use a CTE to resolve the hierarchy in the first place).
I use the HierarchyID column as the first column in the indexes on this table. It's selective enough, in my case, that it seriously speeds up the selects. It doesn't matter for the set queries, but it does for the CTEs.
I use the SetLevel column for the nested sets queries, because unlike a recursive CTE, I can't just have it use a "Depth + 1" calculation if I want levels. It also allows me to limit the number of levels, if that's appropriate for the query at hand. Just like the HierarchyID doesn't matter to the sets, the SetLevel column doesn't matter for the CTE, but I need both for this hybrid solution.
Thus far, this gives us the best of both models.
I thought it should be mentioned, since this data matters if you deal with hierarchies much.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 21, 2008 at 7:50 am
ddnrg (5/21/2008)
Does the new SQL2008 HeirarchyId pretty much replace the need for doing this?
That one, per what I've read, is fast to query, but slow to update/insert/delete. Adjacency hierarchies (like the one described in the article) are fast to update/insert/delete and somewhat slower to query. So it might, or might not, replace it, depending on specific needs.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 21, 2008 at 8:07 am
Nice and simple, pleasure to read. Well done, John! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 21, 2008 at 8:43 am
Very well written..........
May 22, 2008 at 11:47 am
Got error "Invalid object name 'salaries'."
Table is not created anywhere
May 22, 2008 at 12:16 pm
I apologize, Omama -- I was just using that as an example if you had a Salaries table to subtotal against. Though not included in the article, we can create a quick simple table that will work:
create table salaries(
emp_id int not null,
salary money,
fica money
)
GO
insert into salaries values( 1, 110000, 11000 )
insert into salaries values( 2, 100000, 10000 )
insert into salaries values( 3, 70000, 7000 )
insert into salaries values( 4, 90000, 9000 )
insert into salaries values( 5, 95000, 9500 )
insert into salaries values( 6, 60000, 6000 )
insert into salaries values( 7, 75000, 75000 )
insert into salaries values( 8, 50000, 50000 )
insert into salaries values( 9, 55000, 55000 )
insert into salaries values( 10, 60000, 6000 )
GO
select
sum( s.salary ) departmental_salary_total,
sum( s.fica ) as departmental_fica_total
from subordinates( 2007, 2 ) base
left join salaries s
on s.emp_id = node_id
May 30, 2008 at 5:54 pm
For reporting purposes we would generally want to sort all the nodes within one parent by node name. Below is an extension to the subordinates function that returns a field suitable for this purpose. The added field is called "node_sort" so your query might look like this...
select * from subordinates(2007,1) order by node_sort, node_name
----- create Subordinates function using CTE with added sort field
if object_id( 'subordinates', 'IF' ) is not null drop function subordinates
GO
create function subordinates( @year int, @node_id int ) returns table as return
with subnodes( distance, year, node_id, parent_node, node_name, node_seq, node_sort) as
(
select 0, @year, h.node_id, h.parent_node, h.node_name,
convert( varchar(99), ltrim(str(h.node_id))) as node_seq,
convert( varchar(99), '')
from hierarchy h where h.node_id = @node_id and h.year = @year
union all
select distance+1, @year, h.node_id, h.parent_node, h.node_name,
convert( varchar(99), sn.node_seq + '.' + ltrim(str(h.node_id))),
convert( varchar(99), sn.node_sort + '.' + right('00000' + cast(isnull(h.parent_node,'0') as varchar(5)), 5) )
from hierarchy h inner join subnodes sn on h.year = @year and h.parent_node = sn.node_id
)
select distance, year, node_id, parent_node, node_name, node_seq, node_sort from subnodes
GO
October 2, 2008 at 6:22 am
EXCELENTE TU SUGERENCIA, TOD!!! REALMENTE EXCELENTE!!!
Enhorabuena Tod!
August 3, 2010 at 2:15 am
None of this "with" stuff is working for us on MSSQL 2000...
Server: Msg 156, Level 15, State 1, Procedure subordinates, Line 2
Incorrect syntax near the keyword 'with'.
Is it a feature of MSSQL 2005 only?
August 3, 2010 at 7:03 am
Jack-408956 (8/3/2010)
None of this "with" stuff is working for us on MSSQL 2000...Server: Msg 156, Level 15, State 1, Procedure subordinates, Line 2
Incorrect syntax near the keyword 'with'.
Is it a feature of MSSQL 2005 only?
Yes... "WITH" is a precursor to CTE's.
If you want to do this "stuff" in 2k, do an initial insert similar to the first part (above the UNION) of the CTE. Then, do a loop that does very similar to the second part of the CTE which will have nearly the same speed as the CTE and (haven't tried it with this particular table) will likely use about 1/3rd the number of reads. The second part of the CTE (or the loop) is NOT RBAR, in this case. Each iteration of the loop loads an entire level of the hierarchy no matter how wide a given level becomes.
That's all a recursive CTE does, by the way... it does an "initial load" and then loops the same way (and with the same impact on performance) as a While Loop.
In 2k, this method will give much better utility and performance than using the classic "Expanding Hierarchies" example given in 2k Books Online.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply