Displaying Sorted Hierarchies (SQL Spackle)

  • cedarhillr (3/10/2011)


    If one used the same method but inverted the list you have a roll up you can use for various accounting functions. Inverted list processing was one of the first methods the early dbs' used for performance. It's a nifty feature to learn.

    Do you, by any chance, have a link that you could share for the "Inverted list processing" methods you speak of? I've not personnaly used the method and wouldn't mind reading up on it. Thanks.

    I've developed another method which preaggregates much of the information that most folks would use a "Nested Set" for and it's directed at some of those accounting functions you speak of. I'm covering that new method in the larger article I'm writing.

    Thanks for stopping by and leaving a note. I appreciate it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/10/2011)


    Wesley Brown (3/10/2011)


    I prefer to use a two column approach to build a tree instead of the single column style that you have to split, it doesn't scale well. For smaller stuff like this HR representation it should be OK. Trees and Hierarchies in SQL from Joe Celko is one of my all time favorite books! Good article!

    your brother,

    Phil McCrevice

    If, by "two column approach", you mean "Nested Sets", I absolutely agree. In the larger article I'm writing on hierarchies, I'll actually demonstrate a new method for converting an "Adjaceny List" to a "Nested Set" that I think you'll like especially since it gets away from the RBAR of a "push stack" to build the "Nested Set". Ben-Gan also has an alternate method for doing the same thing.

    You'll also like the "warehouse" table that I'll build in the coming article which you might prefer to a "Nested Set" because the "warehouse" table contains preaggregated answers for the 4 of the more common hierarchical lookups that people seem to do on a regular basis.

    I've got a two-step script that can generate a nested sets "path" from an adjacency hierarchy, turning it into a nested sets hierarchy, using Cross Apply and a ranking function. Single query, and much more efficient than the method Joe originally proposed (but also uses features he didn't have when he originally wrote the article).

    I've found you can also short-circuit the conversion and get really good performance using the ASCII table and a binary colation for your set headers and ranges instead of numeric values. You don't need the binary colation for smaller hierarchies, but it comes in handy for bigger/complex ones with lots of top levels. (This method takes advantage of the fact that you can, in the recursive part of the CTE, concat a string together based on level and row_number within the level, and that "AAAB" is between "AAA" and "AAB" in a string sort. Much simpler than using integers for the ranges.)

    The string-range version is a bit of a pain to set up in the first place (making sure all top level and sub-level strings are guaranteed unique out of a limited dataset of ASCII characters that can be sorted on), but works nicely once it's in place.

    I'm interested in seeing how your article solves it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/10/2011)


    I've got a two-step script that can generate a nested sets "path" from an adjacency hierarchy, turning it into a nested sets hierarchy, using Cross Apply and a ranking function.

    Is that method in the good article you wrote on hierarchies, Gus?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I use 'generated' TSQL like that all the time but never thought to make the leap and intermix straight/generated TSQL in a CTE like that. Oh the possibilities!! Nicely done Jeff, as usual!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden (3/10/2011)


    GSquared (3/10/2011)


    I've got a two-step script that can generate a nested sets "path" from an adjacency hierarchy, turning it into a nested sets hierarchy, using Cross Apply and a ranking function.

    Is that method in the good article you wrote on hierarchies, Gus?

    Nope. Developed after that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Jeff Moden (3/10/2011)


    GSquared (3/10/2011)


    I've used a variation on this solution for sorting before, and found that it works best if you pad numerical values with leading zeroes to a fixed length, before sorting them as character data.

    Hi Gus,

    Thanks for stopping by. I agree... even better is the method of converting numeric ID data to BINARY(4) data and concatenating that information into a VARBINARY() column... sans slashes, of course.

    Yep.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Great article Jeff.

  • Nice article Jeff. I have used a very similar method to this for very large hierarchies. Nice examples and well explained and illustrated.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Nice article Jeff. I see that you promised a more in-depth/advanced discussion within the body of your article. Are you going to be discussing the HierarchyID data type in that article?

    I think that this demonstration piece of code shows the value of CTEs - and is great for small HR type applications that do not required scalability, but there is huge benefit to using the HierarchyID data type as an additional column or even to replace the Primary Key value with.

    Perhaps your next article could explore these alternatives?

  • CELKO (3/10/2011)


    I am doing the second edition of TREES & HIERARCHIES this year. If you have stuff that ought to be in the book, plese send it to me ASAP.

    You and I started to have this discussion before, Joe. You and I pretty much disagree when it comes to how to talk to/with people and just about everything T-SQL except that the data should be normalized. 😉 Heh... you even called me a "baby killer" in one of your famous rants years ago (even though it was your code that didn't work). 😀 Hell, I even disconnected from you on Linked-In because you're normally such a, ummm... less than considerate poster and your attitude on most of your posts.

    So let me ask... other than being in a "Joe Celko" book, what's in it for me and what's in it for your readers since I was going to publish the methods I've developed in a month or two anyway? Perhaps if you could explain those things, I might reconsider my previous answer from a month ago from "I'll have to pass" to something more affirmative.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CELKO (3/10/2011)


    Jeff, GSquare, et al

    I am doing the second edition of TREES & HIERARCHIES this year. If you have stuff that ought to be in the book, plese send it to me ASAP.

    Think of the fame, the glory, the books and beer that comes with it!

    For the record and the other readers:

    To find the level of each emp_name, so you can print the tree as an indented listing.

    SELECT T1.node,

    SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END

    + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl

    FROM Tree AS T1, Tree AS T2

    WHERE T2.lft <= T1.lft

    GROUP BY T1.node;

    An untested version of this using OLAP functions might be better able to use the ordering. Ut will not work in T-SQL becasue we don't have the RANGE sub-clause yet.

    SELECT T1.node,

    SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END

    + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)

    OVER (ORDER BY T1.lft

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl

    FROM Tree AS T1, Tree AS T2

    WHERE T2.lft <= T1.lft;

    Although I certainly appreciate the code and the polite nature of your post, you need to explain to people that both methods are using a "Nested Set" hierarchy and really has nothing to do with the article this discussion is actually about.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Using HierarchyID Datatype with this solution

    If added a new column into the table called OrganisationLevel with a HierarchyID datatype, then updated this column with the Hierarchy data that you generated through the T-SQL statement ('/1/1/2....'), then you will end up with a table of data that stores the hierarchy info in what amounts to a hex value set. Obviously you would have to include the Cast('/1/2/3/....' as HierarchyID) conversion function into your update statement to make sure that the hierarchy info that forms your path is then correctly stored.

    NOW you can have some fun, because your entire CTE statement can now be simply replaced by the Select field1,field2,field3..., OrganisationLevel.ToString() from YourTable.

    That translates into a single line of code as opposed to your very elegant, but very code intensive CTE.

    Also, to now get to figure out who the manager of any given record is as simple as Select OrganisationLevel.GetAncestor(1) from TABLENAME and getting higher management levels is OrganisationLevel.GetAncestor(2) etc. These values will obviously return the Hex values which is the natural state of the HierarchyID datatype, unless your use the ToString() function as a suffix (OrganisationLevel.GetAncestor(1).ToString())

    Just watch out that all of the HierarchyID functions are ALL CASE SENSITIVE.

    Using the HierarchyID datatype is FAR more efficient than using the CTE, especially since you can create both Depth- and Breadth-based indexes. You can even choose to use the HierarchyID as your primary Key value and create your clustered index on this column, yielding far superior performance than what can be achieved from the CTE and adjacency model.

  • sean-495500 (3/10/2011)


    Using HierarchyID Datatype with this solution

    If added a new column into the table called OrganisationLevel with a HierarchyID datatype, then updated this column with the Hierarchy data that you generated through the T-SQL statement ('/1/1/2....'), then you will end up with a table of data that stores the hierarchy info in what amounts to a hex value set. Obviously you would have to include the Cast('/1/2/3/....' as HierarchyID) conversion function into your update statement to make sure that the hierarchy info that forms your path is then correctly stored.

    NOW you can have some fun, because your entire CTE statement can now be simply replaced by the Select field1,field2,field3..., OrganisationLevel.ToString() from YourTable.

    That translates into a single line of code as opposed to your very elegant, but very code intensive CTE.

    Also, to now get to figure out who the manager of any given record is as simple as Select OrganisationLevel.GetAncestor(1) from TABLENAME and getting higher management levels is OrganisationLevel.GetAncestor(2) etc. These values will obviously return the Hex values which is the natural state of the HierarchyID datatype, unless your use the ToString() function as a suffix (OrganisationLevel.GetAncestor(1).ToString())

    Just watch out that all of the HierarchyID functions are ALL CASE SENSITIVE.

    Using the HierarchyID datatype is FAR more efficient than using the CTE, especially since you can create both Depth- and Breadth-based indexes. You can even choose to use the HierarchyID as your primary Key value and create your clustered index on this column, yielding far superior performance than what can be achieved from the CTE and adjacency model.

    I've done a bunch of testing, what I find is that the hierarchyID datatype is slower to select than nested sets, and slower to update/insert/delete than adjacency. It's okay as a compromise, but I've found that a hybrid hierarchy works better for just about any actual application (nested sets and adjacency in the same table; generate the sets from the adjacency and use that to query, then use adjacency for simple update/insert).

    Have you done performance testing on hierarchyID that contradicts that?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • sean-495500 (3/10/2011)


    Using HierarchyID Datatype with this solution

    If added a new column into the table called OrganisationLevel with a HierarchyID datatype, then updated this column with the Hierarchy data that you generated through the T-SQL statement ('/1/1/2....'), then you will end up with a table of data that stores the hierarchy info in what amounts to a hex value set. Obviously you would have to include the Cast('/1/2/3/....' as HierarchyID) conversion function into your update statement to make sure that the hierarchy info that forms your path is then correctly stored.

    NOW you can have some fun, because your entire CTE statement can now be simply replaced by the Select field1,field2,field3..., OrganisationLevel.ToString() from YourTable.

    That translates into a single line of code as opposed to your very elegant, but very code intensive CTE.

    Also, to now get to figure out who the manager of any given record is as simple as Select OrganisationLevel.GetAncestor(1) from TABLENAME and getting higher management levels is OrganisationLevel.GetAncestor(2) etc. These values will obviously return the Hex values which is the natural state of the HierarchyID datatype, unless your use the ToString() function as a suffix (OrganisationLevel.GetAncestor(1).ToString())

    Just watch out that all of the HierarchyID functions are ALL CASE SENSITIVE.

    Using the HierarchyID datatype is FAR more efficient than using the CTE, especially since you can create both Depth- and Breadth-based indexes. You can even choose to use the HierarchyID as your primary Key value and create your clustered index on this column, yielding far superior performance than what can be achieved from the CTE and adjacency model.

    Absolutely correct... I'll remind everyone, though, that this article is an "SQL Spackle" article and not meant to be a complete solution. I'll also remind folks that the "Nested Set" solution (not covered in this article) has been touted to be head and shoulder's above even the use of the HierarchyID datatype. I'm currently in the process of testing that claim so I can't yet say whether that's right or wrong... only that it's a frequently touted claim. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... Gus beat me to it but he's, apparently, actually done some testing.

    (Note to self... need performance testing code proofs in the article I'm writing.) 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 59 total)

You must be logged in to reply to this topic. Login to reply