September 3, 2009 at 10:49 am
Hey everyone,
My organization uses heirarcies A LOT. As such, we use CTE's A LOT. However, we are getting close to hitting the number of maximum recurssions that a CTE can have (alost 33k in case you don't know).
I actually have two questions
1) Are a cursor/while loop the only work around in 2005 for this issue?
2) It is different in SS08?
3) What about using the new Heirarch ID data types? Does that change anything?
Thanks for the input.
Fraggle
September 3, 2009 at 12:05 pm
1) Nope, not directly.
2) The answer is going to be "it depends", because it depends on your Heiarchies and how your CTE's are setup.
That said, my gut feel is that you can probably do a lot better than Cursors or even your current CTE's because recursing 32,000 times is a pig by any measure and there is usually a way to rewrite these long recursion chains into something much better.
Even if you cannot, I would then strongly encourage to try CLR before reverting to cursors. But really, you should be able to much better than this (unless it just happens to be a CLR friendly function).
3) "It depends", again on your hierarchies and what you are actually trying to do with them in these CTEs.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 3, 2009 at 12:31 pm
1) So Sad
2) I am very interested in this. Do you have any reference or example that you would be willing to show as an example?
3) In general, it depends upon the specific query, but sometime I am trying to find everyone directly above me, and sometime I am trying to find everyone directly below me.
Thanks,
Fraggle
September 3, 2009 at 12:52 pm
2) No, the possibilities are too divers. You need to clarify this by showing us what you are really dealing with. For instance: there are *extremely* few legitimate cases of true & valid data hierarchies that extend beyond a few hundred levels. I cannot think of *any* legitimate real-world need for 32,000 levels of hierarchy, so that raises a lot of questions about what is really going on here.
3) Makes no sense. What kind of real-world organizational structure has 32,000 levels? You could not even store the data for the structure unless the branching level were within 4 9's of 1.0, and tehcnically, *that* should be treated as a chain or a list and not a tree.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 3, 2009 at 1:11 pm
My company deals with network marketing/direct sales/mulitlevel marking companies on an international level. These are true hierarchies that extend well beyond 100 levels in many cases depending upon the specific genealogy they client has setup.
So in answer to your question, I would venture a guess that most organizations do not. However, this is not a heirarchy inside of a traditional organization, but for an entire international sales network. Assuming you only had 33k people who each sponsored 1 person, then that would be 33,000 level. I would venture another guess that a company like Amway easly has 33,000 levels.
You could not even store the data for the structure unless the branching level were within 4 9's of 1.0, and tehcnically, *that* should be treated as a chain or a list and not a tree.
Not sure what you mean here.
Fraggle
September 3, 2009 at 2:11 pm
Right. We've actually had this discussion before. I think that the whole structure and are wrong for this.
The "branching level" of a tree is the average number of branches at each level (or children per node). The total number of nodes in a tree is approximately equal to (BR^(L)/(BR-1))-1, or even more approxiamtely: BR^(L-1). If your 32,000 level tree contained every human on the planet (7 billion) then the branching level (BR) would be equal to: ~= 7,000,000,000^(1/32000) ~= 1.0007086627324289, which is almost within 4-nines of 1.0.
The upshot is first, that the vast majority of your tree-nodes have only one child node below them, and secondly that a straight tree is a very inefficient data structure for this data.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 3, 2009 at 2:15 pm
A *much* more efficient data structure would be a "sibling-precedence" type of tree, wherein chains of single-child node levels are actually stored as siblings with some kind of precedence ordering, with each sibling in the chain pointing the first sibling in the chain as it's parent.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 3, 2009 at 5:06 pm
Fraggle (9/3/2009)
... I would venture another guess that a company like Amway easly has 33,000 levels...
Uh, no, that would be incorrect. I sincerely doubt that AmWay has even 200 levels, and certainly nothing like 32,000. It is impossible for any contemporaneous consensual human enterprise to have a genuine hierarchy 32,000 levels deep. And most definitely not one that is self-organized on the basis of recruitment as AmWay is.
Proving this is trivially easy and I leave it as an exercise for the reader.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 3, 2009 at 8:01 pm
:ermm: Sigh.We can argue all day. Fact is fact. At a minimum, you only need 33,000 people in one strait line to break the recursion. Yes, that is an unlikely scenario, but it is still a scenario. No, we can't call a strait line a tree anymore than we can call a apple and orange. However, A very common genealogy in the industry limits the maximum number of children in a node to 2. These types of genealogies lead to many levels for people at the top, and while it may take years to reach such a deep level, with 2 million sales reps world wide, I have a difficult time believing that you cannot believe that it has gotten to the point were we need to worry about it.
I will have to do some searching about the sibling-precendce that you stated. That sounds very interesting.
Fraggle
September 3, 2009 at 8:39 pm
The problem with using recurrsion to solve hierarchies using the typical "Adjacency Model" of parent/child relationships is that you have to do it over and over and over and... Comparatively speaking, although recursive CTE's are very convenient to write, they're just as slow as a cursor and sometimes slower.
If you have no cross branching, then the use of Nested Sets will likely blow away most other methods including the use of CLR's. What's a Nested Set? Glad you asked... read the following.
http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html
As a side bar, I'll usually maintain both an Adjacency Model and a Nested Set model within the same table just in case I ever have to do a full rebuild on the Nested Set model. Haven't had to do a full rebuild, yet, but like Yogi Berra would say... "When you come to the fork in the road, take it." 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 3, 2009 at 10:15 pm
Fraggle (9/3/2009)
:ermm: Sigh.We can argue all day. Fact is fact. At a minimum, you only need 33,000 people in one strait line to break the recursion. Yes, that is an unlikely scenario, but it is still a scenario.
No, it's not a scenario for what we've talking about. A line isn't a hierarchy, it''s a sequence. If you insist on modelling a sequence as a recursion or a hierarchy then you get exactly the problem that you have: unreasonable structures, grotesquely inefficient processes and unlikely constraints the don't accurately reflect realistic overhead.
No, we can't call a strait line a tree anymore than we can call a apple and orange. However, A very common genealogy in the industry limits the maximum number of children in a node to 2.
And if you included every living person on the planet (7 billion) in such a hierarchy where only 1/2 to 1/3 of them ever even had only 2 children, it should still be no more than 70-100 levels deep.
These types of genealogies lead to many levels for people at the top, ...
Um nope, do the math. Even assuming that every single one of your sales reps was so hot that they instantly recruited their first subordinate on the very next day, and every single one of these new recruits was so hot that they did exactly the same thing, it would *still* take 32,000 days for your founder to reach a hierarchy 32,000 levels high. So if your founder was a go-getter who started when he was 18 and lived to be 100 years old, and still worked every day of his life, he would still have been dead for 5 years before he reached the 32,000 level.
I do have some familiarity with MLM schemes and IIRC, it's not legal to leave dead people in the commissions chain.
And all of this is based on some additional extreme assumptions, like that this miraculousness chain of single day recruitments started 87 years ago and continued unbroken every day, seven days a week (though it would have been illegal to do this on Sundays 87 years ago) and that although every one recruited their first IPO on their first day of work,they burned out on the second day and less than one in 300 ever recruited that second person that they were legally allowed to. Because if only 1% of them ever did that, they would have accidentally recruited every living person on earth 30 to 60 years ago, and the chain would have been broken because there wouldn't be anyone else left to recruit, so they never would have gotten to 32,000.
and while it may take years to reach such a deep level, with 2 million sales reps world wide, I have a difficult time believing that you cannot believe that it has gotten to the point were we need to worry about it.
Review the math above, I am allowing you 7 billion sales reps, and it's still inconceivable. In fact there's only two ways that this scenario is possible:
1) Virtually everyone in this system has been secretly reinserting themselves into their own commissions chain, I would guess about 300 times each. Of course that would be fraud, so I am not going to assume that, even though my father, who worked in direct marketing for years used to do exactly that all the time using my name and my brothers names, etc., ad infinitum. Or, ...
2) Your data structures are actually modelling sequences as false hierarchies. The solution to this is to revert the sequence or serial part of this structure back into a list (ordered set) which would allow you to recollapse the true hierarchy remaining, down into something manageable. (this is what the sibling-precedence trick would do for you.)
I will have to do some searching about the sibling-precendce that you stated. That sounds very interesting.
I'll be happy to demonstrate it for you if you can give me a example table definition to work on and a sample stored procedure to convert to this method.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 4, 2009 at 8:02 am
I wanted to answer your question about HierarchyID a little bit more completely.
HiearchyID appears to have a real upper limit of about 8000 levels (and a practical upper limit of considerably less, depending on the branching level, but since your branching-level is so low, they're virtually the same limit in your case). This for your data model that would notphysically work, and would perform very poorly, even if they could work.
However, they *can* work pretty well if you convert the long recursive sequence chains to ordered siblings, which path-oriented tree implementations (which is what the HierarchyID is) support pretty well.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 6, 2009 at 8:06 am
Few weeks ago i finished a project where we were using a lot of hierarchical structures. We decide to use hierarchyId - it was a good joice! At the beginining you need to understand how it works but after this little step HierarchyId rocks 😉
HID has some limitations in tree size area - so i think that you should install SQL2008 and try to build your big tree.
HID is real fast in many queris: getparent, getparent nlevel higher, get decendant, get all subnodes two lvls below and so on - I really like it! But when you are trying to change a root for a pert of the tree - it can be slower because you have to update hid values for all moved nodes.
September 7, 2009 at 3:10 pm
to the OP: if you DO test your system out on SQL 2008 with HierarchyIDs, please post back (or even better write an article) about the experience, performance, etc. help us all learn from your trail blazing at the extreme edges of performance!!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 7, 2009 at 7:06 pm
I'd love to see a performance test between HID's and the Nested Set Model for both up and downline queries as well as the addition and deletion of mid hierarchy nodes.
Guess I'm going to have to load 2k8 Dev and start practicing this stuff.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply