Nested Set Optimization Fun and Sunshine

  • Hello everyone.

    I have attached a zip file that contains the script that I am running, data to populate the tables (with the script using that data), the execution plan of the entire process on my system, and an execution plan of the first 2 levels.

    The issue is that the entire script takes roughly 30 minutes to execute.

    If you review the execution plan of the first two levels, you will notice that there in an Index Spool that for some reason is spitting out roughly 2.6billion records even though it receives a total of 103k records. I might be completely wrong here, but I believe this is where the issue is.

    Can any explain what is happening and what my options are for correcting it?

    Note that all the process is trying to do is find the left and right extent of individual on each level.

    Thanks,

    Fraggle

  • Hate to say it, but this is the Internet, and I don't open files that could contain auto-executable code from unknown sources.

    Can you either post the code in the forum, or as a .txt or other file that doesn't support executable code?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (12/28/2011)


    Hate to say it, but this is the Internet, and I don't open files that could contain auto-executable code from unknown sources.

    Can you either post the code in the forum, or as a .txt or other file that doesn't support executable code?

    Odd that a .sql file is not eligible for attachment. Either way, here are the 4 files.

  • Are you in a position whereby you can "fall back and punt" on this?

    This is definitely a case where a nested sets hierarchy will save the day performance-wise.

    Here's some data on nested sets: http://en.wikipedia.org/wiki/Nested_set_model

    What you're running into is that your loop is killing your performance, because it has to crawl the hierarchy over and over and over again. You're getting a huge multiplier on your logical reads (you're seeing that in the spool having such a huge rowcount).

    I'm not at all surprised it's a long-running query. Adjacency hierarchies usually are.

    Nested sets will, done correctly, take the runtime down to milliseconds on this small a table.

    (If you were using SQL 2008, I'd recommend also testing the HierarchyID datatype, but the forum you posted in indicates SQL 2005. So I'm assuming you don't have access to that datatype.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared,

    When you say "fall back and punt this" I assume you are referring to a refactoring of the code. Then yes.

    However, here is the deal. We currently work with nested sets. The code that maintains the Left and Right Extents (IndexStart and IndexStop in the code) currently uses a cursor to maintain those points for insertions, movements, and well as when we completely "rebuild" all of the extents for the tree. This was suppose to be a way to replace that cursor. Now that might sound odd considering that there is still a cursor, but originally we were doing for each individual, were as now we are doing it for each level.

    If you have a way that we can rebuild the entire genealogy regardless of width/depth in one set based operation without going through the levels, I would love to be able to see it. I have tried, searched, and tried some more. I can provide the current Left and Right extents. However, by a rule, the entire tree has to be rebuilt each time "this" specific code is run. This specific code is run every 4 hours.

    Additionally, if you have a better option, I am open to them.

    Thanks,

    Fraggle

  • That makes sense.

    The way I do that is generate a "breadcrumb path" for each node, using a simple recursive CTE. Pad the parent IDs with leading 0s, and either put a delimiter in or not. Delimiters make it more human-readable, but aren't needed.

    Then use Row_Number() to assign a value to all the bottom-level nodes, ordered by their path. Work your way up the levels from there.

    It's not a single-pass solution. Those can be done with some moderately complex math, but I haven't found that needful.

    The breadcrumb method is usually plenty fast. I've handled multi-million nodes, n-depth hierarchies that way, in a couple of seconds on low-end hardware.

    The whole thing is even easier is SQL 2008, but that's beside the point here.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared,

    Not familiar with the "Bread Crumb" method. Any references on that one?

    Thanks,

    Fraggle

  • GSquared (12/28/2011)


    ...

    Here's some data on nested sets: http://en.wikipedia.org/wiki/Nested_set_model

    Nested Set Model is introduced By Joe Celko??? :w00t: It’s news to me.

    Sorry for being blunt, I don’t like him for his insulting posts in SSC (hope he is reading it). But it looks like he is freaking genius. :hehe:

  • Might look like:

    ;with HierarchyCTE as

    (select ID, NULL as ParentID, '000000' as Breadcrumb

    from dbo.MyHierarchyTable

    where ParentID is null

    union all

    select MyHierarchyTable.ID, MyHierarchyTable.ParentID, Breadcrumb + Right('000000' + cast(MyHierarchyTable.ID as varchar(100)), 6)

    from dbo.MyHierarchyTable

    inner join HierarchyCTE

    on MyHierarchyTable.ParentID = HierarchyCTE.ID)

    select *

    from HierarchyCTE

    order by Breadcrumb ;

    Join that back to the hierarchy table, with a Where clause that grabs IDs that don't have a referencing ParentID (bottom level nodes, in other words), use Row_Number() over (Order By Breadcrumb), and you have a set of "NodeIDs" that can be used for your nested sets hierarchy.

    Then iterate through the hierarchy table, joining it to itself, on ID = ParentID, where the upper node has a null NodeID value and the lower doesn't. That gets your second-to-bottom level, and simple Min() and Max() get you the Nested Sets range. Each iteration crawls one level up.

    In a very complex, deep, multi-path hierarchy, with a lot of rows (nodes) in the table, the time-consuming part is usually the breadcrumb generation. That can take a couple of seconds on big tables. Usually won't, but it can.

    Iterating up the levels is usually really, really fast, if you have the right indexes on the table.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ok, so I understand the breadcrumbs now. However, I am not seeing how they will eventually get me the Left/Right node for the nested set? Or are you suggesting that we just use the breadcrumbs rather than the nested set?

    Thanks,

    Fraggle.

  • On a different tangent, as C#/VB is pretty darn fast for loops and such, any thoughts on potential performance gain/loss by doing the stuff in a CLR? By this I mean getting the left and right extents.

    Thanks,

    Fraggle

  • Fraggle-805517 (12/29/2011)


    On a different tangent, as C#/VB is pretty darn fast for loops and such, any thoughts on potential performance gain/loss by doing the stuff in a CLR? By this I mean getting the left and right extents.

    Thanks,

    Fraggle

    I've thought about it, but haven't tried it. Honestly, my .NET skills probably aren't up to the task to really give it an honest test. It would be like racing a Yugo (my .NET skills) against a Ferrari (T-SQL skills), and that's not fair to .NET, since it's not a Yugo.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Well I am going to give it a try. I have about the same .NET skills as you do, but I work with some smart people who can help make sure I am not doing anything super stupid and guide me. I'll let you know what comes of it.

    Fraggle

  • Just an update on this for everyone. Worked with a couple of the .NET guys in the office. Was able to take the entire process and move it into a CLR with a little effort. While we are still validating the results, we are seeing MASSIVE performance improvements to the process.

    We have taken the process of finding the Left and Right Extent (IndexStart and IndexStop) now takes less than 10 seconds. Assuming everything works as it should (need to finish unit testing and load testing), then I think we can see it works faster than the SQL Option.

    Once completed, I will post the solution.

    Thanks,

    Fraggle

  • Cool.

    Is it anything you can post here?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply