Nested Set Optimization Fun and Sunshine

  • Yes, I will be posting it once I have finished the unit testing and load testing.

    Thanks,

    Fraggle

  • CELKO (1/4/2012)


    I have the same fear of downloads as everyone else. But in the nested sets model, the tree structure is in one table and the nodes are in another. Yes, I know that this is basic data modeling by a lot of Noobs never read Dr. Chen or anything about E-R and cram both into one table.

    The Tree table is two INTEGERs for the (lft, rgt) coordinates and a FOREIGN KEY reference to a node. That means you have VERY short rows; a lot of them fit into a data page.

    Now use a step size of, say, ten with a high percentage of free space and updating is not a problem with large insertions.

    The real hard part is the constraints that prevent cycles and overlapping (lft, rgt) pairs.

    Celko, you are correct. That is basic and in our main tables, this is how it is setup. We have a primary table for the adjacency list and then a separate table for the nodes. However, since we deal with a large number of inserts and tree movements, we don't use an INT. Rather we use a DECIMAL where in we can place people between two whole numbers when inserted/moved. As such, the table is a little wider. Additionally, there is the always fun audit information which makes it a bit wider.

    The issue here is that the left and right nodes MUST be rebuilt each time this specific process is run. Our current logic for maintain those points is inside of a loop, which works, but we all dislike loops here in the database world by default. So I looked for a set based method. The set based method provided (which in the 2nd post is the unzipped files), is actually slower than the loop. So I was asking what was going on as for some reason we are pulling 6billion records at one point of the execution plan.

    Fraggle

  • CELKO (1/4/2012)


    Fraggle-805517 (12/29/2011)


    Ok, so I understand the breadcrumbs now. However, I am not seeing how they will eventually get me the Left/Right node for the nested set? .

    I prefer to use a procedural language and use the classic pushdown stack. in a loop. NOT recursion! Most adjacency list tables have no integrity constraints so they have unexpected garbage like orphans and cycles. It is nice to stop instead of getting endless recursion.

    Not familiar with the pushdown stack that you refer to. Can you elaborate? In our case, we do have integrity constraints, but not ones that prevent infinite loops.

    However, the current way the CLR is programmed, even creating the infinite loop as a test case, we are able to not only ensure that we don't get stuck in an infinite loop, but we also are able to pull out the nodes that have created the infinate loop. It does add extra overhead to find the infinate loop, but since the left/right extents won't rebuild with the infinate loop it is a better option anyway. The CLR is still less than 30 seconds on 600k tree vs 12 minutes on a loop or 35 minutes on the set based.

    If you have options that will get it down below that, I am open for those suggestions and improvements. Additionally, if you can show me why a CLR is not the better option (memory explosions, cpu death, etc), then I am also open to that discussion.

    Thanks,

    Fraggle

  • Well since I am still waiting on QA and didn't want anyone to wait much longer (ok, so before I forgot), I figured I would post what I have.

    If anyone finds a faster way or something that is less memory intensive, then please let me know it.

    Please note that it is a C# file, and not VB.

    Thanks,

    Fraggle

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply