November 17, 2010 at 9:22 am
I recently attended a class discussing various hierarchy implementations which was fascinating and well written. I wrote a hierarchy implementation a couple years ago that was crafted for our specific needs, and the presentation made me think about what I had done. In reviewing my work, I created a list of advantages/disadvantages and how they map to our environment. Do you see anything I've missed and are my conclusions solid?
In our system, query time is highest importance and the data is pretty stable with infrequent changes. Queries against the hierarchy average 600-1000 per minute and are typically only interested in one point in the tree at a time and the records down a few levels from that point (we don’t often expand all the way to the leaf nodes). Queries going back up the tree are infrequent and typically only go one or two levels. There are only a handful of changes per hour and inserts outnumber updates and deletes by about 1000 to 1.
I picked the Materialized Path model, since I was able to get good speed out of it by indexing the hierarchy column and using LIKE statements. To give it just a little extra umph, I added a “level” column to both the table and index to aid queries that were going to a specific depth in the tree. The speed was reasonable and performed much better than the model that was then in use.
The Nested Set model would have given us faster queries, but I was concerned about the tree modifications that would happen with every insert (blocking reads), or the logic necessary to maintain and use empty “fillfactor” space in the numbering so that large tree updates happen less often.
Does it seem like we’ve made a good choice for the structure? Are there other factors or options I might not have considered? I originally had a lot more detail in this post, but it was long enough that I didn’t think anyone would read it so I cut out the fluff. 😛
Thanks,
Chad
November 19, 2010 at 8:44 am
I covered hierarchies of a couple of types in http://www.sqlservercentral.com/articles/T-SQL/65540/.
Personally, with what you've described, I'd probably go with nested sets. The only thing they don't do well is frequent updates, and you say the data is relatively static.
Post more details of the problems you see with nested sets, and they can be addressed. There are tricks that can make them easier to deal with.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 19, 2010 at 11:06 am
It is relatively static, maybe one modification per minute or so. But the number of queries is so heavy that I was worried about what would happen when those updates did occur. Since it would typically have to update a significant portion of the tree, I could imagine locking being an issue. To be honest, I didn't test an implementation of nested sets in our environment since it seemed like the updates would be too prohibitive. In retrospect that was maybe too quick of a judgment and I think I should probably set that up and run some benchmarks. I'll take that as an assignment.
I like your modification to include the top-level parent and lazy-update the range values in a batch, but since our queries start from any point in the tree (usually not at a root node), I'm not sure that I would implement that particular optimization. I was really impressed though and it got me thinking about some other options. Nice article!
Thanks,
Chad
November 19, 2010 at 11:13 am
Would snapshot isolation handle your locking/concurrency issues? That's pretty much what it's designed for.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 19, 2010 at 1:09 pm
That is an excellent idea. I'm not completely familiar with the pros and cons so I have some reading to do, but I like what I've seen just glancing through it.
Thanks,
Chad
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply