Complex Hierarchies in SQL Server

Question

Post reply

Complex Hierarchies in SQL Server

Steve Brett

Mr or Mrs. 500

Points: 589
More actions
June 23, 2009 at 4:23 am

#222946

Hi
We have a problem in which we have 2 large pieces of work that are very similar. I'm looking for some best-practice pointers to find the optimal solution. I've fone quite a bit of research and have my own ideas as to implementation nand solkution but would really appreciate it if anyone has any business experience in this kind of problem and would we willing to offer their advice.
The problem is thus:
We have a dozen or so 'awards' that are awarded to users once they have completed a number of training modules. Each set of training modules is composed of different entities (we already have those) and form logical groups. Within each group a user might have to pass either 4 out of 5, or (1 and 2 and3) or (4 and 5) for example. An award is created when a user has passed all the groups and has met all of the criteria in each group.
The real problem we have is that the groups can be used many times in any number of awards. if a user completes a module they may get the award, but they may have met the criteria for another unspecified award that exists in the system. So we need to back calculate. We have approx 60,000 users, and maybe 50 awards that are composed of 10 or 15 out of 3 or 4 thousand modules.
The advice I need is this:
I naturally leaned towards the idea that the groups for sets and the awards form sets of groups so would go for a set based approach. Other guys in the team have suggested a treee based (left leaf, right leaf) structure on the basis that the searches would be faster.
After googling for soultions they both seem like good ideas but was wondering what you guys though.
I can add more detail if needed, it not reall the schema I need but peoples experience of both tree based and set based searching and admin and the benefits of both. The primary driver is speed.
Thanks
Steve

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

mayank jarsaniya Hall of Fame Points: 3532 More actions · Answer 1

mayank jarsaniya

Hall of Fame

Points: 3532

June 23, 2009 at 9:03 am

#1014019

boring to read your question.

Steve Brett Mr or Mrs. 500 Points: 589 More actions · Answer 2

what ?

The question is too boring to read or you simply don't know the answer ? If you don't know or have nothing to contribute please don't post.

I've used this forum for a few years now and have always found it a valuable source of information. Perhaps you should find a junior sql site you can post your pointless comments on

LutzM SSC Guru Points: 107049 More actions · Answer 3

Hi Steve,

as per the information you provided I can't really see the performance gain of a tree based solution. What would it look like?

If I understood you right then I would go for a table structure for module, group, awards, relation between them including user status per module.

I don't think a tree based solution would be faster than a relational model, since the largest table (users) also is high selective (assuming one row per user). Using proper indexing and query design you shouldn't have any problems. I actually doubt that the size of the tables you're talking about would cause any problems on the database side.

What you also should look into is a rough number of consecutive requests (currently it looks like it can be anything between 1 and 60000...).

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Steve Jones - SSC Editor SSC Guru Points: 736255 More actions · Answer 4

I'm a little confused about what you mean by the awards and then the two approaches. Can you post a bit of sample data and explain how you see the two approaches as being implemented?

Also, can you give an idea of data size? Are these thousands? Millions?

Steve Brett Mr or Mrs. 500 Points: 589 More actions · Answer 5

Thanks for the replies guys - we are in a design phase at the minute and the meeting kind of split into 2 camps - tree based and relational. I'll post some more details when I get into work tomorrow (again as I'm still herre now but have to leave !)

cheers

Steve

Steve Brett Mr or Mrs. 500 Points: 589 More actions · Answer 6

Cheers - this is what I was heading for but need to justify it's benefit over a tree based approach preferred by some of the other developers.

The problem we're going to encounter is that it might not be possible to store the status of the award as when they change the structure of an award then all awards would need to be calculated.

I'll post more details when I have them ...

Cheers

Steve

GSquared SSC Guru Points: 260824 More actions · Answer 7

Here's what I have on hierarchies in SQL[/url].

From what you're describing, I'd go for a table of the modules, and another table of the awards, and a many-to-many of the combinations.

If, for example, Award A consists of Modules 1, 2, 3, and 5, and Award B consists of modules 3, 4, 5, 6, and 7, then someone who has done 3 and 5 is part-way on both of those. Right?

That's best represented as a standard many-to-many, not as a hierarchy.

If you need to find out who has what modules and thus qualifies for which awards, you can do that with a relatively simple query of the join table.

Let's say the join table is like this:

create table dbo.Modules_Awards (

ModuleID int not null references dbo.Modules(ModuleID),

AwardID int not null references dbo.Awards(AwardID),

constraint PK_ModulesAwards primary key (ModuleID, AwardID));

Then if you have a table like this:

create table dbo.Modules_Users (

ModuleID int not null references dbo.Modules(ModuleID),

UserID int not null references dbo.Users(UserID),

constraint PK_ModulesUsers primary key (ModuleID, UserID));

The second table contains data on who has completed what modules. In the real database, it'll probably have more columns, like DateCompleted, maybe ExamScore, that kind of thing, but for the purposes here, it'll do to just have those two.

Now, if you need to find out who has completed what awards, it's like this:

;with AwardModules (AwardID, ModuleCount) as

(select AwardID, count(*)

from dbo.Modules_Awards

group by AwardID)

select UserID, AwardID

from dbo.Modules_Users MU

inner join dbo.Modules_Awards MA

on MU.ModuleID = MA.ModuleID

group by UserID, AwardID

having count(*) =

(select ModuleCount

from AwardModules

where AwardID = MA.AwardID);

(Note, I'm just typing this up as I think of it. I haven't tested it, it's just to show the concept.)

You can easily improve performance on that, most likely, with a little tweaking and such, but the idea should be pretty clear.

This should give you what you seem to need.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

LutzM SSC Guru Points: 107049 More actions · Answer 8

Steve Brett (6/23/2009)
Cheers - this is what I was heading for but need to justify it's benefit over a tree based approach preferred by some of the other developers.
The problem we're going to encounter is that it might not be possible to store the status of the award as when they change the structure of an award then all awards would need to be calculated.

As far as I can see there is no need to store the award status separately - it would violate normalization. The award status should be based on online "calculation" (e.g. using views or functions). If you'd need to know the award status per person from the past, use a separate history table.

Like I said before: based on the number of users and number of modules (both are candidates for high selective indexes) it should work fine (leaving the question of the number of concurrent users out of discussion...).

I'd follow GSquared's way.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

mayank jarsaniya Hall of Fame Points: 3532 More actions · Answer 9

The question is too boring to read or you simply don't know the answer ?

You are right I dont know answer.actually i cant understand your question.

sorry for this and never mind.

If you don't know or have nothing to contribute please don't post.

--I want to contribute what I know therefore i have tried to read your question.

Ok again sorry

Steve Brett Mr or Mrs. 500 Points: 589 More actions · Answer 10

Thanks for your replies guys - I think we have a workable schema for both problems.

Thanks again

Steve

Carl Federl One Orange Chip Points: 25384 More actions · Answer 11

I naturally leaned towards the idea that the groups for sets and the awards form sets of groups so would go for a set based approach. Other guys in the team have suggested a tree based (left leaf, right leaf) structure on the basis that the searches would be faster.[

The first approach may be the Adjacency List approach and the left/right as a Nested Set. There are detail descriptions of both in Joe Celko's "Tree and Hierarchies in SQL for Smarties". I have used both approaches and found that both have their usages:

The Nested Set approach does have faster read access but much slower write and when the "slots" between a left and right are full, a major re-organization is needed. The SQL to maintain the left/right is also much more complicated and one bug will cause logical data loss. I used the Nested Sets approach for a US Tax Law classification database which had a full replacement once per year (no update logic needed). The db had about 40,000 users, so access speed was very important.

There is another solution to Trees named "Path Enumeration" that you have not mention which was developed by Itzik Ben-Gan and Tom Moreau. It works well and has a good balance between read speed, write speed, and ease of use although an "instead of" trigger is needed to move nodes. The table has an extra varchar(max) column that is the concatenation of the primary keys of all of the parent nodes which are separated by some delimiter. See the book "Advance Transact-SQL for SQL Server 2000"

Regarding your table design, you might have a hierarchy but database design via a posts is very difficult due to delays in answering questions. You could have a hierarchy where training module are composed of other training modules.

Some examples:

There is a training module named "Introduction to TransAct/SQL" which is part of the DBA training program and is also part of the Developer training program .

There is a "Master DBA" training program that is composed of these training programs: "OLTP DBA", "OLAP DBA", "Integration Services", and "Window OS Basic".

Your requirements could be implemented with these tables which uses the Adjacency model for the hierarchy. I have indicated "names" as the primary key columns but either identify or sequential identifier are suggested as the impact of a name change is significant and typo errors are common.

Create table TrainingProgram

(TrainingProgramName Primary Key

TrainingProgramName_Parent Foreign key references TrainingProgram )

Create table TrainingModule

(TrainingModuleName Primary Key)

create table TrainingProgramModule

(TrainingModuleName Foreign key references TrainingProgram

, TrainingProgramName Foreign key references TrainingModule

)

To convert to the Path Enumaration schema from the Adjacency List , use SQL in this pattern:

CREATE TABLE dbo.OrgUnitAdjacency

( OrgUnitIdBIGINTNOT NULL

, OrgUnitId_ParentBIGINTNOT NULL

, OrgUnitNameVARCHAR(255) NOT NULL

, CONSTRAINT OrgUnitAdjacency_U_OrgUnitIdPRIMARY KEY (OrgUnitId)

, CONSTRAINT OrgUnitAdjacency_U_OrgUnitNameUNIQUE ( OrgUnitName )

, CONSTRAINT OrgUnitAdjacency_C_OrgUnitNameCHECK ( OrgUnitName '' )

, CONSTRAINT OrgUnitAdjacency_F_OrgUnitAdjacency_Parent FOREIGN KEY (OrgUnitId_Parent)

REFERENCES OrgUnitAdjacency (OrgUnitId)

)

go

;WITH DirectReports

( OrgUnitId, OrgUnitId_Parent, OrgLevelDepth, OrgUnitName , OrgUnitPathEnum)

AS

(SELECTOrgUnitAdjacency.OrgUnitId

,OrgUnitAdjacency.OrgUnitId_Parent

,CAST ( 0 AS INTEGER ) AS OrgLevelDepth

,OrgUnitAdjacency.OrgUnitName

,CAST ( CAST( OrgUnitAdjacency.OrgUnitId AS VARCHAR(8) ) + '/' AS VARCHAR(4000) ) AS OrgUnitPathEnum

FROMdbo.OrgUnitAdjacency

UNION ALL

SELECTOrgUnitAdjacency.OrgUnitId

,OrgUnitAdjacency.OrgUnitId_Parent

,DirectReports.OrgLevelDepth + 1

,OrgUnitAdjacency.OrgUnitName

,CAST ( DirectReports.OrgUnitPathEnum + CAST(OrgUnitAdjacency.OrgUnitId AS VARCHAR(8) ) + '/' AS VARCHAR(4000) ) AS OrgUnitPathEnum

FROMdbo.OrgUnitAdjacency

JOINDirectReports

ON OrgUnitAdjacency.OrgUnitId_Parent = DirectReports.OrgUnitId

WHEREOrgUnitAdjacency.OrgUnitId OrgUnitAdjacency.OrgUnitId_Parent

)

SELECTOrgUnitId, OrgUnitId_Parent, OrgLevelDepth, OrgUnitName , OrgUnitPathEnum

FROMDirectReports

go[\code]

SQL = Scarcely Qualifies as a Language