hierarchical queries, performance problem.

Question

hierarchical queries, performance problem.

Mark-242008

Grasshopper

Points: 17
More actions
July 10, 2006 at 9:04 am

#118773

Hi,
We are trying to create a hierarchical structure of employees in an organisation.
We have 2 tables, one called DirectReports which holds UserId, SupervisorId. This holds information for each person to show who their manager is.
We have another table called CanSee which holds UserId, CanSeeUserId. This holds your id against everyone elses id who is lower down the hierarchy then you (therefore you have permission to view their details). This second table is mainly used to make reporting faster. I.e. not having to run through the full hierarchy everytime you run a report.
When a user logs in we want to run a query which goes through the DirectReports table and fills the CanSee table with the relevant information.
We have been looking at 2 different methods of doing this show in the articles below. Recursive queries (SQL Server 2005) and using a tmep table as a stack.
http://www.sqlservercentral.com/columnists/sSampath/recursivequeriesinsqlserver2005.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8_qd_14_5yk3.asp

The problem with both of these is the speed. If in the first article yif ou are a top level manager loging into a system with 11000 users it takes 25 seconds to run.

With the second article it takes 2 minutes 35 seconds.

Has anyone ever done anything similar or have any ideas how we could speed this up.

I hope i have explained this ok..

Thanks for any help.

Mark.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Eddie Wuerch SSChampion Points: 12535 More actions · Answer 1

If my usual search is going from top (parent) to bottom, then I cluster on the ParentID column, in this case, SupervisorID. I usually have more than one column as part of the self-join, in which case I will cluster on all of the parent join columns, with the ParentID is the first column of the clustered index.

That way, when you are searching for the children of a particular item, they are all sitting next to each other.

I can rip through 7,000,000-row hierarchies on a laptop (winXP, SQL2005) in a few seconds this way.

-Eddie

Eddie Wuerch
MCM: SQL

stax68 SSChampion Points: 11711 More actions · Answer 2

Agree with the indexing idea - but remember all other indexes use the clustering key for row access, so don't make it too wide. All the usual caveats about storing comma-divided lists apply, of course.

Here's the code I would use:

create

table relTEST(SupervisorId int, UserId int)

go

insert

relTEST

select

1,2 union all

select

1,3 union all

select

2,4 union all

select

90,5 union all

select

2,6 union all

select

2,7 union all

select

90,8 union all

select

6,9 union all

select

90,10 union all

select

3,11 union all

select

4,12 union all

select

4,13 union all

select

5,14 union all

select

5,15 union all

select

6,16 union all

select

6,17 union all

select

7,18 union all

select

7,19 union all

select

7,20 union all

select

9,21 union all

select

8,22 union all

select

8,23 union all

select

8,24 union all

select

10,25 union all

select

10,26 union all

select

12,27

go

create

function dbo.iterate(@seedid int)

returns

@t table(id int, lvl int)

begin

declare @rowcount int, @error int, @lvl int
insert @t(id,lvl) select @seedid, 0
select @lvl = 1, @rowcount = 1
while @rowcount > 0
begin

insert @t(id,lvl) select r.UserID, @lvl from relTEST r join @t t on t.id = r.SupervisorId where t.lvl = @lvl-1
select @rowcount = @@rowcount, @error = @@error, @lvl = @lvl + 1

end

return

end

go

declare

@children varchar(8000)

select

@children = ''

select

@children = @children + ',' + cast(id as varchar(10)) from dbo.iterate(1) where lvl > 0 order by id

select

@children = substring(@children,2,8000)

select

@children ThisMightBeMoreUsefulKeptInTabularFormat

go

drop

table relTEST

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Mark-242008 Grasshopper Points: 17 More actions · Answer 3

Thanks for both the great responses. Adding the clustered index has greatly reduced the execution time.

We are now having a PeopleManagers table which holds who a persons supervisor is. This is updated everytime someone changes a persons supervisor through the system.

We are also having a PeopleCanSee table. This holds all the id's of all the people below you in the hierarchy and is used for reports.

We are trying to figure out when best to update the PeopleCanSee. Ideally we would update it everytime the PeopleManagers changes, but to do this we would have to run the sql described in the above posts for every user in the system (above the user whos details have changed in PeopleManagers) and this could be a big hit.

We were considering running it at login. So when a user logs in we use the PeopleManagers tbale to update the PeopleCanSee table for that user only. We only need to update this if people below the person (in the hierarchy) loging in have had their details changed in the PeopleManagers table. So evertime we update the PeopleManagers table we were going to fill a table with the Id's of the users above that user that need the CanSee table updating. Then when a user log's in and they have their id in this table, then we update their details in the CanSee table.

This is just a consideration we are having but if anyone has any tips in this area they are greatly appreciated.

Again, hope i have explained it ok and thanks for any thoughts on this. This is the first time we have used a hierarchical structure and there seems to be quit alot of considerations compared to the way we normally do it.