hierarchical queries, performance problem.

  • Hi,

    We are trying to create a hierarchical structure of employees in an organisation. 

    We have 2 tables, one called DirectReports which holds UserId, SupervisorId.  This holds information for each person to show who their manager is.

    We have another table called CanSee which holds UserId, CanSeeUserId.  This holds your id against everyone elses id who is lower down the hierarchy then you (therefore you have permission to view their details).  This second table is mainly used to make reporting faster.  I.e. not having to run through the full hierarchy everytime you run a report.

    When a user logs in we want to run a query which goes through the DirectReports table and fills the CanSee table with the relevant information.

    We have been looking at 2 different methods of doing this show in the articles below.  Recursive queries (SQL Server 2005) and using a tmep table as a stack.

    http://www.sqlservercentral.com/columnists/sSampath/recursivequeriesinsqlserver2005.asp

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8_qd_14_5yk3.asp

     

    The problem with both of these is the speed.  If in the first article yif ou are a top level manager loging into a system with 11000 users it takes 25 seconds to run.

     

    With the second article it takes 2 minutes 35 seconds.

     

    Has anyone ever done anything similar or have any ideas how we could speed this up.

     

    I hope i have explained this ok..

     

    Thanks for any help.

     

    Mark.

     

  • If my usual search is going from top (parent) to bottom, then I cluster on the ParentID column, in this case, SupervisorID.  I usually have more than one column as part of the self-join, in which case I will cluster on all of the parent join columns, with the ParentID is the first column of the clustered index.

    That way, when you are searching for the children of a particular item, they are all sitting next to each other.

    I can rip through 7,000,000-row hierarchies on a laptop (winXP, SQL2005) in a few seconds this way.

    -Eddie 

    Eddie Wuerch
    MCM: SQL

  • Agree with the indexing idea - but remember all other indexes use the clustering key for row access, so don't make it too wide. All the usual caveats about storing comma-divided lists apply, of course.

    Here's the code I would use:

    create

    table relTEST(SupervisorId int, UserId int)

    go

    insert

    relTEST

    select

    1,2 union all

    select

    1,3 union all

    select

    2,4 union all

    select

    90,5 union all

    select

    2,6 union all

    select

    2,7 union all

    select

    90,8 union all

    select

    6,9 union all

    select

    90,10 union all

    select

    3,11 union all

    select

    4,12 union all

    select

    4,13 union all

    select

    5,14 union all

    select

    5,15 union all

    select

    6,16 union all

    select

    6,17 union all

    select

    7,18 union all

    select

    7,19 union all

    select

    7,20 union all

    select

    9,21 union all

    select

    8,22 union all

    select

    8,23 union all

    select

    8,24 union all

    select

    10,25 union all

    select

    10,26 union all

    select

    12,27

    go

    create

    function dbo.iterate(@seedid int)

    returns

    @t table(id int, lvl int)

    begin

    declare @rowcount int, @error int, @lvl int
    insert @t(id,lvl) select @seedid, 0
    select @lvl = 1, @rowcount = 1
    while @rowcount > 0
    begin

    insert @t(id,lvl) select r.UserID, @lvl from relTEST r join @t t on t.id = r.SupervisorId where t.lvl = @lvl-1
    select @rowcount = @@rowcount, @error = @@error, @lvl = @lvl + 1

    end
    return
    end
    go

    declare

    @children varchar(8000)

    select

    @children = ''

    select

    @children = @children + ',' + cast(id as varchar(10)) from dbo.iterate(1) where lvl > 0 order by id

    select

    @children = substring(@children,2,8000)

    select

    @children ThisMightBeMoreUsefulKeptInTabularFormat

    go

    drop

    table relTEST

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Thanks for both the great responses.  Adding the clustered index has greatly reduced the execution time.

    We are now having a PeopleManagers table which holds who a persons supervisor is.  This is updated everytime someone changes a persons supervisor through the system.

    We are also having a PeopleCanSee table.  This holds all the id's of all the people below you in the hierarchy and is used for reports.

    We are trying to figure out when best to update the PeopleCanSee.  Ideally we would update it everytime the PeopleManagers changes, but to do this we would have to run the sql described in the above posts for every user in the system (above the user whos details have changed in PeopleManagers) and this could be a big hit.

    We were considering running it at login.  So when a user logs in we use the PeopleManagers tbale to update the PeopleCanSee table for that user only.  We only need to update this if people below the person (in the hierarchy) loging in have had their details changed in the PeopleManagers table.  So evertime we update the PeopleManagers table we were going to fill a table with the Id's of the users above that user that need the CanSee table updating.  Then when a user log's in and they have their id in this table, then we update their details in the CanSee table.

    This is just a consideration we are having but if anyone has any tips in this area they are greatly appreciated.

    Again, hope i have explained it ok and thanks for any thoughts on this.  This is the first time we have used a hierarchical structure and there seems to be quit alot of considerations compared to the way we normally do it.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply