calculate hierarchical values

  • Hi all,

    I would like to ask for help regarding my problem ragarding aggregating hierarchical data based on their heirarchical groupings. I have been doing this for the past 3 days and hasn't really been getting anything right. You guys might have solved any related problem. I'll be very tahnakful.

    Here's the sample dataset:

    HIER_LEVELTEAMIDSUCCESSORPREDESSORDURATION
    1BA - Chris Bowe13521 Reporting - Cynthia Reece/48131PATH 1 & 2
    2Reporting - Cynthia Reece4813BA - Chris Bowe/13521FSCM Conversion - John Fangman/19311PATH 2
    2Reporting - Cynthia Reece4813BA - Chris Bowe/13521Interconnectivity Testing - Felice Schulz/45941PATH 1
    3FSCM Conversion - John Fangman1931Reporting - Cynthia Reece/4813Change Management - Pamela Hammond/6431PATH 2
    4Change Management - Pamela Hammond643FSCM Conversion - John Fangman/1931Data Cleansing - Michael Godette/14750PATH 2
    5Data Cleansing - Michael Godette147Change Management - Pamela Hammond/643 1PATH 2
    3Interconnectivity Testing - Felice Schulz4594Reporting - Cynthia Reece/4813FSCM Conversion - John Fangman/194167PATH 1
    4FSCM Conversion - John Fangman1941Interconnectivity Testing - Felice Schulz/4594Change Management - Pamela Hammond/6431PATH 1
    5Change Management - Pamela Hammond643FSCM Conversion - John Fangman/1941Data Cleansing - Michael Godette/14750PATH 1
    6Data Cleansing - Michael Godette147Change Management - Pamela Hammond/643 1PATH 1

    based from this dataset, i would like to calculate the duration for each path. The calculation below should be as follows but i need this to be done dynamically and to pinpoint the rows which compromise the greatest total durations.

    PATH 1DURATIONPATH 2DURATION
    135211135211
    13521 - 4813213521 - 48132
    4813 - 4594684813 - 19312
    4594 - 1941681931 - 64351
    1941 - 643 51643 - 14751
    643 - 14751TOTAL DURATION107
    TOTAL DURATION241

    You have to pardon my ignorance because Im just new to this. Hope to get any help from you guys. Thank you very much.

  • Hi

    have'nt understood ur requirements clearly. can u give a better clear picture regarding this.

    just a point though - For hierarchial data i use CTE's (Common Tabe Expressions).

    check them out in BOL.

    "Keep Trying"

  • Please provide DDL and sample data as described at http://www.aspfaq.com/etiquette.asp?id=5006

    SQL = Scarcely Qualifies as a Language

  • This is going to be based on a lot of assumptions, but I'm using your example as matrix.

    It seems to work, though it's 'old style', haven't messed with CTE's that much yet

    (which I belive may also be a way to solve the problem)

    Anyways, I'll assume that the sample shows what your data looks like?

    If it is, unfortunately it's in a terrible shape. Most of the code is going to be stringmanipulation stuff in order to rip out the important stuff, which seems to be the pathnumbers and id's, so it's not going to look very pretty.

    I'm also going to assume that the patterns in the 'text/id' strings are reliable..

    ie - the id is always after the '/' in every successor/predessor description etc...

    Not all is bad, though, the good news is that you already have the things for a materialized path in there.

    There is the level, Id, parentId and childId, wo we just need to connect the dots.

    (and ofc to 'clean up' getting rid of all junktext around the numbers)

    The code is kept a bit verbose, so it may be easier to follow what's happening along the road.

    At the end we're going to produce this resultset out of the sampledata:

    path hier_level  id          predessorid duration    p                      

    ---- ----------- ----------- ----------- ----------- -----------------------

    1    1           13521       4813        1           13521 - 13521

    1    2           4813        4594        2           13521 - 4813

    1    3           4594        1941        68          4813 - 4594

    1    4           1941        643         68          4594 - 1941

    1    5           643         147         51          1941 - 643

    1    6           147         NULL        51          643 - 147

    2    1           13521       4813        1           13521 - 13521

    2    2           4813        1931        2           13521 - 4813

    2    3           1931        643         2           4813 - 1931

    2    4           643         147         51          1931 - 643

    2    5           147         NULL        51          643 - 147

    (11 row(s) affected)

    Here's your exampledata we use:

    /*

    create table #x

    (  hier_level int not null, team varchar(90) not null, id int not null, successor varchar(90) null, predessor varchar(90) null,

     duration int not null, path varchar(10) not null )

    insert #x

    select  1, 'BA - Chris Bowe', 13521, null, 'Reporting - Cynthia Reece/4813', 1, 'PATH 1 & 2' union all

    select  2, 'Reporting - Cynthia Reece', 4813, 'BA - Chris Bowe/13521', 'FSCM Conversion - John Fangman/1931', 1, 'PATH 2' union all

    select  2, 'Reporting - Cynthia Reece', 4813, 'BA - Chris Bowe/13521', 'Interconnectivity Testing - Felice Schulz/4594', 1, 'PATH 1' union all

    select  3, 'FSCM Conversion - John Fangman', 1931, 'Reporting - Cynthia Reece/4813', 'Change Management - Pamela Hammond/643', 1, 'PATH 2' union all

    select  4, 'Change Management - Pamela Hammond', 643, 'FSCM Conversion - John Fangman/1931', 'Data Cleansing - Michael Godette/147', 50, 'PATH 2' union all

    select  5, 'Data Cleansing - Michael Godette', 147, 'Change Management - Pamela Hammond/643', null, 1, 'PATH 2' union all

    select  3, 'Interconnectivity Testing - Felice Schulz', 4594, 'Reporting - Cynthia Reece/4813', 'FSCM Conversion - John Fangman/1941', 67, 'PATH 1' union all

    select  4, 'FSCM Conversion - John Fangman', 1941, 'Interconnectivity Testing - Felice Schulz/4594', 'Change Management - Pamela Hammond/643', 1, 'PATH 1' union all

    select  5, 'Change Management - Pamela Hammond', 643, 'FSCM Conversion - John Fangman/1941', 'Data Cleansing - Michael Godette/147', 50, 'PATH 1' union all

    select  6, 'Data Cleansing - Michael Godette', 147, 'Change Management - Pamela Hammond/643', null,  1, 'PATH 1'

    */

    Now, our first mission is to fish out the id's for successors and predessors from the text..

    select hier_level, 

           id, 

           coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

           substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

           left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

           duration

    from #x

    hier_level  id          successorId predessorId path duration   

    ----------- ----------- ----------- ----------- ---- -----------

    1           13521       13521       4813        1    1

    2           4813        13521       1931        2    1

    2           4813        13521       4594        1    1

    3           1931        4813        643         2    1

    4           643         1931        147         2    50

    5           147         643         NULL        2    1

    3           4594        4813        1941        1    67

    4           1941        4594        643         1    1

    5           643         1941        147         1    50

    6           147         643         NULL        1    1

    (10 row(s) affected)

    ...looks ok, almost.

    There's a problem with the first row (root node) in that the row should belong to both path 1 and 2, so we need an 'extra' row to reflect that path also. We can do that by unioning the above with an almost identical statement. The only difference is we take 'path' from the right instead of left as above...

    select hier_level, 

           id,

           coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

           substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

           left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

           duration

    from #x

    union

    select hier_level, 

           id, 

           coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

           substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

           right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,

           duration

    from #x

    hier_level  id          successorId predessorId path duration   

    ----------- ----------- ----------- ----------- ---- -----------

    1           13521       13521       4813        1    1

    1           13521       13521       4813        2    1

    2           4813        13521       1931        2    1

    2           4813        13521       4594        1    1

    3           1931        4813        643         2    1

    3           4594        4813        1941        1    67

    4           643         1931        147         2    50

    4           1941        4594        643         1    1

    5           147         643         NULL        2    1

    5           643         1941        147         1    50

    6           147         643         NULL        1    1

    (11 row(s) affected)

    ..there.. this is what we can work with (and perhaps how your data should look like from the beginning.

    What we need to do now is a selfjoin against this 'table' and connect the parent with it's immediate child in order to calculate the duration, which is (I assume )

    'duration + duration of the row one level higher within the same path' - except for level 1, which has the whole duration on it's own. (that's why the case is in the duration sum below)

    If we pretend that we actually had this as the table to start with, then the whole thing would look like this:

    select x.hier_level,

           x.path,

           x.id,

           x.predessorid, 

           x.duration + case when x.hier_level = 1 then 0 else y.duration end as duration,

           cast(y.id as varchar(10)) + ' - ' + cast(x.id as varchar(10)) as p

    from   #z x

    join   #z y

    on     coalesce(x.successorid, x.id) = y.id

    and    x.path = y.path

    order by x.path, x.hier_level

    hier_level  path id          predessorid duration    p                      

    ----------- ---- ----------- ----------- ----------- -----------------------

    1           1    13521       4813        1           13521 - 13521

    2           1    4813        4594        2           13521 - 4813

    3           1    4594        1941        68          4813 - 4594

    4           1    1941        643         68          4594 - 1941

    5           1    643         147         51          1941 - 643

    6           1    147         NULL        51          643 - 147

    1           2    13521       4813        1           13521 - 13521

    2           2    4813        1931        2           13521 - 4813

    3           2    1931        643         2           4813 - 1931

    4           2    643         147         51          1931 - 643

    5           2    147         NULL        51          643 - 147

    (11 row(s) affected)

    Doesn't look too complicated?

    From here you can do whatever you want with regard to highest duration, sum it by path or whatever...

    If, you on the other hand, for some reason, are unable to work with your data in a 'clean' way, then this is how it looks to do it all on the fly - sensitive persons may need to look another way

    select x.hier_level,

           x.path,

           x.id,

           x.predessorid, 

           x.duration + case when x.hier_level = 1 then 0 else y.duration end as duration,

           cast(y.id as varchar(10)) + ' - ' + cast(x.id as varchar(10)) as p

    from

          (

           select hier_level,

                  id,

                  coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

                  substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

                  left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

                  duration

            from  #x

        union

            select hier_level,

                   id,

                   coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

                   substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

                   right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,

                   duration

             from  #x

          ) x

    join

         (

           select hier_level,

                  id,

                  coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

                  substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

                  left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

                  duration

            from  #x

        union 

            select hier_level,

                   id,

                   coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

                   substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

                   right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,

                   duration

             from  #x

          ) y

    on       coalesce(x.successorid, x.id) = y.id

    and      x.path = y.path

    order by x.path, x.hier_level

    Hope it helps you out.

    =;o)

    /Kenneth

     

  • i will substitute your use of the word hierachy and use order by. to me your problem seems to be a network path problem akin to project management. you are trying to find where along the network path you are most likely going to face problems i.e. the critical node.

    you will unfortunately have to run the script for each Path, but I will try to think of a more elegant solution

    ---------------------

    --declare a table to store the temp values

    DECLARE

    @values TABLE(

    Path1

    VARCHAR(15),

    Duration

    bigint

    )

    DECLARE

    @HIER_LEVEL int, @ID VARCHAR(10), DURATION BIGINT

    DECLARE

    @prev_duration BIGINT, @prev_id VARCHAR(10),@sumit bigint

    DECLARE

    node_cursor

    SELECT

    HIER_LEVEL, ID, DURATION

    FROM

    SomeTable

    WHERE

    PATHid = 1

    ORDER

    BY [HIER_LEVEL] ASC

    OPEN

    node_cursor

    SET

    @prev_id = '', @prev_duration = 0

    FETCH

    NEXT FROM node_cursor

    INTO

    @HIER_LEVEL, @ID, @DURATION

    WHILE

    @@FETCH_STATUS = 0

    BEGIN

    INSERT INTO @values

    (Pathrange,Duration)

    SELECT @ID + @prev_id, (@DURATION + @prev_duration)

    SET @prev_id = @ID, @prev_duration = @DURATION, @sumit = @prev_duration + @DURATION

    FETCH NEXT FROM node_cursor

    INTO @HIER_LEVEL, @ID, @DURATION

    END

    CLOSE

    node_cursor

    DEALLOCATE

    node_cursor

    INSERT

    INTO @values

    (

    Pathrange,Duration)

    SELECT

    'TOTAL DURATION', @sumit

    SELECT

    * FROM @values


    Everything you can imagine is real.

  • What a nice question. I hope the following will help you.

    I have taken a slightly different approach.

    As I found your non-normalized dataset confusing, I normalized it as follows:

    CREATE TABLE #Teams

    (

        TeamID int NOT NULL

        ,Team varchar(50) NOT NULL

    )

    GO

    INSERT INTO #Teams

    SELECT 147, 'Data Cleansing - Michael Godette' UNION ALL

    SELECT 643, 'Change Management - Pamela Hammond' UNION ALL

    SELECT 1931, 'FSCM Conversion - John Fangman' UNION ALL

    SELECT 4594, 'Interconnectivity Testing - Felice Schulz' UNION ALL

    SELECT 4813, 'Reporting - Cynthia Reece' UNION ALL

    SELECT 13521, 'BA - Chris Bowe'

    GO

    CREATE TABLE #Paths

    (

        TeamID int NOT NULL

        ,NextTeamID int NULL

        ,Duration int NOT NULL

    )

    GO

    INSERT INTO #Paths

    SELECT 13521, 4813, 1 UNION ALL

    SELECT 4813, 1931, 1 UNION ALL

    SELECT 4813, 4594, 1 UNION ALL

    SELECT 4594, 1931, 67 UNION ALL

    SELECT 1931, 643, 1 UNION ALL

    SELECT 643, 147, 50 UNION ALL

    SELECT 147, NULL, 1

    GO

    Now to get the result. This effectively requires two recursions, one to get the maximum total duration

    and one to get the path of the maximum total duration. (I have assumed that only one path can have the

    maximum duration in order to simplify the code.)

    -- 1. Get the path totals

    CREATE TABLE #PathTotals

    (

        TeamID int NOT NULL

        ,NextTeamID int NULL

        ,PrevTeamID int NULL

        ,Stage int NOT NULL

        ,StageDuration int NOT NULL

        ,Total int NOT NULL

    )

    ;WITH GetTotals (TeamID, NextTeamID, PrevTeamID, Duration, Stage, StageDuration, Total)

    AS

    (

        SELECT TeamID, NextTeamID, NULL, Duration, 1, Duration, Duration

        FROM #Paths

        WHERE TeamID = 13521

        UNION ALL

        SELECT P.TeamID, P.NextTeamID, T.TeamID, P.Duration

            ,Stage + 1

            ,T.Duration + P.Duration

            ,T.Total + T.Duration + P.Duration

        FROM GetTotals T

            JOIN #Paths P

                ON T.NextTeamID = P.TeamID

    )

    INSERT INTO #PathTotals

    SELECT TeamID, NextTeamID, PrevTeamID, Stage, StageDuration, Total

    FROM GetTotals

    -- Now show the longest path

    -- More complicated if have to allow for duplicates

    -- Works with sample data but will need testing with real data

    ;WITH LongestPath (TeamID, PrevTeamID, Stage, StageDuration, Total)

    AS

    (

        SELECT P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total

        FROM #PathTotals P

            CROSS JOIN (

                SELECT MAX(P1.Total) AS Total

                FROM #PathTotals P1

            ) D

        WHERE P.Total = D.Total

        UNION ALL

        SELECT P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total

        FROM LongestPath L

            JOIN #PathTotals P

                ON L.PrevTeamID = P.TeamID

                    AND L.TeamID = P.NextTeamID

                    AND L.Stage = P.Stage + 1

    )

    SELECT L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration

    FROM LongestPath L

        JOIN #Teams T

            ON L.TeamID = T.TeamID

    ORDER BY Stage

     

  • Would not be just easier to sit on the dataset we were shown and sayK,first step we do before any processing is NORMALIZATION...?

    There would be more meaningful tables:

    1.HIERARCHY with Hier_level,ID columns

    2.TEAM with ID,Members columns

    3.SUCCESSORS with ID,Successor columns

    4.PREDESSORS with ID,Predessor columns

    5. and DURATION with Path,ID, Duration columns

    after which you can use JOIN and get what you want.

    ben

  • goyers,

    One thing I still did not get, is what the Hier_level does stand for?

    I would say that should be a hierarchical level of departments or levels with different granted permissions.

    But than, there is "FSCM Conversion - John Fangman" in both level 3 and 4, because of different path(Path1 and Path2).

    So I assume the hier_level is dynamically changing, always the last successor is the highest level with value 1.But again, what is the main benefit to have this dynamically changing, rather than to have a fixed levels of hierarchy,ie:BA-1,Reporting-2,Interconnectivity Testing-3,etc.

    Can someone or maybe you goyers clarify the description of hier_level?

    Thanks

    Ben

  • I have just had a chance to have a quick look at this again. The following bit of code may help in showing duplicate maximum duration paths:

    -- Now show the longest path

    -- This should show duplicates

    ;WITH LongestPath (PathID, TeamID, PrevTeamID, Stage, StageDuration, Total)

    AS

    (

        SELECT ROW_NUMBER() OVER (PARTITION BY P.Total ORDER BY P.Stage)

            ,P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total

        FROM #PathTotals P

        WHERE P.Total = (

                SELECT MAX(P1.Total)

                FROM #PathTotals P1

            )

        UNION ALL

        SELECT L.PathID, P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total

        FROM LongestPath L

            JOIN #PathTotals P

                ON L.PrevTeamID = P.TeamID

                    AND L.TeamID = P.NextTeamID

                    AND L.Stage = P.Stage + 1

    )

    SELECT L.PathID, L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration

    FROM LongestPath L

        JOIN #Teams T

            ON L.TeamID = T.TeamID

    ORDER BY PathID, Stage

     

  • And this will show all the paths:

    ;WITH LongestPath (PathID, TeamID, PrevTeamID, Stage, StageDuration, Total)

    AS

    (

        SELECT ROW_NUMBER() OVER (ORDER BY P.Total, P.Stage)

            ,P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total

        FROM #PathTotals P

        WHERE P.NextTeamID is NULL

        UNION ALL

        SELECT L.PathID, P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total

        FROM LongestPath L

            JOIN #PathTotals P

                ON L.PrevTeamID = P.TeamID

                    AND L.TeamID = P.NextTeamID

                    AND L.Stage = P.Stage + 1

    )

    SELECT L.PathID, L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration

    FROM LongestPath L

        JOIN #Teams T

        ON L.TeamID = T.TeamID

    ORDER BY PathID, Stage

     

  • Ben,

    The relevance of the Hier_level is it shows the position of the task in the path. i.e.

    for PATH 1

    1. Reporting - Cynthia Reece/4813 ===>>> 2. Interconnectivity Testing - Felice Schulz/4594  ====>>>> 3. FSCM Conversion - John Fangman/1941

    But as you rightly say, this should be normalised


    Everything you can imagine is real.

  • sorry for replying just now.....the hier_level is based on the predecessor columns so if the predecessor is blank it is assumed that it is level 1

  • thank you very much for giving your precious time in helping me with this problem.....special thanks to Ken, Bledu, and Ben.....i'll keep you posted if it works! thanks guys.

  • this solution from Ken worked. Thanks Ken

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply