calculate hierarchical values

Question

Post reply

calculate hierarchical values

goyers-382153

SSC Veteran

Points: 250

July 20, 2007 at 12:36 am

#196984

Hi all,

I would like to ask for help regarding my problem ragarding aggregating hierarchical data based on their heirarchical groupings. I have been doing this for the past 3 days and hasn't really been getting anything right. You guys might have solved any related problem. I'll be very tahnakful.

Here's the sample dataset:

HIER_LEVEL	TEAM	ID	SUCCESSOR	PREDESSOR	DURATION
1	BA - Chris Bowe	13521		Reporting - Cynthia Reece/4813	1	PATH 1 & 2
2	Reporting - Cynthia Reece	4813	BA - Chris Bowe/13521	FSCM Conversion - John Fangman/1931	1	PATH 2
2	Reporting - Cynthia Reece	4813	BA - Chris Bowe/13521	Interconnectivity Testing - Felice Schulz/4594	1	PATH 1
3	FSCM Conversion - John Fangman	1931	Reporting - Cynthia Reece/4813	Change Management - Pamela Hammond/643	1	PATH 2
4	Change Management - Pamela Hammond	643	FSCM Conversion - John Fangman/1931	Data Cleansing - Michael Godette/147	50	PATH 2
5	Data Cleansing - Michael Godette	147	Change Management - Pamela Hammond/643		1	PATH 2
3	Interconnectivity Testing - Felice Schulz	4594	Reporting - Cynthia Reece/4813	FSCM Conversion - John Fangman/1941	67	PATH 1
4	FSCM Conversion - John Fangman	1941	Interconnectivity Testing - Felice Schulz/4594	Change Management - Pamela Hammond/643	1	PATH 1
5	Change Management - Pamela Hammond	643	FSCM Conversion - John Fangman/1941	Data Cleansing - Michael Godette/147	50	PATH 1
6	Data Cleansing - Michael Godette	147	Change Management - Pamela Hammond/643		1	PATH 1

based from this dataset, i would like to calculate the duration for each path. The calculation below should be as follows but i need this to be done dynamically and to pinpoint the rows which compromise the greatest total durations.

PATH 1	DURATION	PATH 2	DURATION
13521	1	13521	1
13521 - 4813	2	13521 - 4813	2
4813 - 4594	68	4813 - 1931	2
4594 - 1941	68	1931 - 643	51
1941 - 643	51	643 - 147	51
643 - 147	51	TOTAL DURATION	107
TOTAL DURATION	241

You have to pardon my ignorance because Im just new to this. Hope to get any help from you guys. Thank you very much.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

ChiragNS One Orange Chip Points: 26137 More actions · Answer 1

Hi

have'nt understood ur requirements clearly. can u give a better clear picture regarding this.

just a point though - For hierarchial data i use CTE's (Common Tabe Expressions).

check them out in BOL.

"Keep Trying"

Carl Federl One Orange Chip Points: 25384 More actions · Answer 2

Please provide DDL and sample data as described at http://www.aspfaq.com/etiquette.asp?id=5006

SQL = Scarcely Qualifies as a Language

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 3

This is going to be based on a lot of assumptions, but I'm using your example as matrix.

It seems to work, though it's 'old style', haven't messed with CTE's that much yet

(which I belive may also be a way to solve the problem)

Anyways, I'll assume that the sample shows what your data looks like?

If it is, unfortunately it's in a terrible shape. Most of the code is going to be stringmanipulation stuff in order to rip out the important stuff, which seems to be the pathnumbers and id's, so it's not going to look very pretty.

I'm also going to assume that the patterns in the 'text/id' strings are reliable..

ie - the id is always after the '/' in every successor/predessor description etc...

Not all is bad, though, the good news is that you already have the things for a materialized path in there.

There is the level, Id, parentId and childId, wo we just need to connect the dots.

(and ofc to 'clean up' getting rid of all junktext around the numbers)

The code is kept a bit verbose, so it may be easier to follow what's happening along the road.

At the end we're going to produce this resultset out of the sampledata:

path hier_level id predessorid duration p

---- ----------- ----------- ----------- ----------- -----------------------

1 1 13521 4813 1 13521 - 13521

1 2 4813 4594 2 13521 - 4813

1 3 4594 1941 68 4813 - 4594

1 4 1941 643 68 4594 - 1941

1 5 643 147 51 1941 - 643

1 6 147 NULL 51 643 - 147

2 1 13521 4813 1 13521 - 13521

2 2 4813 1931 2 13521 - 4813

2 3 1931 643 2 4813 - 1931

2 4 643 147 51 1931 - 643

2 5 147 NULL 51 643 - 147

(11 row(s) affected)

Here's your exampledata we use:

/*

create table #x

( hier_level int not null, team varchar(90) not null, id int not null, successor varchar(90) null, predessor varchar(90) null,

duration int not null, path varchar(10) not null )

insert #x

select 1, 'BA - Chris Bowe', 13521, null, 'Reporting - Cynthia Reece/4813', 1, 'PATH 1 & 2' union all

select 2, 'Reporting - Cynthia Reece', 4813, 'BA - Chris Bowe/13521', 'FSCM Conversion - John Fangman/1931', 1, 'PATH 2' union all

select 2, 'Reporting - Cynthia Reece', 4813, 'BA - Chris Bowe/13521', 'Interconnectivity Testing - Felice Schulz/4594', 1, 'PATH 1' union all

select 3, 'FSCM Conversion - John Fangman', 1931, 'Reporting - Cynthia Reece/4813', 'Change Management - Pamela Hammond/643', 1, 'PATH 2' union all

select 4, 'Change Management - Pamela Hammond', 643, 'FSCM Conversion - John Fangman/1931', 'Data Cleansing - Michael Godette/147', 50, 'PATH 2' union all

select 5, 'Data Cleansing - Michael Godette', 147, 'Change Management - Pamela Hammond/643', null, 1, 'PATH 2' union all

select 3, 'Interconnectivity Testing - Felice Schulz', 4594, 'Reporting - Cynthia Reece/4813', 'FSCM Conversion - John Fangman/1941', 67, 'PATH 1' union all

select 4, 'FSCM Conversion - John Fangman', 1941, 'Interconnectivity Testing - Felice Schulz/4594', 'Change Management - Pamela Hammond/643', 1, 'PATH 1' union all

select 5, 'Change Management - Pamela Hammond', 643, 'FSCM Conversion - John Fangman/1941', 'Data Cleansing - Michael Godette/147', 50, 'PATH 1' union all

select 6, 'Data Cleansing - Michael Godette', 147, 'Change Management - Pamela Hammond/643', null, 1, 'PATH 1'

*/

Now, our first mission is to fish out the id's for successors and predessors from the text..

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

duration

from #x

hier_level id successorId predessorId path duration

----------- ----------- ----------- ----------- ---- -----------

1 13521 13521 4813 1 1

2 4813 13521 1931 2 1

2 4813 13521 4594 1 1

3 1931 4813 643 2 1

4 643 1931 147 2 50

5 147 643 NULL 2 1

3 4594 4813 1941 1 67

4 1941 4594 643 1 1

5 643 1941 147 1 50

6 147 643 NULL 1 1

(10 row(s) affected)

...looks ok, almost.

There's a problem with the first row (root node) in that the row should belong to both path 1 and 2, so we need an 'extra' row to reflect that path also. We can do that by unioning the above with an almost identical statement. The only difference is we take 'path' from the right instead of left as above...

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

duration

from #x

union

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,

duration

from #x

hier_level id successorId predessorId path duration

----------- ----------- ----------- ----------- ---- -----------

1 13521 13521 4813 1 1

1 13521 13521 4813 2 1

2 4813 13521 1931 2 1

2 4813 13521 4594 1 1

3 1931 4813 643 2 1

3 4594 4813 1941 1 67

4 643 1931 147 2 50

4 1941 4594 643 1 1

5 147 643 NULL 2 1

5 643 1941 147 1 50

6 147 643 NULL 1 1

(11 row(s) affected)

..there.. this is what we can work with (and perhaps how your data should look like from the beginning.

What we need to do now is a selfjoin against this 'table' and connect the parent with it's immediate child in order to calculate the duration, which is (I assume )

'duration + duration of the row one level higher within the same path' - except for level 1, which has the whole duration on it's own. (that's why the case is in the duration sum below)

If we pretend that we actually had this as the table to start with, then the whole thing would look like this:

select x.hier_level,

x.path,

x.id,

x.predessorid,

x.duration + case when x.hier_level = 1 then 0 else y.duration end as duration,

cast(y.id as varchar(10)) + ' - ' + cast(x.id as varchar(10)) as p

from #z x

join #z y

on coalesce(x.successorid, x.id) = y.id

and x.path = y.path

order by x.path, x.hier_level

hier_level path id predessorid duration p

----------- ---- ----------- ----------- ----------- -----------------------

1 1 13521 4813 1 13521 - 13521

2 1 4813 4594 2 13521 - 4813

3 1 4594 1941 68 4813 - 4594

4 1 1941 643 68 4594 - 1941

5 1 643 147 51 1941 - 643

6 1 147 NULL 51 643 - 147

1 2 13521 4813 1 13521 - 13521

2 2 4813 1931 2 13521 - 4813

3 2 1931 643 2 4813 - 1931

4 2 643 147 51 1931 - 643

5 2 147 NULL 51 643 - 147

(11 row(s) affected)

Doesn't look too complicated?

From here you can do whatever you want with regard to highest duration, sum it by path or whatever...

If, you on the other hand, for some reason, are unable to work with your data in a 'clean' way, then this is how it looks to do it all on the fly - sensitive persons may need to look another way

select x.hier_level,

x.path,

x.id,

x.predessorid,

x.duration + case when x.hier_level = 1 then 0 else y.duration end as duration,

cast(y.id as varchar(10)) + ' - ' + cast(x.id as varchar(10)) as p

from

(

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

duration

from #x

union

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,

duration

from #x

) x

join

(

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,

duration

from #x

union

select hier_level,

id,

coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,

substring(predessor, charindex('/', predessor) +1, 10) as predessorId,

right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,

duration

from #x

) y

on coalesce(x.successorid, x.id) = y.id

and x.path = y.path

order by x.path, x.hier_level

Hope it helps you out.

=;o)

/Kenneth

bledu SSChampion Points: 12335 More actions · Answer 4

i will substitute your use of the word hierachy and use order by. to me your problem seems to be a network path problem akin to project management. you are trying to find where along the network path you are most likely going to face problems i.e. the critical node.

you will unfortunately have to run the script for each Path, but I will try to think of a more elegant solution

---------------------

--declare a table to store the temp values

DECLARE

@values TABLE(

Path1

VARCHAR(15),

Duration

bigint

)

DECLARE

@HIER_LEVEL int, @ID VARCHAR(10), DURATION BIGINT

DECLARE

@prev_duration BIGINT, @prev_id VARCHAR(10),@sumit bigint

DECLARE

node_cursor

SELECT

HIER_LEVEL, ID, DURATION

FROM

SomeTable

WHERE

PATHid = 1

ORDER

BY [HIER_LEVEL] ASC

OPEN

node_cursor

SET

@prev_id = '', @prev_duration = 0

FETCH

NEXT FROM node_cursor

INTO

@HIER_LEVEL, @ID, @DURATION

WHILE

@@FETCH_STATUS = 0

BEGIN

INSERT INTO @values

(Pathrange,Duration)

SELECT @ID + @prev_id, (@DURATION + @prev_duration)

SET @prev_id = @ID, @prev_duration = @DURATION, @sumit = @prev_duration + @DURATION

FETCH NEXT FROM node_cursor

INTO @HIER_LEVEL, @ID, @DURATION

END

CLOSE

node_cursor

DEALLOCATE

node_cursor

INSERT

INTO @values

(

Pathrange,Duration)

SELECT

'TOTAL DURATION', @sumit

SELECT

* FROM @values

Everything you can imagine is real.

Ken McKelvey SSCoach Points: 18843 More actions · Answer 5

What a nice question. I hope the following will help you.

I have taken a slightly different approach.

As I found your non-normalized dataset confusing, I normalized it as follows:

CREATE TABLE #Teams

(

TeamID int NOT NULL

,Team varchar(50) NOT NULL

)

GO

INSERT INTO #Teams

SELECT 147, 'Data Cleansing - Michael Godette' UNION ALL

SELECT 643, 'Change Management - Pamela Hammond' UNION ALL

SELECT 1931, 'FSCM Conversion - John Fangman' UNION ALL

SELECT 4594, 'Interconnectivity Testing - Felice Schulz' UNION ALL

SELECT 4813, 'Reporting - Cynthia Reece' UNION ALL

SELECT 13521, 'BA - Chris Bowe'

GO

CREATE TABLE #Paths

(

TeamID int NOT NULL

,NextTeamID int NULL

,Duration int NOT NULL

)

GO

INSERT INTO #Paths

SELECT 13521, 4813, 1 UNION ALL

SELECT 4813, 1931, 1 UNION ALL

SELECT 4813, 4594, 1 UNION ALL

SELECT 4594, 1931, 67 UNION ALL

SELECT 1931, 643, 1 UNION ALL

SELECT 643, 147, 50 UNION ALL

SELECT 147, NULL, 1

GO

Now to get the result. This effectively requires two recursions, one to get the maximum total duration

and one to get the path of the maximum total duration. (I have assumed that only one path can have the

maximum duration in order to simplify the code.)

-- 1. Get the path totals

CREATE TABLE #PathTotals

(

TeamID int NOT NULL

,NextTeamID int NULL

,PrevTeamID int NULL

,Stage int NOT NULL

,StageDuration int NOT NULL

,Total int NOT NULL

)

;WITH GetTotals (TeamID, NextTeamID, PrevTeamID, Duration, Stage, StageDuration, Total)

AS

(

SELECT TeamID, NextTeamID, NULL, Duration, 1, Duration, Duration

FROM #Paths

WHERE TeamID = 13521

UNION ALL

SELECT P.TeamID, P.NextTeamID, T.TeamID, P.Duration

,Stage + 1

,T.Duration + P.Duration

,T.Total + T.Duration + P.Duration

FROM GetTotals T

JOIN #Paths P

ON T.NextTeamID = P.TeamID

)

INSERT INTO #PathTotals

SELECT TeamID, NextTeamID, PrevTeamID, Stage, StageDuration, Total

FROM GetTotals

-- Now show the longest path

-- More complicated if have to allow for duplicates

-- Works with sample data but will need testing with real data

;WITH LongestPath (TeamID, PrevTeamID, Stage, StageDuration, Total)

AS

(

SELECT P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total

FROM #PathTotals P

CROSS JOIN (

SELECT MAX(P1.Total) AS Total

FROM #PathTotals P1

) D

WHERE P.Total = D.Total

UNION ALL

SELECT P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total

FROM LongestPath L

JOIN #PathTotals P

ON L.PrevTeamID = P.TeamID

AND L.TeamID = P.NextTeamID

AND L.Stage = P.Stage + 1

)

SELECT L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration

FROM LongestPath L

JOIN #Teams T

ON L.TeamID = T.TeamID

ORDER BY Stage

Benjamin Kovac Right there with Babe Points: 718 More actions · Answer 6

Would not be just easier to sit on the dataset we were shown and sayK,first step we do before any processing is NORMALIZATION...?

There would be more meaningful tables:

1.HIERARCHY with Hier_level,ID columns

2.TEAM with ID,Members columns

3.SUCCESSORS with ID,Successor columns

4.PREDESSORS with ID,Predessor columns

5. and DURATION with Path,ID, Duration columns

after which you can use JOIN and get what you want.

ben

Benjamin Kovac Right there with Babe Points: 718 More actions · Answer 7

goyers,

One thing I still did not get, is what the Hier_level does stand for?

I would say that should be a hierarchical level of departments or levels with different granted permissions.

But than, there is "FSCM Conversion - John Fangman" in both level 3 and 4, because of different path(Path1 and Path2).

So I assume the hier_level is dynamically changing, always the last successor is the highest level with value 1.But again, what is the main benefit to have this dynamically changing, rather than to have a fixed levels of hierarchy,ie:BA-1,Reporting-2,Interconnectivity Testing-3,etc.

Can someone or maybe you goyers clarify the description of hier_level?

Thanks

Ben

Ken McKelvey SSCoach Points: 18843 More actions · Answer 8

I have just had a chance to have a quick look at this again. The following bit of code may help in showing duplicate maximum duration paths:

-- Now show the longest path

-- This should show duplicates

;WITH LongestPath (PathID, TeamID, PrevTeamID, Stage, StageDuration, Total)

AS

(

SELECT ROW_NUMBER() OVER (PARTITION BY P.Total ORDER BY P.Stage)

,P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total

FROM #PathTotals P

WHERE P.Total = (

SELECT MAX(P1.Total)

FROM #PathTotals P1

)

UNION ALL

SELECT L.PathID, P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total

FROM LongestPath L

JOIN #PathTotals P

ON L.PrevTeamID = P.TeamID

AND L.TeamID = P.NextTeamID

AND L.Stage = P.Stage + 1

)

SELECT L.PathID, L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration

FROM LongestPath L

JOIN #Teams T

ON L.TeamID = T.TeamID

ORDER BY PathID, Stage

Ken McKelvey SSCoach Points: 18843 More actions · Answer 9

And this will show all the paths:

;WITH LongestPath (PathID, TeamID, PrevTeamID, Stage, StageDuration, Total)

AS

(

SELECT ROW_NUMBER() OVER (ORDER BY P.Total, P.Stage)

,P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total

FROM #PathTotals P

WHERE P.NextTeamID is NULL

UNION ALL

SELECT L.PathID, P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total

FROM LongestPath L

JOIN #PathTotals P

ON L.PrevTeamID = P.TeamID

AND L.TeamID = P.NextTeamID

AND L.Stage = P.Stage + 1

)

SELECT L.PathID, L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration

FROM LongestPath L

JOIN #Teams T

ON L.TeamID = T.TeamID

ORDER BY PathID, Stage

bledu SSChampion Points: 12335 More actions · Answer 10

Ben,

The relevance of the Hier_level is it shows the position of the task in the path. i.e.

for PATH 1

1. Reporting - Cynthia Reece/4813 ===>>> 2. Interconnectivity Testing - Felice Schulz/4594 ====>>>> 3. FSCM Conversion - John Fangman/1941

But as you rightly say, this should be normalised

Everything you can imagine is real.

goyers-382153 SSC Veteran Points: 250 More actions · Answer 11

sorry for replying just now.....the hier_level is based on the predecessor columns so if the predecessor is blank it is assumed that it is level 1

goyers-382153 SSC Veteran Points: 250 More actions · Answer 12

thank you very much for giving your precious time in helping me with this problem.....special thanks to Ken, Bledu, and Ben.....i'll keep you posted if it works! thanks guys.

goyers-382153 SSC Veteran Points: 250 More actions · Answer 13

goyers-382153

SSC Veteran

Points: 250

July 23, 2007 at 1:41 am

#721445

this solution from Ken worked. Thanks Ken