July 20, 2007 at 12:36 am
Hi all,
I would like to ask for help regarding my problem ragarding aggregating hierarchical data based on their heirarchical groupings. I have been doing this for the past 3 days and hasn't really been getting anything right. You guys might have solved any related problem. I'll be very tahnakful.
Here's the sample dataset:
HIER_LEVEL | TEAM | ID | SUCCESSOR | PREDESSOR | DURATION | |
1 | BA - Chris Bowe | 13521 | Reporting - Cynthia Reece/4813 | 1 | PATH 1 & 2 | |
2 | Reporting - Cynthia Reece | 4813 | BA - Chris Bowe/13521 | FSCM Conversion - John Fangman/1931 | 1 | PATH 2 |
2 | Reporting - Cynthia Reece | 4813 | BA - Chris Bowe/13521 | Interconnectivity Testing - Felice Schulz/4594 | 1 | PATH 1 |
3 | FSCM Conversion - John Fangman | 1931 | Reporting - Cynthia Reece/4813 | Change Management - Pamela Hammond/643 | 1 | PATH 2 |
4 | Change Management - Pamela Hammond | 643 | FSCM Conversion - John Fangman/1931 | Data Cleansing - Michael Godette/147 | 50 | PATH 2 |
5 | Data Cleansing - Michael Godette | 147 | Change Management - Pamela Hammond/643 | 1 | PATH 2 | |
3 | Interconnectivity Testing - Felice Schulz | 4594 | Reporting - Cynthia Reece/4813 | FSCM Conversion - John Fangman/1941 | 67 | PATH 1 |
4 | FSCM Conversion - John Fangman | 1941 | Interconnectivity Testing - Felice Schulz/4594 | Change Management - Pamela Hammond/643 | 1 | PATH 1 |
5 | Change Management - Pamela Hammond | 643 | FSCM Conversion - John Fangman/1941 | Data Cleansing - Michael Godette/147 | 50 | PATH 1 |
6 | Data Cleansing - Michael Godette | 147 | Change Management - Pamela Hammond/643 | 1 | PATH 1 |
based from this dataset, i would like to calculate the duration for each path. The calculation below should be as follows but i need this to be done dynamically and to pinpoint the rows which compromise the greatest total durations.
PATH 1 | DURATION | PATH 2 | DURATION | |
13521 | 1 | 13521 | 1 | |
13521 - 4813 | 2 | 13521 - 4813 | 2 | |
4813 - 4594 | 68 | 4813 - 1931 | 2 | |
4594 - 1941 | 68 | 1931 - 643 | 51 | |
1941 - 643 | 51 | 643 - 147 | 51 | |
643 - 147 | 51 | TOTAL DURATION | 107 | |
TOTAL DURATION | 241 |
You have to pardon my ignorance because Im just new to this. Hope to get any help from you guys. Thank you very much.
July 20, 2007 at 4:52 am
Hi
have'nt understood ur requirements clearly. can u give a better clear picture regarding this.
just a point though - For hierarchial data i use CTE's (Common Tabe Expressions).
check them out in BOL.
"Keep Trying"
July 20, 2007 at 5:44 am
Please provide DDL and sample data as described at http://www.aspfaq.com/etiquette.asp?id=5006
SQL = Scarcely Qualifies as a Language
July 20, 2007 at 6:35 am
This is going to be based on a lot of assumptions, but I'm using your example as matrix.
It seems to work, though it's 'old style', haven't messed with CTE's that much yet
(which I belive may also be a way to solve the problem)
Anyways, I'll assume that the sample shows what your data looks like?
If it is, unfortunately it's in a terrible shape. Most of the code is going to be stringmanipulation stuff in order to rip out the important stuff, which seems to be the pathnumbers and id's, so it's not going to look very pretty.
I'm also going to assume that the patterns in the 'text/id' strings are reliable..
ie - the id is always after the '/' in every successor/predessor description etc...
Not all is bad, though, the good news is that you already have the things for a materialized path in there.
There is the level, Id, parentId and childId, wo we just need to connect the dots.
(and ofc to 'clean up' getting rid of all junktext around the numbers)
The code is kept a bit verbose, so it may be easier to follow what's happening along the road.
At the end we're going to produce this resultset out of the sampledata:
path hier_level id predessorid duration p
---- ----------- ----------- ----------- ----------- -----------------------
1 1 13521 4813 1 13521 - 13521
1 2 4813 4594 2 13521 - 4813
1 3 4594 1941 68 4813 - 4594
1 4 1941 643 68 4594 - 1941
1 5 643 147 51 1941 - 643
1 6 147 NULL 51 643 - 147
2 1 13521 4813 1 13521 - 13521
2 2 4813 1931 2 13521 - 4813
2 3 1931 643 2 4813 - 1931
2 4 643 147 51 1931 - 643
2 5 147 NULL 51 643 - 147
(11 row(s) affected)
Here's your exampledata we use:
/*
create table #x
( hier_level int not null, team varchar(90) not null, id int not null, successor varchar(90) null, predessor varchar(90) null,
duration int not null, path varchar(10) not null )
insert #x
select 1, 'BA - Chris Bowe', 13521, null, 'Reporting - Cynthia Reece/4813', 1, 'PATH 1 & 2' union all
select 2, 'Reporting - Cynthia Reece', 4813, 'BA - Chris Bowe/13521', 'FSCM Conversion - John Fangman/1931', 1, 'PATH 2' union all
select 2, 'Reporting - Cynthia Reece', 4813, 'BA - Chris Bowe/13521', 'Interconnectivity Testing - Felice Schulz/4594', 1, 'PATH 1' union all
select 3, 'FSCM Conversion - John Fangman', 1931, 'Reporting - Cynthia Reece/4813', 'Change Management - Pamela Hammond/643', 1, 'PATH 2' union all
select 4, 'Change Management - Pamela Hammond', 643, 'FSCM Conversion - John Fangman/1931', 'Data Cleansing - Michael Godette/147', 50, 'PATH 2' union all
select 5, 'Data Cleansing - Michael Godette', 147, 'Change Management - Pamela Hammond/643', null, 1, 'PATH 2' union all
select 3, 'Interconnectivity Testing - Felice Schulz', 4594, 'Reporting - Cynthia Reece/4813', 'FSCM Conversion - John Fangman/1941', 67, 'PATH 1' union all
select 4, 'FSCM Conversion - John Fangman', 1941, 'Interconnectivity Testing - Felice Schulz/4594', 'Change Management - Pamela Hammond/643', 1, 'PATH 1' union all
select 5, 'Change Management - Pamela Hammond', 643, 'FSCM Conversion - John Fangman/1941', 'Data Cleansing - Michael Godette/147', 50, 'PATH 1' union all
select 6, 'Data Cleansing - Michael Godette', 147, 'Change Management - Pamela Hammond/643', null, 1, 'PATH 1'
*/
Now, our first mission is to fish out the id's for successors and predessors from the text..
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,
duration
from #x
hier_level id successorId predessorId path duration
----------- ----------- ----------- ----------- ---- -----------
1 13521 13521 4813 1 1
2 4813 13521 1931 2 1
2 4813 13521 4594 1 1
3 1931 4813 643 2 1
4 643 1931 147 2 50
5 147 643 NULL 2 1
3 4594 4813 1941 1 67
4 1941 4594 643 1 1
5 643 1941 147 1 50
6 147 643 NULL 1 1
(10 row(s) affected)
...looks ok, almost.
There's a problem with the first row (root node) in that the row should belong to both path 1 and 2, so we need an 'extra' row to reflect that path also. We can do that by unioning the above with an almost identical statement. The only difference is we take 'path' from the right instead of left as above...
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,
duration
from #x
union
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,
duration
from #x
hier_level id successorId predessorId path duration
----------- ----------- ----------- ----------- ---- -----------
1 13521 13521 4813 1 1
1 13521 13521 4813 2 1
2 4813 13521 1931 2 1
2 4813 13521 4594 1 1
3 1931 4813 643 2 1
3 4594 4813 1941 1 67
4 643 1931 147 2 50
4 1941 4594 643 1 1
5 147 643 NULL 2 1
5 643 1941 147 1 50
6 147 643 NULL 1 1
(11 row(s) affected)
..there.. this is what we can work with (and perhaps how your data should look like from the beginning.
What we need to do now is a selfjoin against this 'table' and connect the parent with it's immediate child in order to calculate the duration, which is (I assume )
'duration + duration of the row one level higher within the same path' - except for level 1, which has the whole duration on it's own. (that's why the case is in the duration sum below)
If we pretend that we actually had this as the table to start with, then the whole thing would look like this:
select x.hier_level,
x.path,
x.id,
x.predessorid,
x.duration + case when x.hier_level = 1 then 0 else y.duration end as duration,
cast(y.id as varchar(10)) + ' - ' + cast(x.id as varchar(10)) as p
from #z x
join #z y
on coalesce(x.successorid, x.id) = y.id
and x.path = y.path
order by x.path, x.hier_level
hier_level path id predessorid duration p
----------- ---- ----------- ----------- ----------- -----------------------
1 1 13521 4813 1 13521 - 13521
2 1 4813 4594 2 13521 - 4813
3 1 4594 1941 68 4813 - 4594
4 1 1941 643 68 4594 - 1941
5 1 643 147 51 1941 - 643
6 1 147 NULL 51 643 - 147
1 2 13521 4813 1 13521 - 13521
2 2 4813 1931 2 13521 - 4813
3 2 1931 643 2 4813 - 1931
4 2 643 147 51 1931 - 643
5 2 147 NULL 51 643 - 147
(11 row(s) affected)
Doesn't look too complicated?
From here you can do whatever you want with regard to highest duration, sum it by path or whatever...
If, you on the other hand, for some reason, are unable to work with your data in a 'clean' way, then this is how it looks to do it all on the fly - sensitive persons may need to look another way
select x.hier_level,
x.path,
x.id,
x.predessorid,
x.duration + case when x.hier_level = 1 then 0 else y.duration end as duration,
cast(y.id as varchar(10)) + ' - ' + cast(x.id as varchar(10)) as p
from
(
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,
duration
from #x
union
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,
duration
from #x
) x
join
(
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
left(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as path,
duration
from #x
union
select hier_level,
id,
coalesce(substring(successor, charindex('/', successor) +1, 10), id) as successorId,
substring(predessor, charindex('/', predessor) +1, 10) as predessorId,
right(replace(replace(replace(path, 'path', ''), '&', ''), ' ', ''), 1) as rpath,
duration
from #x
) y
on coalesce(x.successorid, x.id) = y.id
and x.path = y.path
order by x.path, x.hier_level
Hope it helps you out.
=;o)
/Kenneth
July 20, 2007 at 6:59 am
i will substitute your use of the word hierachy and use order by. to me your problem seems to be a network path problem akin to project management. you are trying to find where along the network path you are most likely going to face problems i.e. the critical node.
you will unfortunately have to run the script for each Path, but I will try to think of a more elegant solution
---------------------
--declare a table to store the temp values
DECLARE
@values TABLE(
Path1
VARCHAR(15),
Duration
bigint
)
DECLARE
@HIER_LEVEL int, @ID VARCHAR(10), DURATION BIGINT
DECLARE
@prev_duration BIGINT, @prev_id VARCHAR(10),@sumit bigint
DECLARE
node_cursor
SELECT
HIER_LEVEL, ID, DURATION
FROM
SomeTable
WHERE
PATHid = 1
ORDER
BY [HIER_LEVEL] ASC
OPEN
node_cursor
SET
@prev_id = '', @prev_duration = 0
FETCH
NEXT FROM node_cursor
INTO
@HIER_LEVEL, @ID, @DURATION
WHILE
@@FETCH_STATUS = 0
BEGIN
INSERT INTO @values
(Pathrange,Duration)
SELECT @ID + @prev_id, (@DURATION + @prev_duration)
SET @prev_id = @ID, @prev_duration = @DURATION, @sumit = @prev_duration + @DURATION
FETCH NEXT FROM node_cursor
INTO @HIER_LEVEL, @ID, @DURATION
END
CLOSE
node_cursor
DEALLOCATE
node_cursor
INSERT
INTO @values
(
Pathrange,Duration)
SELECT
'TOTAL DURATION', @sumit
SELECT
* FROM @values
July 20, 2007 at 7:05 am
What a nice question. I hope the following will help you.
I have taken a slightly different approach.
As I found your non-normalized dataset confusing, I normalized it as follows:
CREATE TABLE #Teams
(
TeamID int NOT NULL
,Team varchar(50) NOT NULL
)
GO
INSERT INTO #Teams
SELECT 147, 'Data Cleansing - Michael Godette' UNION ALL
SELECT 643, 'Change Management - Pamela Hammond' UNION ALL
SELECT 1931, 'FSCM Conversion - John Fangman' UNION ALL
SELECT 4594, 'Interconnectivity Testing - Felice Schulz' UNION ALL
SELECT 4813, 'Reporting - Cynthia Reece' UNION ALL
SELECT 13521, 'BA - Chris Bowe'
GO
CREATE TABLE #Paths
(
TeamID int NOT NULL
,NextTeamID int NULL
,Duration int NOT NULL
)
GO
INSERT INTO #Paths
SELECT 13521, 4813, 1 UNION ALL
SELECT 4813, 1931, 1 UNION ALL
SELECT 4813, 4594, 1 UNION ALL
SELECT 4594, 1931, 67 UNION ALL
SELECT 1931, 643, 1 UNION ALL
SELECT 643, 147, 50 UNION ALL
SELECT 147, NULL, 1
GO
Now to get the result. This effectively requires two recursions, one to get the maximum total duration
and one to get the path of the maximum total duration. (I have assumed that only one path can have the
maximum duration in order to simplify the code.)
-- 1. Get the path totals
CREATE TABLE #PathTotals
(
TeamID int NOT NULL
,NextTeamID int NULL
,PrevTeamID int NULL
,Stage int NOT NULL
,StageDuration int NOT NULL
,Total int NOT NULL
)
;WITH GetTotals (TeamID, NextTeamID, PrevTeamID, Duration, Stage, StageDuration, Total)
AS
(
SELECT TeamID, NextTeamID, NULL, Duration, 1, Duration, Duration
FROM #Paths
WHERE TeamID = 13521
UNION ALL
SELECT P.TeamID, P.NextTeamID, T.TeamID, P.Duration
,Stage + 1
,T.Duration + P.Duration
,T.Total + T.Duration + P.Duration
FROM GetTotals T
JOIN #Paths P
ON T.NextTeamID = P.TeamID
)
INSERT INTO #PathTotals
SELECT TeamID, NextTeamID, PrevTeamID, Stage, StageDuration, Total
FROM GetTotals
-- Now show the longest path
-- More complicated if have to allow for duplicates
-- Works with sample data but will need testing with real data
;WITH LongestPath (TeamID, PrevTeamID, Stage, StageDuration, Total)
AS
(
SELECT P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total
FROM #PathTotals P
CROSS JOIN (
SELECT MAX(P1.Total) AS Total
FROM #PathTotals P1
) D
WHERE P.Total = D.Total
UNION ALL
SELECT P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total
FROM LongestPath L
JOIN #PathTotals P
ON L.PrevTeamID = P.TeamID
AND L.TeamID = P.NextTeamID
AND L.Stage = P.Stage + 1
)
SELECT L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration
FROM LongestPath L
JOIN #Teams T
ON L.TeamID = T.TeamID
ORDER BY Stage
July 20, 2007 at 7:17 am
Would not be just easier to sit on the dataset we were shown and sayK,first step we do before any processing is NORMALIZATION...?
There would be more meaningful tables:
1.HIERARCHY with Hier_level,ID columns
2.TEAM with ID,Members columns
3.SUCCESSORS with ID,Successor columns
4.PREDESSORS with ID,Predessor columns
5. and DURATION with Path,ID, Duration columns
after which you can use JOIN and get what you want.
ben
July 20, 2007 at 10:14 am
goyers,
One thing I still did not get, is what the Hier_level does stand for?
I would say that should be a hierarchical level of departments or levels with different granted permissions.
But than, there is "FSCM Conversion - John Fangman" in both level 3 and 4, because of different path(Path1 and Path2).
So I assume the hier_level is dynamically changing, always the last successor is the highest level with value 1.But again, what is the main benefit to have this dynamically changing, rather than to have a fixed levels of hierarchy,ie:BA-1,Reporting-2,Interconnectivity Testing-3,etc.
Can someone or maybe you goyers clarify the description of hier_level?
Thanks
Ben
July 20, 2007 at 10:21 am
I have just had a chance to have a quick look at this again. The following bit of code may help in showing duplicate maximum duration paths:
-- Now show the longest path
-- This should show duplicates
;WITH LongestPath (PathID, TeamID, PrevTeamID, Stage, StageDuration, Total)
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY P.Total ORDER BY P.Stage)
,P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total
FROM #PathTotals P
WHERE P.Total = (
SELECT MAX(P1.Total)
FROM #PathTotals P1
)
UNION ALL
SELECT L.PathID, P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total
FROM LongestPath L
JOIN #PathTotals P
ON L.PrevTeamID = P.TeamID
AND L.TeamID = P.NextTeamID
AND L.Stage = P.Stage + 1
)
SELECT L.PathID, L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration
FROM LongestPath L
JOIN #Teams T
ON L.TeamID = T.TeamID
ORDER BY PathID, Stage
July 20, 2007 at 10:30 am
And this will show all the paths:
;WITH LongestPath (PathID, TeamID, PrevTeamID, Stage, StageDuration, Total)
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY P.Total, P.Stage)
,P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, P.Total
FROM #PathTotals P
WHERE P.NextTeamID is NULL
UNION ALL
SELECT L.PathID, P.TeamID, P.PrevTeamID, P.Stage, P.StageDuration, L.Total
FROM LongestPath L
JOIN #PathTotals P
ON L.PrevTeamID = P.TeamID
AND L.TeamID = P.NextTeamID
AND L.Stage = P.Stage + 1
)
SELECT L.PathID, L.Stage, T.Team, L.StageDuration, L.Total AS TotalDuration
FROM LongestPath L
JOIN #Teams T
ON L.TeamID = T.TeamID
ORDER BY PathID, Stage
July 20, 2007 at 12:29 pm
Ben,
The relevance of the Hier_level is it shows the position of the task in the path. i.e.
for PATH 1
1. Reporting - Cynthia Reece/4813 ===>>> 2. Interconnectivity Testing - Felice Schulz/4594 ====>>>> 3. FSCM Conversion - John Fangman/1941
But as you rightly say, this should be normalised
July 22, 2007 at 7:02 pm
sorry for replying just now.....the hier_level is based on the predecessor columns so if the predecessor is blank it is assumed that it is level 1
July 22, 2007 at 7:06 pm
thank you very much for giving your precious time in helping me with this problem.....special thanks to Ken, Bledu, and Ben.....i'll keep you posted if it works! thanks guys.
July 23, 2007 at 1:41 am
this solution from Ken worked. Thanks Ken
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply