January 29, 2011 at 12:41 pm
Comments posted to this topic are about the item Common table expressions and circular references
Semper in excretia, suus solum profundum variat
January 31, 2011 at 2:34 am
Hi, Cool stuff this CTE, but I have a question about the example.
Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?
Thanks in advance!
January 31, 2011 at 2:41 am
WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)
AS
(
-- Anchor member definition
SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath
FROM childrenandparents AS g
WHERE child not in (select parent from childrenandparents)
UNION ALL
-- Recursive member definition
SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent
FROM childrenandparents as g
INNER JOIN GroupMembers AS gm
ON gm.parentgroup = g.child
where hierarchypath not like '%'+g.child +'->' + g.parent + '%'
)
select bottom, parentgroup, level, hierarchypath
from groupmembers
where hierarchypath like '%'+parentgroup + '->' + '%'
order by level desc
option(maxrecursion 100);
will select just cycles. Needn't to calculate 'correct' max level beforehand.
eb3e->b->c->d->b
ab3a->b->c->d->b
January 31, 2011 at 3:57 am
CELKO (1/31/2011)
Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.
Now, there's a question. The short answer is that I haven't. I know I've just written a technical article about CTEs, but I'm by no means an expert yet.
In fact, one of the biggest issues I have with hierarchal data is that most of the parent/child querying we do in my company is on data outside the control of the DBAs - most frequently group memberships in Active Directory. Indeed, in one or two cases, such a cyclic relationship has actually been valid. To my mind, no matter how difficult it may be to identify circular references in SQL, it pales into insignificance with the problem of "deciding" whether a particular instance should or should not be allowed.
I know it's a bit of a side-step answer, but I'd be tempted to throw this decision back towards the application. If it really had to be done at database level, I haven't yet come up with a better solution than what you've outlined.
Semper in excretia, suus solum profundum variat
January 31, 2011 at 4:10 am
Peter.Frissen (1/31/2011)
Hi, Cool stuff this CTE, but I have a question about the example.Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?
Thanks in advance!
I think I understand your question, but apologies if not.
My example CTE starts off with ultimate children (records existing in the "child" column, but not in the "parent" column) and then wanders up the hierarchal tree. You could just as easily work the other way around (select all "parent" records which don't exist in the "child" column) and wander down the tree. In your case, it might even be worth starting off with all "parent" records irrespective of whether they're children or not. It'll increase the size of your CTE, but will allow you to see inherited relationships between parents and children within the middle of the hierarchal tree.
Hope that helps
Semper in excretia, suus solum profundum variat
January 31, 2011 at 4:13 am
gerg-520419 (1/31/2011)
WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)
AS
(
-- Anchor member definition
SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath
FROM childrenandparents AS g
WHERE child not in (select parent from childrenandparents)
UNION ALL
-- Recursive member definition
SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent
FROM childrenandparents as g
INNER JOIN GroupMembers AS gm
ON gm.parentgroup = g.child
where hierarchypath not like '%'+g.child +'->' + g.parent + '%'
)
select bottom, parentgroup, level, hierarchypath
from groupmembers
where hierarchypath like '%'+parentgroup + '->' + '%'
order by level desc
option(maxrecursion 100);
will select just cycles. Needn't to calculate 'correct' max level beforehand.
eb3e->b->c->d->b
ab3a->b->c->d->b
Agreed.
I expect there are several other areas too where my CTEs could be tidied up; indeed, I'd be rather surprised if there weren't....
Semper in excretia, suus solum profundum variat
January 31, 2011 at 7:18 am
Hi,
Great article. On the same general topic but stepping back to the part about getting the AD data into tables:
It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?
--Jim
January 31, 2011 at 7:46 am
AD is a distributed database of sorts. I think it was built by a separate team, and optimized for a different purpose. Just like I'm not sure mail belongs in a RDBMS, not sure that directories should be.
January 31, 2011 at 7:56 am
Perhaps more pertinent in criticising AD is its accessibility. ADSI and LDAP are technically effective, but hardly user-friendly.
Semper in excretia, suus solum profundum variat
January 31, 2011 at 8:50 am
James Stephens (1/31/2011)
Hi,Great article. On the same general topic but stepping back to the part about getting the AD data into tables:
It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?
--Jim
Because AD is a hierarchical data store. Based on LDAP, it's very much like XML. Parent-child relationships, extensible object types, and the variety of attribute types mean that a relational data store is a poor fit -- especially since the primary use case is individual item transactional processing.
January 31, 2011 at 9:12 am
gerg-520419 (1/31/2011)
code ...
will select just cycles. Needn't to calculate 'correct' max level beforehand.
eb3e->b->c->d->b
ab3a->b->c->d->b
This is another alternative that will show the required hierarchy and highlight where circular references begin -
WITH GroupMembers (Child, ParentGroup, Level, hierarchypath)
AS
(-- Anchor member definition
SELECTg.child,
g.parent,
0 AS Level,
convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath
FROM ChildrenAndParents AS g
WHERE child not in (select parent from ChildrenAndParents)
UNION ALL
-- Recursive member definition
SELECTg.child,
g.parent,
Level + 1,
hierarchypath + '->' + g.parent -- Add '-->...' to end when recursion found
+ Case When gm.hierarchypath like '%->'+g.parent+'->%' Then '-->...'
Else''
End
FROM ChildrenAndParents as g
INNER JOIN GroupMembers AS gm
ON gm.parentgroup = g.child
--Exclude if the hierarchypath text contains a recursion
where gm.hierarchypath not like '%->...'
)
which gives -
child parentgroup level hierarchypath
--------- ----------- ----------- -----------------------
a b 0 a->b
a c 1 a->b->c
a d 2 a->b->c->d
a b 3 a->b->c->d->b-->...
e b 0 e->b
e c 1 e->b->c
e d 2 e->b->c->d
e b 3 e->b->c->d->b-->...
-------
Something I can't get my head round - can anyone explain ...
    The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.
child parentgroup level hierarchypath
--------- ----------- ----------- -----------------------
b c 0 b->c
c d 0 c->d
etc
Can anyone explain why they were there/why they were removed?
Andy
--------------------------------------------------------------
“Doubt is not a pleasant condition, but certainty is absurd.” Voltaire
January 31, 2011 at 10:38 am
andy.roberts (1/31/2011)
Something I can't get my head round - can anyone explain ...
    The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.
child parentgroup level hierarchypath
--------- ----------- ----------- -----------------------
b c 0 b->c
c d 0 c->d
etc
Can anyone explain why they were there/why they were removed?
Andy
Since b and c both exist in the parent column of the underlying table, they shouldn't appear in the first pass. The only way I can get the extra rows you've mentioned into the CTE results is if I comment out the line
WHERE child not in (select parent from childrenandparents)
from the anchor definition. I don't suppose that could have happened whilst you were streamlining my CTE code, could it?
Semper in excretia, suus solum profundum variat
January 31, 2011 at 11:48 am
CELKO (1/31/2011)
Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.
Orphans? No problem--that's what foreign key constraints are there for.
Cycles? A bit trickier, but easy enough. First step, a check constraint so that an employee can't be his or her own manager. (We'll say that the manager for the root node is NULL.) Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers. That eliminates most cycles, but we still have a potential issue where we can insert the following data:
Mgr Emp
NULL A
A B
B C
And then do:
UPDATE tbl
SET Mgr = 'C'
WHERE Emp = 'B'
... getting rid of that requires a trigger; below is one I came up with when working on my book, "Expert SQL Server 2005 Development." The idea is to start at the updated node(s) and walk up the hierarchy toward the root. If, during that walk, we find a case where the parent "manager" is the same as the child "employee," we know that something is amiss:
CREATE TRIGGER tg_NoCycles
ON Employee_Temp
FOR UPDATE
AS
BEGIN
SET NOCOUNT ON
--Only check if the ManagerId column was updated
IF NOT UPDATE(ManagerId)
RETURN
--Avoid cycles
DECLARE @CycleExists BIT
SET @CycleExists = 0
--Traverse up the hierarchy toward the
--root node
;WITH e AS
(
SELECT EmployeeId, ManagerId
FROM inserted
UNION ALL
SELECT e.EmployeeId, et.ManagerId
FROM Employee_Temp et
JOIN e ON e.ManagerId = et.EmployeeId
WHERE
et.ManagerId IS NOT NULL
AND e.ManagerId <> e.EmployeeId
)
SELECT @CycleExists = 1
FROM e
WHERE e.ManagerId = e.EmployeeId
IF @CycleExists = 1
BEGIN
RAISERROR('The update introduced a cycle', 16, 1)
ROLLBACK
END
END
GO
--
Adam Machanic
whoisactive
January 31, 2011 at 11:55 am
James Stephens (1/31/2011)
It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?
The newest versions of AD use SQL Server (or at least some variation on SQL Server -- perhaps not a normal build) as the internal data store.
As to whether that's a "true relational DB," that's an entirely different question 😀
--
Adam Machanic
whoisactive
February 1, 2011 at 2:06 am
Adam Machanic (1/31/2011)
...Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers...
This is correct for tree-like hierarchy only, which is not the case with AD adjacency model.
AD is rather network-like.
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply