June 19, 2015 at 3:07 am
Hi,
I have a bit of tricky question with regards to a query I want to write which tries to identify tree structure nodes that have more than one parent. This is to say that at any level of the tree structure the leaf in question should only have one parent. http://www.gigaflop.co.uk/comp/fig3_2_3-1.gif
Sample data: I have provided an example recordset below which has columns "Parent1", "Parent2", "Parent3", "Parent4" and "Parent5". In my real life situation I have a total of ten relationship columns.
Desired output: I have provided a select statement that outputs three columns, "ErrorRowNumber", "ErrorParent", "ErrorChild". This would be the desired output from the final query.
Logic of Query: The query would have identified the particular rows in the desired output because:
ErrorRowNumber 1 & 2 because Andy has three parents (see error row 10) and should only have one.
ErrorRowNumber 5 & 6 because Robert has two parents (Colin & Lewis) but should only have one.
ErrorRowNumber 10 As in error numbers 2 & 3, Andy should only have one parent but is shown here to have three in total, Patrick, John and Lewis
As you can see the query should be able to deal with scanning across multiple columns as well as rows.
Please see code below. All help hugly appreciated. 🙂
CREATE TABLE #Parents(RowNumber INT,Parent1 VARCHAR(20),Parent2 VARCHAR(20),Parent3 VARCHAR(20),Parent4 VARCHAR(20),Parent5 VARCHAR(20))
INSERT INTO #Parents SELECT 1, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 2, 'Lewis', 'Lewis', 'John', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 3, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 4, 'Lewis', 'Lewis', 'Andy', 'Colin', 'Colin'
INSERT INTO #Parents SELECT 5, 'Lewis', 'Andy', 'Colin', 'Colin', 'Robert'
INSERT INTO #Parents SELECT 6, 'Lewis', 'Lewis', 'Lewis', 'Robert', 'Robert'
INSERT INTO #Parents SELECT 7, 'Lewis', 'Andy', 'Andy', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 8, 'Lewis', 'Andy', 'Patrick', 'Patrick', 'Patrick'
SELECT Parent1, Parent2, Parent3, Parent4, Parent5 FROM #Parents
SELECT 1 AS ErrorRowNumber, 'Lewis' AS ErrorParent, 'Andy' as ErrorChild
UNION ALL
SELECT 2, 'John', 'Andy'
UNION ALL
SELECT 5, 'Colin', 'Robert'
UNION ALL
SELECT 6, 'Lewis', 'Robert'
DROP TABLE #parents
Many thanks.
June 19, 2015 at 7:27 am
Ahhh thanks for the data, that's much better.
I've come up with a solution but I'm not sure if it's necessarily the best. Also keep in mind this is based on the table only having 5 columns. So what I did was basically take each child parent pair and brought them together in two columns. This made it easier to check for pairs of matching children that had different parents.
It doesn't output quite as you'd hope and that was on purpose. It would be different if these rows had a unique ID to begin with but they don't. A row number here is meaningless when comparing to your REAL data. If you inserted an identifier you could easily change the query to output those ID's.
CREATE TABLE #Parents(RowNumber INT,Parent1 VARCHAR(20),Parent2 VARCHAR(20),Parent3 VARCHAR(20),Parent4 VARCHAR(20),Parent5 VARCHAR(20))
INSERT INTO #Parents SELECT 1, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 2, 'Lewis', 'Lewis', 'John', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 3, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 4, 'Lewis', 'Lewis', 'Andy', 'Colin', 'Colin'
INSERT INTO #Parents SELECT 5, 'Lewis', 'Andy', 'Colin', 'Colin', 'Robert'
INSERT INTO #Parents SELECT 6, 'Lewis', 'Lewis', 'Lewis', 'Robert', 'Robert'
INSERT INTO #Parents SELECT 7, 'Lewis', 'Andy', 'Andy', 'Andy', 'Patrick'
INSERT INTO #Parents SELECT 8, 'Lewis', 'Andy', 'Patrick', 'Patrick', 'Patrick'
SELECT Parent1, Parent2, Parent3, Parent4, Parent5 FROM #Parents
DECLARE @working TABLE (Parent VARCHAR(20), Child VARCHAR(20), RowNumber INT)
INSERT INTO @working SELECT Parent1 AS Parent, Parent2 AS Child, RowNumber FROM #Parents
INSERT INTO @working SELECT Parent2 AS Parent, Parent3 AS Child, RowNumber FROM #Parents
INSERT INTO @working SELECT Parent3 AS Parent, Parent4 AS Child, RowNumber FROM #Parents
INSERT INTO @working SELECT Parent4 AS Parent, Parent5 AS Child, RowNumber FROM #Parents
SELECT
Parent,
Child,
RowNumber,
LAG(Parent, 1, 'None') OVER (PARTITION BY Child ORDER BY Parent) AS CheckParent,
LAG(RowNumber, 1, 0) OVER (PARTITION BY Child ORDER BY Parent) AS CheckRowNum
INTO #Final
FROM
@working
WHERE
Parent <> Child
SELECT Parent, CheckParent AS OtherParent, Child, CAST(CheckRowNum AS VARCHAR) + '/' + CAST(RowNumber AS VARCHAR) AS ProblemRowNums FROM #Final WHERE CheckParent <> 'None' AND CheckParent <> Parent
DROP TABLE #Final
DROP TABLE #parents
Cheers,
June 19, 2015 at 7:33 am
Actually I didn't notice you have added a row number as I had started working on the problem from your first post. Give me a bit and I should be able to fix it up. Does your original table have an ID column?
June 19, 2015 at 7:44 am
Hi,
There is a unique Id column in the original table, but is not nessarily incremental, but is unique.
Thanks,
Lewis
June 19, 2015 at 7:51 am
Ok, well try my updated solution. The one problem I can see though is doesn't account for multiples problems relating to the same child. In your example you listed RowNumber 1 & 2 as being a problem but my output shows 2 & 3 which is correct if you look at your data because 1 & 3 have the same conflicts. I'll have to dwell on that one a bit.
June 19, 2015 at 7:54 am
Thanks for your help, however I get this message
'LAG' is not a recognized built-in function name.
June 19, 2015 at 8:02 am
lewisdow123 (6/19/2015)
Thanks for your help, however I get this message'LAG' is not a recognized built-in function name.
Did you post in the wrong forum? I assumed you were using SQL 2012. LEAD and LAG functions were introduced in that version.
June 19, 2015 at 8:19 am
I could kick myself, I am running 2008 R2... so sorry
June 19, 2015 at 8:23 am
lewisdow123 (6/19/2015)
I could kick myself, I am running 2008 R2... so sorry
Heh... not to worry. We wanna kick you, too! 😀
Here's a solution that will work in any version from 2005 and up. Details are in the comments.
WITH
cteUnpivot AS
( --=== Un-pivot Parent/Child Pairs and rank the pairs according to who the parent is for each child
SELECT p.RowNumber
,ca.*
,PairOccurance = ROw_NUMBER()OVER(PARTITION BY ca.Child,ca.Parent ORDER BY ca.Parent)
,ParentOccurance = DENSE_RANK()OVER(PARTITION BY ca.Child ORDER BY ca.Parent)
FROM #Parents p
CROSS APPLY
(
SELECT Parent1, Parent2, 1 UNION ALL
SELECT Parent2, Parent3, 2 UNION ALL
SELECT Parent3, Parent4, 3 UNION ALL
SELECT Parent4, Parent5, 4
) ca (Parent,Child,PairNumber) --Note that PairNumber will help you visually find the data in the original table.
WHERE ca.Parent <> ca.Child
) --=== Return the RowNumber and PairNumber of every row where the Child has more than 1 parent anywhere in the table.
SELECT *
FROM cteUnpivot
WHERE Child IN (SELECT Child FROM cteUnpivot WHERE ParentOccurance = 2) --This means "at least 2 occurances
ORDER BY Child,Parent,RowNumber
;
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2015 at 8:33 am
Thank you very much, will explore it now.
June 19, 2015 at 9:03 am
lewisdow123 (6/19/2015)
I could kick myself, I am running 2008 R2... so sorry
LOL...well for what it's worth here is the final edit to my solution (for 2012) which has the output in the format you expected.
DECLARE @working TABLE (Parent VARCHAR(20), Child VARCHAR(20), RowNumber INT)
INSERT INTO @working SELECT Parent1 AS Parent, Parent2 AS Child, RowNumber FROM #Parents
INSERT INTO @working SELECT Parent2 AS Parent, Parent3 AS Child, RowNumber FROM #Parents
INSERT INTO @working SELECT Parent3 AS Parent, Parent4 AS Child, RowNumber FROM #Parents
INSERT INTO @working SELECT Parent4 AS Parent, Parent5 AS Child, RowNumber FROM #Parents
SELECT
Parent,
Child,
RowNumber,
LEAD(Parent, 1, 'None') OVER (PARTITION BY Child ORDER BY Parent) AS CheckChild,
LAG(Parent, 1, 'None') OVER (PARTITION BY Child ORDER BY Parent) AS CheckParent
INTO #Final
FROM
@working
WHERE
Parent <> Child
SELECT * FROM #Final
SELECT RowNumber AS ErrorRowNumber, Parent AS ErrorParent, Child AS ErrorChild FROM #Final WHERE CheckParent <> 'None' AND CheckParent <> Parent
UNION ALL
SELECT RowNumber AS ErrorRowNumber, Parent AS ErrorParent, Child AS ErrorChild FROM #Final WHERE CheckChild <> 'None' AND CheckChild <> Parent ORDER BY ErrorRowNumber
DROP TABLE #Final
June 19, 2015 at 9:37 am
Thank you
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply