Tricky hierarchy query

  • Hi,

    I have a bit of tricky question with regards to a query I want to write which tries to identify tree structure nodes that have more than one parent. This is to say that at any level of the tree structure the leaf in question should only have one parent. http://www.gigaflop.co.uk/comp/fig3_2_3-1.gif

    Sample data: I have provided an example recordset below which has columns "Parent1", "Parent2", "Parent3", "Parent4" and "Parent5". In my real life situation I have a total of ten relationship columns.

    Desired output: I have provided a select statement that outputs three columns, "ErrorRowNumber", "ErrorParent", "ErrorChild". This would be the desired output from the final query.

    Logic of Query: The query would have identified the particular rows in the desired output because:

    ErrorRowNumber 1 & 2 because Andy has three parents (see error row 10) and should only have one.

    ErrorRowNumber 5 & 6 because Robert has two parents (Colin & Lewis) but should only have one.

    ErrorRowNumber 10 As in error numbers 2 & 3, Andy should only have one parent but is shown here to have three in total, Patrick, John and Lewis

    As you can see the query should be able to deal with scanning across multiple columns as well as rows.

    Please see code below. All help hugly appreciated. 🙂

    CREATE TABLE #Parents(RowNumber INT,Parent1 VARCHAR(20),Parent2 VARCHAR(20),Parent3 VARCHAR(20),Parent4 VARCHAR(20),Parent5 VARCHAR(20))

    INSERT INTO #Parents SELECT 1, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 2, 'Lewis', 'Lewis', 'John', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 3, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 4, 'Lewis', 'Lewis', 'Andy', 'Colin', 'Colin'

    INSERT INTO #Parents SELECT 5, 'Lewis', 'Andy', 'Colin', 'Colin', 'Robert'

    INSERT INTO #Parents SELECT 6, 'Lewis', 'Lewis', 'Lewis', 'Robert', 'Robert'

    INSERT INTO #Parents SELECT 7, 'Lewis', 'Andy', 'Andy', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 8, 'Lewis', 'Andy', 'Patrick', 'Patrick', 'Patrick'

    SELECT Parent1, Parent2, Parent3, Parent4, Parent5 FROM #Parents

    SELECT 1 AS ErrorRowNumber, 'Lewis' AS ErrorParent, 'Andy' as ErrorChild

    UNION ALL

    SELECT 2, 'John', 'Andy'

    UNION ALL

    SELECT 5, 'Colin', 'Robert'

    UNION ALL

    SELECT 6, 'Lewis', 'Robert'

    DROP TABLE #parents

    Many thanks.

  • Ahhh thanks for the data, that's much better.

    I've come up with a solution but I'm not sure if it's necessarily the best. Also keep in mind this is based on the table only having 5 columns. So what I did was basically take each child parent pair and brought them together in two columns. This made it easier to check for pairs of matching children that had different parents.

    It doesn't output quite as you'd hope and that was on purpose. It would be different if these rows had a unique ID to begin with but they don't. A row number here is meaningless when comparing to your REAL data. If you inserted an identifier you could easily change the query to output those ID's.

    CREATE TABLE #Parents(RowNumber INT,Parent1 VARCHAR(20),Parent2 VARCHAR(20),Parent3 VARCHAR(20),Parent4 VARCHAR(20),Parent5 VARCHAR(20))

    INSERT INTO #Parents SELECT 1, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 2, 'Lewis', 'Lewis', 'John', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 3, 'Lewis', 'Lewis', 'Lewis', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 4, 'Lewis', 'Lewis', 'Andy', 'Colin', 'Colin'

    INSERT INTO #Parents SELECT 5, 'Lewis', 'Andy', 'Colin', 'Colin', 'Robert'

    INSERT INTO #Parents SELECT 6, 'Lewis', 'Lewis', 'Lewis', 'Robert', 'Robert'

    INSERT INTO #Parents SELECT 7, 'Lewis', 'Andy', 'Andy', 'Andy', 'Patrick'

    INSERT INTO #Parents SELECT 8, 'Lewis', 'Andy', 'Patrick', 'Patrick', 'Patrick'

    SELECT Parent1, Parent2, Parent3, Parent4, Parent5 FROM #Parents

    DECLARE @working TABLE (Parent VARCHAR(20), Child VARCHAR(20), RowNumber INT)

    INSERT INTO @working SELECT Parent1 AS Parent, Parent2 AS Child, RowNumber FROM #Parents

    INSERT INTO @working SELECT Parent2 AS Parent, Parent3 AS Child, RowNumber FROM #Parents

    INSERT INTO @working SELECT Parent3 AS Parent, Parent4 AS Child, RowNumber FROM #Parents

    INSERT INTO @working SELECT Parent4 AS Parent, Parent5 AS Child, RowNumber FROM #Parents

    SELECT

    Parent,

    Child,

    RowNumber,

    LAG(Parent, 1, 'None') OVER (PARTITION BY Child ORDER BY Parent) AS CheckParent,

    LAG(RowNumber, 1, 0) OVER (PARTITION BY Child ORDER BY Parent) AS CheckRowNum

    INTO #Final

    FROM

    @working

    WHERE

    Parent <> Child

    SELECT Parent, CheckParent AS OtherParent, Child, CAST(CheckRowNum AS VARCHAR) + '/' + CAST(RowNumber AS VARCHAR) AS ProblemRowNums FROM #Final WHERE CheckParent <> 'None' AND CheckParent <> Parent

    DROP TABLE #Final

    DROP TABLE #parents

    Cheers,


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Actually I didn't notice you have added a row number as I had started working on the problem from your first post. Give me a bit and I should be able to fix it up. Does your original table have an ID column?


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Hi,

    There is a unique Id column in the original table, but is not nessarily incremental, but is unique.

    Thanks,

    Lewis

  • Ok, well try my updated solution. The one problem I can see though is doesn't account for multiples problems relating to the same child. In your example you listed RowNumber 1 & 2 as being a problem but my output shows 2 & 3 which is correct if you look at your data because 1 & 3 have the same conflicts. I'll have to dwell on that one a bit.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Thanks for your help, however I get this message

    'LAG' is not a recognized built-in function name.

  • lewisdow123 (6/19/2015)


    Thanks for your help, however I get this message

    'LAG' is not a recognized built-in function name.

    Did you post in the wrong forum? I assumed you were using SQL 2012. LEAD and LAG functions were introduced in that version.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • I could kick myself, I am running 2008 R2... so sorry

  • lewisdow123 (6/19/2015)


    I could kick myself, I am running 2008 R2... so sorry

    Heh... not to worry. We wanna kick you, too! 😀

    Here's a solution that will work in any version from 2005 and up. Details are in the comments.

    WITH

    cteUnpivot AS

    ( --=== Un-pivot Parent/Child Pairs and rank the pairs according to who the parent is for each child

    SELECT p.RowNumber

    ,ca.*

    ,PairOccurance = ROw_NUMBER()OVER(PARTITION BY ca.Child,ca.Parent ORDER BY ca.Parent)

    ,ParentOccurance = DENSE_RANK()OVER(PARTITION BY ca.Child ORDER BY ca.Parent)

    FROM #Parents p

    CROSS APPLY

    (

    SELECT Parent1, Parent2, 1 UNION ALL

    SELECT Parent2, Parent3, 2 UNION ALL

    SELECT Parent3, Parent4, 3 UNION ALL

    SELECT Parent4, Parent5, 4

    ) ca (Parent,Child,PairNumber) --Note that PairNumber will help you visually find the data in the original table.

    WHERE ca.Parent <> ca.Child

    ) --=== Return the RowNumber and PairNumber of every row where the Child has more than 1 parent anywhere in the table.

    SELECT *

    FROM cteUnpivot

    WHERE Child IN (SELECT Child FROM cteUnpivot WHERE ParentOccurance = 2) --This means "at least 2 occurances

    ORDER BY Child,Parent,RowNumber

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you very much, will explore it now.

  • lewisdow123 (6/19/2015)


    I could kick myself, I am running 2008 R2... so sorry

    LOL...well for what it's worth here is the final edit to my solution (for 2012) which has the output in the format you expected.

    DECLARE @working TABLE (Parent VARCHAR(20), Child VARCHAR(20), RowNumber INT)

    INSERT INTO @working SELECT Parent1 AS Parent, Parent2 AS Child, RowNumber FROM #Parents

    INSERT INTO @working SELECT Parent2 AS Parent, Parent3 AS Child, RowNumber FROM #Parents

    INSERT INTO @working SELECT Parent3 AS Parent, Parent4 AS Child, RowNumber FROM #Parents

    INSERT INTO @working SELECT Parent4 AS Parent, Parent5 AS Child, RowNumber FROM #Parents

    SELECT

    Parent,

    Child,

    RowNumber,

    LEAD(Parent, 1, 'None') OVER (PARTITION BY Child ORDER BY Parent) AS CheckChild,

    LAG(Parent, 1, 'None') OVER (PARTITION BY Child ORDER BY Parent) AS CheckParent

    INTO #Final

    FROM

    @working

    WHERE

    Parent <> Child

    SELECT * FROM #Final

    SELECT RowNumber AS ErrorRowNumber, Parent AS ErrorParent, Child AS ErrorChild FROM #Final WHERE CheckParent <> 'None' AND CheckParent <> Parent

    UNION ALL

    SELECT RowNumber AS ErrorRowNumber, Parent AS ErrorParent, Child AS ErrorChild FROM #Final WHERE CheckChild <> 'None' AND CheckChild <> Parent ORDER BY ErrorRowNumber

    DROP TABLE #Final


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Thank you

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply