building groups out of references / eliminate cross references

  • Hi all,

    I got the following problem to solve in TSQL. I don't want to use a cursor. But with the set based solution I am stuck. Maybe someone knows a solution for the missing part. Or even a complete different solution.

    Here is my problem:

    DECLARE @Tmp TABLE (CustomerID INT, CustomerLink INT, PRIMARY KEY(CustomerID))

    INSERT @Tmp

    VALUES(100001,0)

    ,(100002,100001)

    ,(100003,100001)

    ,(100004,100003)

    ,(100005,100006)

    ,(100006,100005)

    ,(100007,100006)

    ,(100008,0)

    ,(100009,100008)

    ,(100010,100013)

    ,(100011,100008)

    ,(100012,100011)

    ,(100013,100012)

    ,(100014,0)

    ,(100015,100016)

    ,(100016,100015)

    /*

    building groups out of the couples ("chaining" is possilbe)

    desired result:

    [CustomerID of a group member],[smallest CustomerID per Group]

    |(GroupID)|CustomerID|CustomerLink|

    |100001 |100001 |100001 |

    |100001 |100002 |100001 |

    |100001 |100003 |100001 |

    |100001 |100004 |100001 |

    |100005 |100005 |100005 |

    |100005 |100006 |100005 |

    |100005 |100007 |100005 |

    |100008 |100008 |100008 |

    |100008 |100009 |100008 |

    |100008 |100010 |100008 |

    |100008 |100011 |100008 |

    |100008 |100012 |100008 |

    |100008 |100013 |100008 |

    |100014 |100014 |100014 |

    |100015 |100015 |100015 |

    |100016 |100015 |100015 |

    */

    SELECT * FROM @Tmp

    --Set CustomerLink to CustomerID when missing (it's a group for itself)

    UPDATE @Tmp SET CustomerLink = CustomerID WHERE CustomerLink = 0

    --Find and eliminate crossreferences

    --->>>--- Missing Part ---<<<---

    --update CustomerLink

    WHILE EXISTS (

    SELECT 1

    FROM @Tmp tmp1

    JOIN @Tmp tmp2

    ON tmp2.CustomerID = tmp1.CustomerLink

    WHERE tmp2.CustomerLink <> tmp1.CustomerLink

    )

    BEGIN

    UPDATE tmp1

    SET tmp1.CustomerLink = tmp2.CustomerLink

    FROM @Tmp tmp1

    JOIN @Tmp tmp2

    ON tmp2.CustomerID = tmp1.CustomerLink

    WHERE tmp2.CustomerLink <> tmp1.CustomerLink

    END

    --check result

    SELECT * FROM @Tmp

    /*

    doesn't work for crossreferences :.(

    RESULTSET:

    CustomerID CustomerLink

    100001 100001

    100002 100001

    100003 100001

    100004 100001

    100005 100005

    100006 100006 --wrong

    100007 100005

    100008 100008

    100009 100008

    100010 100008

    100011 100008

    100012 100008

    100013 100008

    100014 100014

    100015 100015

    100016 100016 --wrong

    */

    Thanks for any help with this little 'quiz'

    Reto E.

  • DECLARE @Tmp TABLE (CustomerID INT, CustomerLink INT, PRIMARY KEY(CustomerID))

    INSERT @Tmp VALUES

    (100001,0),(100002,100001),(100003,100001),(100004,100003),

    (100005,100006),(100006,100005),(100007,100006),(100008,0),

    (100009,100008),(100010,100013),(100011,100008),(100012,100011),

    (100013,100012),(100014,0),(100015,100016),(100016,100015)

    ;WITH FilledData AS (

    SELECT CustomerID, CustomerLink = ISNULL(NULLIF(CustomerLink,0),CustomerID) FROM @Tmp

    )

    SELECT x.GroupID, f.CustomerID, f.CustomerLink

    FROM FilledData f

    CROSS APPLY (

    SELECT GroupID = MIN(CustomerID)

    FROM FilledData fi

    WHERE fi.CustomerLink = f.CustomerLink

    ) x

    ORDER BY f.CustomerID

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thx ChrisM for your CROSS APPLY approach. I tried something like this, but sometimes cross apply is a bit misterious to me ...

    Your solution doesn't work, because you can't assume that MIN(CustomerID) will give you the correct GroupID.

    Your qurey returns GroupID 100004 for CustomerID 100004. But it should be GroupID 100001 because the CustomerLink is to CustomerID 100003 and CustomerID is linked to GroupID 100001.

    The CustomerLink is not always set to the correct GroupID. There can be more steps in between.

    Greetings Reto

  • reto.eggenberger (10/7/2014)


    Thx ChrisM for your CROSS APPLY approach. I tried something like this, but sometimes cross apply is a bit misterious to me ...

    Your solution doesn't work, because you can't assume that MIN(CustomerID) will give you the correct GroupID.

    Your qurey returns GroupID 100004 for CustomerID 100004. But it should be GroupID 100001 because the CustomerLink is to CustomerID 100003 and CustomerID is linked to GroupID 100001.

    The CustomerLink is not always set to the correct GroupID. There can be more steps in between.

    Greetings Reto

    Ah, it's recursive. Ok...

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yes. But you don't know the level of the CustomerID because you only got a CustomerLink.

    (See attachment for the actual hierarchy)

  • The problem with cross reference is that it leads to indefinite recurisvenes...

    So, solution would be to resolve corss reference before going to recursive query.

    I have added one more test case. If you wish to display original CustomerLink, resolve the cross reference into another temp table, change recursive CTE to use it and join back to original @Tmp on CustomerId in the final query to pick up original CustomerLink it from there.

    DECLARE @Tmp TABLE (CustomerID INT, CustomerLink INT, PRIMARY KEY(CustomerID))

    INSERT @Tmp

    VALUES (100001,0)

    ,(100002,100001)

    ,(100003,100001)

    ,(100004,100003)

    ,(100005,100006)

    ,(100006,100005)

    ,(100007,100006)

    ,(100008,0)

    ,(100009,100008)

    ,(100010,100013)

    ,(100011,100008)

    ,(100012,100011)

    ,(100013,100012)

    ,(100014,0)

    ,(100015,100016)

    ,(100016,100015)

    ,(100020,100022)

    ,(100021,100020)

    ,(100022,0)

    UPDATE t1 SET CustomerLink = 0

    FROM @Tmp t1

    JOIN @Tmp t2

    ON t2.CustomerID = t1.CustomerLink

    AND t2.CustomerLink = t1.CustomerID

    WHERE t1.CustomerID < t2.CustomerID

    ;WITH linkedC

    AS

    (

    SELECT CustomerId, CustomerLink, NEWID() GRP

    FROM @Tmp

    WHERE CustomerLink = 0

    UNION ALL

    SELECT t.CustomerId, t.CustomerLink, GRP

    FROM @Tmp t

    JOIN linkedC l ON l.CustomerId = t.CustomerLink

    )

    SELECT MIN(CustomerId) OVER (PARTITION BY GRP) AS GroupID

    ,CustomerId

    ,CustomerLink

    FROM linkedC

    ORDER BY GroupId, CustomerId

    Actually, you don't need to use NEWID() function for pre-grouping, CustomerID would work as well eg:

    ...

    SELECT CustomerId, CustomerLink, CustomerId GRP

    FROM @Tmp

    WHERE CustomerLink = 0

    ...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks! Eliminating the cross reference did the job. Your query delivers exactly what I was trying to achieve.

  • To identify circles

    DECLARE @Tmp TABLE (CustomerID INT, CustomerLink INT, PRIMARY KEY(CustomerID))

    INSERT @Tmp

    VALUES(100001,0)

    ,(100002,100001)

    ,(100003,100001)

    ,(100004,100003)

    ,(100005,100006)

    ,(100006,100005)

    ,(100007,100006)

    ,(100008,0)

    ,(100009,100008)

    ,(100010,100013)

    ,(100011,100008)

    ,(100012,100011)

    ,(100013,100012)

    ,(100014,0)

    ,(100015,100016)

    ,(100016,100015)

    ;

    WITH childrenandparents (child,parent) AS

    (select '*'+CAST (CustomerID as varchar(20)) child, '*'+CAST (CustomerLink as varchar(20)) parent

    from @Tmp) ,

    GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

    AS

    (

    -- Anchor member definition

    SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

    FROM childrenandparents AS g

    UNION ALL

    -- Recursive member definition

    SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

    FROM childrenandparents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

    )

    select bottom, parentgroup, level, hierarchypath

    from groupmembers

    where hierarchypath like '%'+parentgroup + '->' + '%'

    order by level desc

    option(maxrecursion 10);

    To eliminate circles we need a criteria which is correct of contradicting (100005,100006) ,(100006,100005).

  • Eugene Elutin (10/7/2014)

    UPDATE t1 SET CustomerLink = 0

    FROM @Tmp t1

    JOIN @Tmp t2

    ON t2.CustomerID = t1.CustomerLink

    AND t2.CustomerLink = t1.CustomerID

    WHERE t1.CustomerID < t2.CustomerID

    ...

    Note this will not eliminate longer cross reference circles. Consider

    ,(100005,100006)

    ,(100006,100007)

    ,(100007,100005)

  • serg-52 (10/7/2014)


    Eugene Elutin (10/7/2014)

    UPDATE t1 SET CustomerLink = 0

    FROM @Tmp t1

    JOIN @Tmp t2

    ON t2.CustomerID = t1.CustomerLink

    AND t2.CustomerLink = t1.CustomerID

    WHERE t1.CustomerID < t2.CustomerID

    ...

    Note this will not eliminate longer cross reference circles. Consider

    ,(100005,100006)

    ,(100006,100007)

    ,(100007,100005)

    Yep, for more complicated circualr reference (eg: (100105,100107),(100106,100107),(100107,100105))

    , you need much more comrehensive resolution:

    DECLARE @Tmp TABLE (CustomerID INT, CustomerLink INT, PRIMARY KEY(CustomerID))

    INSERT @Tmp

    VALUES (100001,0)

    ,(100002,100001)

    ,(100003,100001)

    ,(100004,100003)

    ,(100008,0)

    ,(100009,100008)

    ,(100010,100013)

    ,(100011,100008)

    ,(100012,100011)

    ,(100013,100012)

    ,(100014,0)

    ,(100020,100022)

    ,(100021,100020)

    ,(100022,0)

    -- 1-level circular reference:

    ,(100015,100016)

    ,(100016,100015)

    -- 2-level circular reference:

    ,(100005,100006)

    ,(100006,100007)

    ,(100007,100005)

    -- mixed 2 level circular reference:

    ,(100105,100107)

    ,(100106,100107)

    ,(100107,100105)

    -- comprehensive circular reference resolution

    ;WITH circleL

    AS

    (

    SELECT CustomerId, CustomerLink, ',' + CAST(CustomerId AS NVARCHAR(MAX)) + ',' AllIds

    FROM @Tmp

    UNION ALL

    SELECT l.CustomerId, t.CustomerLink, AllIds + ',' + CAST(t.CustomerId AS NVARCHAR(MAX)) + ','

    FROM @Tmp t

    JOIN circleL l ON l.CustomerLink = t.CustomerId

    WHERE l.AllIds NOT LIKE '%,' + CAST(t.CustomerId AS NVARCHAR(MAX)) + ',%'

    )

    UPDATE T SET CustomerLink = 0

    FROM @Tmp T

    WHERE CustomerId IN (SELECT DISTINCT Q.MinId

    FROM circleL L1

    CROSS APPLY (SELECT MIN(CustomerId) FROM circleL L2 WHERE L2.AllIds LIKE '%,' + CAST(L1.CustomerId AS NVARCHAR(MAX)) + ',%') Q(MinId)

    WHERE L1.CustomerId = L1.CustomerLink AND L1.CustomerLink != 0)

    -- final recursive cte

    ;WITH linkedC

    AS

    (

    SELECT CustomerId, CustomerLink, CustomerId GRP

    FROM @Tmp

    WHERE CustomerLink = 0

    UNION ALL

    SELECT t.CustomerId, t.CustomerLink, GRP

    FROM @Tmp t

    JOIN linkedC l ON l.CustomerId = t.CustomerLink

    )

    SELECT MIN(CustomerId) OVER (PARTITION BY GRP) AS GroupID

    ,CustomerId

    ,CustomerLink

    FROM linkedC

    ORDER BY GroupId, CustomerId

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Hi both

    Thanks again!

    The latest solution works great. I gonna test it with our real-world data.

    Regards, Reto E.

  • Hi both,

    Thanks again!

    The latest solution works great. I gonna test it with our real-world data.

    Regards, Reto E.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply