Recursion - find last child

  • Never done recursion in MS SQL.

    Looked at few samples using CTE, but haven't found what I am looking for.

    I have a table that provides a cross reference of ID's (OldID and NewID).

    Starting with the OldID, I need to know what is the last NewID.

    Table could look something like this.

    OldID NewID

    1 12

    5 23

    7 34

    23 50

    34 55

    55 60

    This shows OldID 1 changed to NewID 12

    And so on, however, an Id can change more than once.

    Like OldID 5 went from 5 to 23, then 23 to 50.

    I need a list of the OldID and the last NewID.

    List would look like this:

    OldID NewID

    1 12

    5 50

    7 60

    23 50

    34 60

    55 60

    Note that 7 changed to 34 which changed to 55 which changed to 60. Result is OldID = 5 and Last NewID = 60

    Not even sure this can be done? Let me know.

    Thanks

  • I'm sure there are fancier and better ways, but here is a simple method with some self-joins (and a test table):

    CREATE TABLE #RefIDs (OldID int, NewerID int)

    INSERT INTO #RefIDs

    VALUES

    (1,12),

    (5,23),

    (7,34),

    (23,50),

    (34,55),

    (55,60)

    SELECT

    o1.OldID

    ,NewestID=coalesce(o3.newerID, o2.newerID, o1.newerID)

    FROM

    #RefIDs o1

    left outer join #RefIDs o2 on o2.newerID = o1.oldid

    left outer join #RefIDs o3 on o3.OldID = o1.NewerID

    [font="Courier New"]Looking for a Deadlock Victim Support Group..[/font]
  • Some DDL and a test set. Alas, I lost some valuable time writing this while you could have easily provided it yourself 🙁 -see the posting etiquette link in my signature for more info-

    create table #cref (

    OldID int not null,

    NewID int not null

    );

    insert #cref(OldID, NewID)

    values

    (1, 12),

    (5, 23),

    (7, 34),

    (23, 50),

    (34, 55),

    (55, 60);

    First of all I think you need some additional unique identification to trace the changes for your data. This unique id must be logged with each update. Without such a unique id, you will easily end up in problems. Like for example endless loops and inter-twined update paths.

    But since this is only an example I'll answer your question: it is very well possible to do this using a recursive cte. For example:

    with cteSteps as (

    select c1.OldID as OriginalID, c1.OldID, c1.NewID, 1 as lvl

    from #cref c1

    union all

    select cte.OriginalID, c2.OldID, c2.NewID, cte.lvl + 1

    from cteSteps cte

    inner join #cref c2 on (c2.OldID = cte.NewID)

    ),

    cteNumbered as (

    select row_number() over (partition by OriginalID order by lvl desc) as nr,

    *

    from cteSteps

    )

    select OriginalID as OldID, NewID

    from cteNumbered

    where nr = 1

    OldID NewID

    ----------- -----------

    1 12

    5 50

    7 60

    23 50

    34 60

    55 60

    (6 row(s) affected)

    -- edit: forgot the desc sort order on lvl... 🙁



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Thanks to R.P.Rozema for the sample data that the OP "forgot" 😉

    Here's another way:

    WITH CTE

    AS (SELECT OldID, NewID, NewID AS CurrentID

    FROM #cref

    UNION ALL

    SELECT a.OldID, a.NewID, b.NewID

    FROM CTE a

    INNER JOIN #cref b ON a.CurrentID = b.OldID)

    SELECT OldID, MAX(CurrentID) AS NewID

    FROM CTE

    GROUP BY OldID;

    Returns: -

    OldID NewID

    ----------- -----------

    1 12

    5 50

    7 60

    23 50

    34 60

    55 60

    They're identical on this small data-set.

    BEGIN TRAN

    SET NOCOUNT ON;

    CREATE TABLE #cref (OldID INT NOT NULL, NewID INT NOT NULL);

    INSERT #cref (OldID, NewID)

    VALUES (1, 12), (5, 23), (7, 34), (23, 50), (34, 55), (55, 60);

    PRINT REPLICATE('-',80);

    PRINT 'Cad Method';

    PRINT REPLICATE('-',80);

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    WITH CTE

    AS (SELECT OldID, NewID, NewID AS CurrentID

    FROM #cref

    UNION ALL

    SELECT a.OldID, a.NewID, b.NewID

    FROM CTE a

    INNER JOIN #cref b ON a.CurrentID = b.OldID)

    SELECT OldID, MAX(CurrentID) AS NewID

    FROM CTE

    GROUP BY OldID;

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    PRINT REPLICATE('-',80);

    PRINT 'R.P.Rozema Method';

    PRINT REPLICATE('-',80);

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    with cteSteps as (

    select c1.OldID as OriginalID, c1.OldID, c1.NewID, 1 as lvl

    from #cref c1

    union all

    select cte.OriginalID, c2.OldID, c2.NewID, cte.lvl + 1

    from cteSteps cte

    inner join #cref c2 on (c2.OldID = cte.NewID)

    ),

    cteNumbered as (

    select row_number() over (partition by OriginalID order by lvl desc) as nr,

    *

    from cteSteps

    )

    select OriginalID as OldID, NewID

    from cteNumbered

    where nr = 1

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    ROLLBACK

    --------------------------------------------------------------------------------

    Cad Method

    --------------------------------------------------------------------------------

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Table 'Worktable'. Scan count 2, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#cref_______________________________________________________________________________________________________________0000000000E8'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    --------------------------------------------------------------------------------

    R.P.Rozema Method

    --------------------------------------------------------------------------------

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Table 'Worktable'. Scan count 2, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#cref_______________________________________________________________________________________________________________0000000000E8'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The Cad-method assumes that the newID always higher is than the oldID. This may very well be true, given that the ID is likely implemented as an identity column, but it was not mentioned in the requirements. 😉



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • R.P.Rozema (3/7/2012)


    The Cad-method assumes that the newID always higher is than the oldID. This may very well be true, given that the ID is likely implemented as an identity column, but it was not mentioned in the requirements. 😉

    Very true. I've made assumptions based on the limited sample data that the OP provided, which is why we normally insist on readily consumable DDL and sample data from the OP.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I guess I don't understand why you need recursion for this at all. If the NEWID doesn't appear in the OLDID column, then it is a leaf level node.

    {EDIT}... Never mind. I didn't read the original problem close enough.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/7/2012)


    I guess I don't understand why you need recursion for this at all. If the NEWID doesn't appear in the OLDID column, then it is a leaf level node.

    I've interpreted it as he wants the final child of each ID.

    So, taking OldID 7.

    7 has a child of 34.

    34 has a child of 55.

    55 has a child of 60.

    Final child of 5 7 is 60, so we'd want OldID 5 7 and NewID 60.

    I've then assumed that the NewID is always higher than the OldID, which may have been a mistake and needs clarifying by the OP.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hang on a minute while I get the egg off my face. 🙂 I saw that just before you posted and corrected my post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Burninator

    This is good stuff! I never thought about just joining to the same table. 😛

    Thanks for the idea!!

    Burninator (3/6/2012)


    I'm sure there are fancier and better ways, but here is a simple method with some self-joins (and a test table):

    CREATE TABLE #RefIDs (OldID int, NewerID int)

    INSERT INTO #RefIDs

    VALUES

    (1,12),

    (5,23),

    (7,34),

    (23,50),

    (34,55),

    (55,60)

    SELECT

    o1.OldID

    ,NewestID=coalesce(o3.newerID, o2.newerID, o1.newerID)

    FROM

    #RefIDs o1

    left outer join #RefIDs o2 on o2.newerID = o1.oldid

    left outer join #RefIDs o3 on o3.OldID = o1.NewerID

  • R.P.Rozema, thanks for supplying the code to gen. the sample data. Noob mistake. Lesson learned.

    Cadavre

    Although not a requirement given, the NewID would always be higher than the OldID. So, the Cad-method would work as well.

    Your CTE example is brilliant as well! I am digging into the CTE a bit more to learn cool new feature.

    Thanks to all who commented. It was a learning experience 🙂

  • This shows OldID 1 changed to NewID 12

    And so on, however, an Id can change more than once.

    Like OldID 5 went from 5 to 23, then 23 to 50.

    I did not see in the requirements that there was a restriction that no more changes could be made after a record has been changed 3 times? In Burninator's solution you have to add a new join for each deeper level that you want to add, so the example works for just 3 levels deep, if you allow for more than 3 levels you'll need to add more joins, one per nesting level, making the query heavier and heavier with each level that you allow for...

    Given the structure you proposed you're stuck with some form of recursion. But if you can, you may want to adjust the way you store the information to better match your usage: if you're always only interested in getting the current id for any id entered ever, then add a column in your table, say 'CurrentID'. Initially set it to the same value as the row's ID. Then update it when the record is replaced by a more recent copy. This way you can always get for each ID the ID that holds the most recent copy by looking at the CurrentID column. You'll always only need a single join to get to the current version of any record.

    If you have to store the entire path I can suggest to use other storage structures instead of this adjecency list structure. What you've got here is a single path tree. For more information on tree structures read for example the book "trees and hierarchies in sql for smarties" by Joe Celko. It's not always an easy read but the information in the book is good. You wil certainly get new ideas on how to tackle this problem.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • R.P. - Good Points

    I tried to give a simple set of data. Original data has over 300k rows.

    I don't have the liberty to change the structure. This is how I receive the data.

    Thanks for your help and details. I found "trees and hierarchies in sql for smarties" at Amazon. Thanks again!

  • Hmmm... I've been thinking about this off and on. Does the return need to be in the precise format that you specified or could you use the whole history for each "leaf" node in a single row?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply