Recursion - find last child

Question

Post reply

Recursion - find last child

dmayley

Mr or Mrs. 500

Points: 570
More actions
March 6, 2012 at 3:28 pm

#252045

Never done recursion in MS SQL.
Looked at few samples using CTE, but haven't found what I am looking for.
I have a table that provides a cross reference of ID's (OldID and NewID).
Starting with the OldID, I need to know what is the last NewID.
Table could look something like this.
OldID NewID
1 12
5 23
7 34
23 50
34 55
55 60
This shows OldID 1 changed to NewID 12
And so on, however, an Id can change more than once.
Like OldID 5 went from 5 to 23, then 23 to 50.
I need a list of the OldID and the last NewID.
List would look like this:
OldID NewID
1 12
5 50
7 60
23 50
34 60
55 60
Note that 7 changed to 34 which changed to 55 which changed to 60. Result is OldID = 5 and Last NewID = 60
Not even sure this can be done? Let me know.
Thanks

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Possinator Hall of Fame Points: 3452 More actions · Answer 1

I'm sure there are fancier and better ways, but here is a simple method with some self-joins (and a test table):

CREATE TABLE #RefIDs (OldID int, NewerID int)

INSERT INTO #RefIDs

VALUES

(1,12),

(5,23),

(7,34),

(23,50),

(34,55),

(55,60)

SELECT

o1.OldID

,NewestID=coalesce(o3.newerID, o2.newerID, o1.newerID)

FROM

#RefIDs o1

left outer join #RefIDs o2 on o2.newerID = o1.oldid

left outer join #RefIDs o3 on o3.OldID = o1.NewerID

[font="Courier New"]Looking for a Deadlock Victim Support Group..[/font]

R.P.Rozema SSChampion Points: 12317 More actions · Answer 2

Some DDL and a test set. Alas, I lost some valuable time writing this while you could have easily provided it yourself 🙁 -see the posting etiquette link in my signature for more info-

create table #cref (

OldID int not null,

NewID int not null

);

insert #cref(OldID, NewID)

values

(1, 12),

(5, 23),

(7, 34),

(23, 50),

(34, 55),

(55, 60);

First of all I think you need some additional unique identification to trace the changes for your data. This unique id must be logged with each update. Without such a unique id, you will easily end up in problems. Like for example endless loops and inter-twined update paths.

But since this is only an example I'll answer your question: it is very well possible to do this using a recursive cte. For example:

with cteSteps as (

select c1.OldID as OriginalID, c1.OldID, c1.NewID, 1 as lvl

from #cref c1

union all

select cte.OriginalID, c2.OldID, c2.NewID, cte.lvl + 1

from cteSteps cte

inner join #cref c2 on (c2.OldID = cte.NewID)

),

cteNumbered as (

select row_number() over (partition by OriginalID order by lvl desc) as nr,

*

from cteSteps

)

select OriginalID as OldID, NewID

from cteNumbered

where nr = 1

OldID NewID

----------- -----------

1 12

5 50

7 60

23 50

34 60

55 60

(6 row(s) affected)

-- edit: forgot the desc sort order on lvl... 🙁

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Cadavre SSC-Forever Points: 41690 More actions · Answer 3

Thanks to R.P.Rozema for the sample data that the OP "forgot" 😉

Here's another way:

WITH CTE

AS (SELECT OldID, NewID, NewID AS CurrentID

FROM #cref

UNION ALL

SELECT a.OldID, a.NewID, b.NewID

FROM CTE a

INNER JOIN #cref b ON a.CurrentID = b.OldID)

SELECT OldID, MAX(CurrentID) AS NewID

FROM CTE

GROUP BY OldID;

Returns: -

OldID NewID

----------- -----------

1 12

5 50

7 60

23 50

34 60

55 60

They're identical on this small data-set.

BEGIN TRAN

SET NOCOUNT ON;

CREATE TABLE #cref (OldID INT NOT NULL, NewID INT NOT NULL);

INSERT #cref (OldID, NewID)

VALUES (1, 12), (5, 23), (7, 34), (23, 50), (34, 55), (55, 60);

PRINT REPLICATE('-',80);

PRINT 'Cad Method';

PRINT REPLICATE('-',80);

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

WITH CTE

AS (SELECT OldID, NewID, NewID AS CurrentID

FROM #cref

UNION ALL

SELECT a.OldID, a.NewID, b.NewID

FROM CTE a

INNER JOIN #cref b ON a.CurrentID = b.OldID)

SELECT OldID, MAX(CurrentID) AS NewID

FROM CTE

GROUP BY OldID;

SET STATISTICS TIME OFF;

SET STATISTICS IO OFF;

PRINT REPLICATE('-',80);

PRINT 'R.P.Rozema Method';

PRINT REPLICATE('-',80);

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

with cteSteps as (

select c1.OldID as OriginalID, c1.OldID, c1.NewID, 1 as lvl

from #cref c1

union all

select cte.OriginalID, c2.OldID, c2.NewID, cte.lvl + 1

from cteSteps cte

inner join #cref c2 on (c2.OldID = cte.NewID)

),

cteNumbered as (

select row_number() over (partition by OriginalID order by lvl desc) as nr,

*

from cteSteps

)

select OriginalID as OldID, NewID

from cteNumbered

where nr = 1

SET STATISTICS TIME OFF;

SET STATISTICS IO OFF;

ROLLBACK

--------------------------------------------------------------------------------

Cad Method

--------------------------------------------------------------------------------

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Table 'Worktable'. Scan count 2, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#cref_______________________________________________________________________________________________________________0000000000E8'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

--------------------------------------------------------------------------------

R.P.Rozema Method

--------------------------------------------------------------------------------

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Table 'Worktable'. Scan count 2, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#cref_______________________________________________________________________________________________________________0000000000E8'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

R.P.Rozema SSChampion Points: 12317 More actions · Answer 4

The Cad-method assumes that the newID always higher is than the oldID. This may very well be true, given that the ID is likely implemented as an identity column, but it was not mentioned in the requirements. 😉

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Cadavre SSC-Forever Points: 41690 More actions · Answer 5

R.P.Rozema (3/7/2012)
The Cad-method assumes that the newID always higher is than the oldID. This may very well be true, given that the ID is likely implemented as an identity column, but it was not mentioned in the requirements. 😉

Very true. I've made assumptions based on the limited sample data that the OP provided, which is why we normally insist on readily consumable DDL and sample data from the OP.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004471 More actions · Answer 6

~~I guess I don't understand why you need recursion for this at all. If the NEWID doesn't appear in the OLDID column, then it is a leaf level node.~~

{EDIT}... Never mind. I didn't read the original problem close enough.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Cadavre SSC-Forever Points: 41690 More actions · Answer 7

Jeff Moden (3/7/2012)
I guess I don't understand why you need recursion for this at all. If the NEWID doesn't appear in the OLDID column, then it is a leaf level node.

I've interpreted it as he wants the final child of each ID.

So, taking OldID 7.

7 has a child of 34.

34 has a child of 55.

55 has a child of 60.

Final child of 5 7 is 60, so we'd want OldID 5 7 and NewID 60.

I've then assumed that the NewID is always higher than the OldID, which may have been a mistake and needs clarifying by the OP.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004471 More actions · Answer 8

Hang on a minute while I get the egg off my face. 🙂 I saw that just before you posted and corrected my post.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dmayley Mr or Mrs. 500 Points: 570 More actions · Answer 9

Burninator

This is good stuff! I never thought about just joining to the same table. 😛

Thanks for the idea!!

Burninator (3/6/2012)
I'm sure there are fancier and better ways, but here is a simple method with some self-joins (and a test table):
CREATE TABLE #RefIDs (OldID int, NewerID int)
INSERT INTO #RefIDs
VALUES
(1,12),
(5,23),
(7,34),
(23,50),
(34,55),
(55,60)
SELECT
o1.OldID
,NewestID=coalesce(o3.newerID, o2.newerID, o1.newerID)
FROM
#RefIDs o1
left outer join #RefIDs o2 on o2.newerID = o1.oldid
left outer join #RefIDs o3 on o3.OldID = o1.NewerID

dmayley Mr or Mrs. 500 Points: 570 More actions · Answer 10

R.P.Rozema, thanks for supplying the code to gen. the sample data. Noob mistake. Lesson learned.

Cadavre

Although not a requirement given, the NewID would always be higher than the OldID. So, the Cad-method would work as well.

Your CTE example is brilliant as well! I am digging into the CTE a bit more to learn cool new feature.

Thanks to all who commented. It was a learning experience 🙂

R.P.Rozema SSChampion Points: 12317 More actions · Answer 11

This shows OldID 1 changed to NewID 12
And so on, however, an Id can change more than once.
Like OldID 5 went from 5 to 23, then 23 to 50.

I did not see in the requirements that there was a restriction that no more changes could be made after a record has been changed 3 times? In Burninator's solution you have to add a new join for each deeper level that you want to add, so the example works for just 3 levels deep, if you allow for more than 3 levels you'll need to add more joins, one per nesting level, making the query heavier and heavier with each level that you allow for...

Given the structure you proposed you're stuck with some form of recursion. But if you can, you may want to adjust the way you store the information to better match your usage: if you're always only interested in getting the current id for any id entered ever, then add a column in your table, say 'CurrentID'. Initially set it to the same value as the row's ID. Then update it when the record is replaced by a more recent copy. This way you can always get for each ID the ID that holds the most recent copy by looking at the CurrentID column. You'll always only need a single join to get to the current version of any record.

If you have to store the entire path I can suggest to use other storage structures instead of this adjecency list structure. What you've got here is a single path tree. For more information on tree structures read for example the book "trees and hierarchies in sql for smarties" by Joe Celko. It's not always an easy read but the information in the book is good. You wil certainly get new ideas on how to tackle this problem.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

dmayley Mr or Mrs. 500 Points: 570 More actions · Answer 12

R.P. - Good Points

I tried to give a simple set of data. Original data has over 300k rows.

I don't have the liberty to change the structure. This is how I receive the data.

Thanks for your help and details. I found "trees and hierarchies in sql for smarties" at Amazon. Thanks again!

Jeff Moden SSC Guru Points: 1004471 More actions · Answer 13

Hmmm... I've been thinking about this off and on. Does the return need to be in the precise format that you specified or could you use the whole history for each "leaf" node in a single row?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)