March 6, 2012 at 3:28 pm
Never done recursion in MS SQL.
Looked at few samples using CTE, but haven't found what I am looking for.
I have a table that provides a cross reference of ID's (OldID and NewID).
Starting with the OldID, I need to know what is the last NewID.
Table could look something like this.
OldID NewID
1 12
5 23
7 34
23 50
34 55
55 60
This shows OldID 1 changed to NewID 12
And so on, however, an Id can change more than once.
Like OldID 5 went from 5 to 23, then 23 to 50.
I need a list of the OldID and the last NewID.
List would look like this:
OldID NewID
1 12
5 50
7 60
23 50
34 60
55 60
Note that 7 changed to 34 which changed to 55 which changed to 60. Result is OldID = 5 and Last NewID = 60
Not even sure this can be done? Let me know.
Thanks
March 6, 2012 at 3:50 pm
I'm sure there are fancier and better ways, but here is a simple method with some self-joins (and a test table):
CREATE TABLE #RefIDs (OldID int, NewerID int)
INSERT INTO #RefIDs
VALUES
(1,12),
(5,23),
(7,34),
(23,50),
(34,55),
(55,60)
SELECT
o1.OldID
,NewestID=coalesce(o3.newerID, o2.newerID, o1.newerID)
FROM
#RefIDs o1
left outer join #RefIDs o2 on o2.newerID = o1.oldid
left outer join #RefIDs o3 on o3.OldID = o1.NewerID
March 7, 2012 at 5:42 am
Some DDL and a test set. Alas, I lost some valuable time writing this while you could have easily provided it yourself 🙁 -see the posting etiquette link in my signature for more info-
create table #cref (
OldID int not null,
NewID int not null
);
insert #cref(OldID, NewID)
values
(1, 12),
(5, 23),
(7, 34),
(23, 50),
(34, 55),
(55, 60);
First of all I think you need some additional unique identification to trace the changes for your data. This unique id must be logged with each update. Without such a unique id, you will easily end up in problems. Like for example endless loops and inter-twined update paths.
But since this is only an example I'll answer your question: it is very well possible to do this using a recursive cte. For example:
with cteSteps as (
select c1.OldID as OriginalID, c1.OldID, c1.NewID, 1 as lvl
from #cref c1
union all
select cte.OriginalID, c2.OldID, c2.NewID, cte.lvl + 1
from cteSteps cte
inner join #cref c2 on (c2.OldID = cte.NewID)
),
cteNumbered as (
select row_number() over (partition by OriginalID order by lvl desc) as nr,
*
from cteSteps
)
select OriginalID as OldID, NewID
from cteNumbered
where nr = 1
OldID NewID
----------- -----------
1 12
5 50
7 60
23 50
34 60
55 60
(6 row(s) affected)
-- edit: forgot the desc sort order on lvl... 🙁
March 7, 2012 at 6:17 am
Thanks to R.P.Rozema for the sample data that the OP "forgot" 😉
Here's another way:
WITH CTE
AS (SELECT OldID, NewID, NewID AS CurrentID
FROM #cref
UNION ALL
SELECT a.OldID, a.NewID, b.NewID
FROM CTE a
INNER JOIN #cref b ON a.CurrentID = b.OldID)
SELECT OldID, MAX(CurrentID) AS NewID
FROM CTE
GROUP BY OldID;
Returns: -
OldID NewID
----------- -----------
1 12
5 50
7 60
23 50
34 60
55 60
They're identical on this small data-set.
BEGIN TRAN
SET NOCOUNT ON;
CREATE TABLE #cref (OldID INT NOT NULL, NewID INT NOT NULL);
INSERT #cref (OldID, NewID)
VALUES (1, 12), (5, 23), (7, 34), (23, 50), (34, 55), (55, 60);
PRINT REPLICATE('-',80);
PRINT 'Cad Method';
PRINT REPLICATE('-',80);
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
WITH CTE
AS (SELECT OldID, NewID, NewID AS CurrentID
FROM #cref
UNION ALL
SELECT a.OldID, a.NewID, b.NewID
FROM CTE a
INNER JOIN #cref b ON a.CurrentID = b.OldID)
SELECT OldID, MAX(CurrentID) AS NewID
FROM CTE
GROUP BY OldID;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
PRINT REPLICATE('-',80);
PRINT 'R.P.Rozema Method';
PRINT REPLICATE('-',80);
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
with cteSteps as (
select c1.OldID as OriginalID, c1.OldID, c1.NewID, 1 as lvl
from #cref c1
union all
select cte.OriginalID, c2.OldID, c2.NewID, cte.lvl + 1
from cteSteps cte
inner join #cref c2 on (c2.OldID = cte.NewID)
),
cteNumbered as (
select row_number() over (partition by OriginalID order by lvl desc) as nr,
*
from cteSteps
)
select OriginalID as OldID, NewID
from cteNumbered
where nr = 1
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
ROLLBACK
--------------------------------------------------------------------------------
Cad Method
--------------------------------------------------------------------------------
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Table 'Worktable'. Scan count 2, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#cref_______________________________________________________________________________________________________________0000000000E8'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
--------------------------------------------------------------------------------
R.P.Rozema Method
--------------------------------------------------------------------------------
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Table 'Worktable'. Scan count 2, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#cref_______________________________________________________________________________________________________________0000000000E8'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
March 7, 2012 at 6:33 am
The Cad-method assumes that the newID always higher is than the oldID. This may very well be true, given that the ID is likely implemented as an identity column, but it was not mentioned in the requirements. 😉
March 7, 2012 at 6:43 am
R.P.Rozema (3/7/2012)
The Cad-method assumes that the newID always higher is than the oldID. This may very well be true, given that the ID is likely implemented as an identity column, but it was not mentioned in the requirements. 😉
Very true. I've made assumptions based on the limited sample data that the OP provided, which is why we normally insist on readily consumable DDL and sample data from the OP.
March 7, 2012 at 6:52 am
I guess I don't understand why you need recursion for this at all. If the NEWID doesn't appear in the OLDID column, then it is a leaf level node.
{EDIT}... Never mind. I didn't read the original problem close enough.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2012 at 6:57 am
Jeff Moden (3/7/2012)
I guess I don't understand why you need recursion for this at all. If the NEWID doesn't appear in the OLDID column, then it is a leaf level node.
I've interpreted it as he wants the final child of each ID.
So, taking OldID 7.
7 has a child of 34.
34 has a child of 55.
55 has a child of 60.
Final child of 5 7 is 60, so we'd want OldID 5 7 and NewID 60.
I've then assumed that the NewID is always higher than the OldID, which may have been a mistake and needs clarifying by the OP.
March 7, 2012 at 6:59 am
Hang on a minute while I get the egg off my face. 🙂 I saw that just before you posted and corrected my post.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2012 at 8:09 am
Burninator
This is good stuff! I never thought about just joining to the same table. 😛
Thanks for the idea!!
Burninator (3/6/2012)
I'm sure there are fancier and better ways, but here is a simple method with some self-joins (and a test table):
CREATE TABLE #RefIDs (OldID int, NewerID int)
INSERT INTO #RefIDs
VALUES
(1,12),
(5,23),
(7,34),
(23,50),
(34,55),
(55,60)
SELECT
o1.OldID
,NewestID=coalesce(o3.newerID, o2.newerID, o1.newerID)
FROM
#RefIDs o1
left outer join #RefIDs o2 on o2.newerID = o1.oldid
left outer join #RefIDs o3 on o3.OldID = o1.NewerID
March 7, 2012 at 8:21 am
R.P.Rozema, thanks for supplying the code to gen. the sample data. Noob mistake. Lesson learned.
Cadavre
Although not a requirement given, the NewID would always be higher than the OldID. So, the Cad-method would work as well.
Your CTE example is brilliant as well! I am digging into the CTE a bit more to learn cool new feature.
Thanks to all who commented. It was a learning experience 🙂
March 7, 2012 at 8:33 am
This shows OldID 1 changed to NewID 12
And so on, however, an Id can change more than once.
Like OldID 5 went from 5 to 23, then 23 to 50.
I did not see in the requirements that there was a restriction that no more changes could be made after a record has been changed 3 times? In Burninator's solution you have to add a new join for each deeper level that you want to add, so the example works for just 3 levels deep, if you allow for more than 3 levels you'll need to add more joins, one per nesting level, making the query heavier and heavier with each level that you allow for...
Given the structure you proposed you're stuck with some form of recursion. But if you can, you may want to adjust the way you store the information to better match your usage: if you're always only interested in getting the current id for any id entered ever, then add a column in your table, say 'CurrentID'. Initially set it to the same value as the row's ID. Then update it when the record is replaced by a more recent copy. This way you can always get for each ID the ID that holds the most recent copy by looking at the CurrentID column. You'll always only need a single join to get to the current version of any record.
If you have to store the entire path I can suggest to use other storage structures instead of this adjecency list structure. What you've got here is a single path tree. For more information on tree structures read for example the book "trees and hierarchies in sql for smarties" by Joe Celko. It's not always an easy read but the information in the book is good. You wil certainly get new ideas on how to tackle this problem.
March 7, 2012 at 11:48 am
R.P. - Good Points
I tried to give a simple set of data. Original data has over 300k rows.
I don't have the liberty to change the structure. This is how I receive the data.
Thanks for your help and details. I found "trees and hierarchies in sql for smarties" at Amazon. Thanks again!
March 14, 2012 at 11:02 pm
Hmmm... I've been thinking about this off and on. Does the return need to be in the precise format that you specified or could you use the whole history for each "leaf" node in a single row?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply