Recursive CTE for Item supersession question

Question

Recursive CTE for Item supersession question

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 1

CELKO (4/9/2012)
This is more of a temporal problem than a recursive problem.

At first glance, I thought so, as well. Then I looked at the values in the Period column and the sample output and realized that this was exactly what Shannon said it was. It's a hierarchical replacement problem where, say, a customer might have an old catalog and order item ABC8 or ABC9 and get the "new, improved" ABC10 item, instead. It could also just be a way to use up old inventory. Either way, Shannon wanted it to be reported as the item selected and to include all other items included in the "upline" of the tree. The important thing to do here was to run the tree "upline" instead of starting with the NULL parent and running "downline". Doing that allow for the "HRoot" column to easily be carried forward in a simple lookup table created by the rCTE and "Bob's your uncle". 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 2

Jeff,

That neat little trick with HRoot that you did in the CTE is just what I needed to get rid of the silly OR that I did on my INNER JOIN.

As this is a forecasting application, I can see that the OP is going to need to run this against all unique current products. So the WHERE you added to the CTE is probably going to end up as a long list (or a IN SELECT from another table somewhere).

I'm unclear on how your solution is going to eliminate the Cartesian product in that case. Perhaps you could elaborate?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 3

dwain.c (4/9/2012)
Jeff,
That neat little trick with HRoot that you did in the CTE is just what I needed to get rid of the silly OR that I did on my INNER JOIN.
As this is a forecasting application, I can see that the OP is going to need to run this against all unique current products. So the WHERE you added to the CTE is probably going to end up as a long list (or a IN SELECT from another table somewhere).
I'm unclear on how your solution is going to eliminate the Cartesian product in that case. Perhaps you could elaborate?

That's the real beauty of calculating the "HRoot" in the "anchor" of the rCTE. Just comment out the WHERE clause there and see what happens.

As to the Cartesian Product problem... You do get a sort of "Triangular Join" situation if you allow all products by removing the WHERE clause because the sums must be calculated for each product and its upline, but we're no longer getting a Cartesian Join for each product on top of the "Triangular Join" for all products.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

Jeff,

Your explanation makes some sort of sense but methinks I'll need to try it out to fully understand.

One more question though. Who is Bob and why is he my uncle? 🙂

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 5

Actually, if it were my code, I'd probably use the recursive CTE just to number the rows in the correct order using a Hierarchical Path concatenation and dump it into a temp table. Then I'd preaggregate the data in that new table and use a Quirky Update to do running totals. It would totally eliminate all Triangular Joins and would be able to handle millions of rows in a couple of heartbeats instead of using a fully recursive single query solution. "Divide'n'Conquer" would really pay off in spades for performance here.

I've got to get some shuteye (it's 1:42AM here). This is an interesting problem and, if you don't beat me to it, I'll give the suggestion above a whirl tomorrow night. Might even throw a million row test harness on it just to be sure.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

dwain.c (4/9/2012)
One more question though. Who is Bob and why is he my uncle? 🙂

It's "British". It means the same as when we Americans say "and you're done"!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

shannonjk SSCarpal Tunnel Points: 4286 More actions · Answer 7

Thanks Jeff your solution worked just fine! Well, with the exception on our side of making a clean 'current' data map of items but I solved that as that was an issue with our data 🙂

Thanks again I will keep you posted on how it is working!

Link to my blog http://notyelf.com/

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 8

Thanks for the feedback, Shannon. Heh... yeah... the "devil's in the data" on this type of stuff.

As a sidebar, this was a really interesting problem. Dwain (and, apparently, Cold Coffee) definitely had the right idea going in.

If you get the time, I'd really be interested in the business reason(s) you needed to do such a thing because, like I said, it was a really interesting problem with an interesting solution.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

shannonjk SSCarpal Tunnel Points: 4286 More actions · Answer 9

Yeah I thought this was an interesting problem as well :). I am glad I had the right idea, just not enough experience with rCTE's to get the solution. My original solution was pretty close to yours, but the addition of the hroot inside the CTE solved the problem!

So the business reason(s)

As I mentioned before we are implementing forecasting software. Part of the software is supersession. The problem is how they do it does not fit our needs. We introduce new product either once a year or twice a year, and sometimes more often. When we introduce new product we don't always stop selling the old product. Therefore we would need to forecast for both products, the new and the current. In the software we are implementing, when you supersede an item the previous item is frozen and can no longer be forecasted on, and only the new item can be forecasted. Obviously this creates an issue for us and is a limitation in the software we are implementing.

The solution we devised is two fold; 1) we map all 'old' versions of an item to the current selling item so we can get an accurate demand pattern to forecast that item. 2) We 'duplicate' that data in an additional forecast for the future item so we can get a demand pattern to sell for the future item as well.

In the end we can forecast on both the current and the new item without having to lock anything down.

Hopefully that made sense 🙂

Link to my blog http://notyelf.com/

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 10

It makes real sense. Thanks for taking the time to explain all of that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

F.L Right there with Babe Points: 734 More actions · Answer 11

Hello,

Hijacking this thread since OP seems to be satisfied with the solution provided. (I hope that is OK)

I´m trying to use CTE to find the very top parent within a table without any NULLs

My test table seems to work but when applying the logic on a larger table I´m running into very poor performance due to the recursion.

create table #t (childid int,name nvarchar(50),parentid int)

insert into #t values (1,'a',10)

insert into #t values (2,'b',1)

insert into #t values (3,'c',1)

insert into #t values (4,'d',3)

insert into #t values (5,'e',2)

insert into #t values (6,'f',11)

insert into #t values (7,'g',6)

insert into #t values (8,'h',5)

GO

with cte1

as

(

select childid,name,parentid, 0 AS level from #t where childid =8

union all

select #t.childid,#t.name,#t.parentid, level +1 from #t inner join cte1 on #t.childid=cte1.parentid

)

select * from cte1 D1

where D1.level = (select MAX(level) from cte1)

--example:

--childid 4 = top parent: 10

--childid 8 = top parent: 10

--childid 7 = top parent: 11

drop table #t

Is there a better approach to achieve the same goal?

Many thanks

CapnHector SSCoach Points: 16143 More actions · Answer 12

F.L (4/12/2012)
Hello,
Hijacking this thread since OP seems to be satisfied with the solution provided. (I hope that is OK)
I´m trying to use CTE to find the very top parent within a table without any NULLs
My test table seems to work but when applying the logic on a larger table I´m running into very poor performance due to the recursion.
create table #t (childid int,name nvarchar(50),parentid int)
insert into #t values (1,'a',10)
insert into #t values (2,'b',1)
insert into #t values (3,'c',1)
insert into #t values (4,'d',3)
insert into #t values (5,'e',2)
insert into #t values (6,'f',11)
insert into #t values (7,'g',6)
insert into #t values (8,'h',5)
GO
with cte1
as
(
select childid,name,parentid, 0 AS level from #t where childid =8
union all
select #t.childid,#t.name,#t.parentid, level +1 from #t inner join cte1 on #t.childid=cte1.parentid
)
select * from cte1 D1
where D1.level = (select MAX(level) from cte1)
--example:
--childid 4 = top parent: 10
--childid 8 = top parent: 10
--childid 7 = top parent: 11
drop table #t
Is there a better approach to achieve the same goal?
Many thanks

thread jacking is a big forum etiquette no no on any forum im on. you will be better off starting your own thread and linking to this one if you think its applicable.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

F.L Right there with Babe Points: 734 More actions · Answer 13

capn.hector (4/13/2012)
F.L (4/12/2012)
Hello,
Hijacking this thread since OP seems to be satisfied with the solution provided. (I hope that is OK)
I´m trying to use CTE to find the very top parent within a table without any NULLs
My test table seems to work but when applying the logic on a larger table I´m running into very poor performance due to the recursion.
create table #t (childid int,name nvarchar(50),parentid int)
insert into #t values (1,'a',10)
insert into #t values (2,'b',1)
insert into #t values (3,'c',1)
insert into #t values (4,'d',3)
insert into #t values (5,'e',2)
insert into #t values (6,'f',11)
insert into #t values (7,'g',6)
insert into #t values (8,'h',5)
GO
with cte1
as
(
select childid,name,parentid, 0 AS level from #t where childid =8
union all
select #t.childid,#t.name,#t.parentid, level +1 from #t inner join cte1 on #t.childid=cte1.parentid
)
select * from cte1 D1
where D1.level = (select MAX(level) from cte1)
--example:
--childid 4 = top parent: 10
--childid 8 = top parent: 10
--childid 7 = top parent: 11
drop table #t
Is there a better approach to achieve the same goal?
Many thanks
thread jacking is a big forum etiquette no no on any forum im on. you will be better off starting your own thread and linking to this one if you think its applicable.

Ok, sorry for being rude, that was not my intention.

The purpose was just to spare the forum from another thread on pretty much the same topic.

(And of course since OP already got an answer he was happy with)

My apologies, lesson learned

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 14

So where did you go??? I checked your profile for additional posts and didn't find one.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 15

CELKO (4/14/2012)
Unfortunately, Mr. Celko, this isn't MySQL. This is MS SQL Server and it doesn't have the same capabilities as MySQL or Oracle.
You do know that T-SQL now has a DATE type that defaults to the ISO-8601/ ANSI SQL display format? When I want to pass data to the front end for display -- not backend calculations -- I can now use the CAST() function to get a string and UNION the monthly and annual MySQL report period names.
Do you have a better way to display a month report period that is language free?

Quite aware. If I need a language free method I am sure I'd figure it out.