rCTE vs LIKE for Hierarchy

Question

rCTE vs LIKE for Hierarchy

Ariel Nessi

Old Hand

Points: 324
More actions
September 4, 2012 at 1:35 pm

#393096

Hi people,
I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in
SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'
where hierarchy field contains the "path" to the parent nodes for example
1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth
i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?
sry if the post is not clear enough,
I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those
Edit : As requested, here is the DDL of the tables (Please note, i've took away field that are irrelevant for the problem logic)
--Table DDL
Create Table #Hierarchy
(
[OUID] [int] Primary Key,
[OUParentID] [int] NULL,
[CustomerID] [int] NOT NULL,
[Enabled] [bit] NULL,
[Hierarchy] [varchar](256) NULL,
)
--Sample Insert of 26 rows
SELECT '364',NULL,'14','1','364' UNION ALL
SELECT '365','364','14','1','364.365' UNION ALL
SELECT '366','365','14','1','364.365.366' UNION ALL
SELECT '367','366','14','1','364.365.366.367' UNION ALL
SELECT '368','367','14','1','364.365.366.367.368' UNION ALL
SELECT '369','367','14','1','364.365.366.367.369' UNION ALL
SELECT '370','367','14','1','364.365.366.367.370' UNION ALL
SELECT '371','364','14','1','364.371' UNION ALL
SELECT '372','371','14','1','364.371.372' UNION ALL
SELECT '373','372','14','1','364.371.372.373' UNION ALL
SELECT '374','373','14','1','364.371.372.373.374' UNION ALL
SELECT '375','374','14','1','364.371.372.373.374.375' UNION ALL
SELECT '376','372','14','1','364.371.372.376' UNION ALL
SELECT '377','376','14','1','364.371.372.376.377' UNION ALL
SELECT '378','377','14','1','364.371.372.376.377.378' UNION ALL
SELECT '379','372','14','1','364.371.372.379' UNION ALL
SELECT '380','379','14','1','364.371.372.379.380' UNION ALL
SELECT '381','380','14','1','364.371.372.379.380.381' UNION ALL
SELECT '382','372','14','1','364.371.372.382' UNION ALL
SELECT '383','382','14','1','364.371.372.382.383' UNION ALL
SELECT '384','383','14','1','364.371.372.382.383.384' UNION ALL
SELECT '385','372','14','1','364.371.372.385' UNION ALL
SELECT '386','385','14','1','364.371.372.385.386' UNION ALL
SELECT '387','386','14','1','364.371.372.385.386.387' UNION ALL
SELECT '388','372','14','1','364.371.372.388' UNION ALL
SELECT '389','388','14','1','364.371.372.388.389'
--Original Query Approach
DECLARE @pCustomerID int,
@pUserID int,
@pOUIDs varchar(max),
@pLocationIds varchar(max)
SET@pCustomerID = 14
SET@pUserID = 1
SET@pOUIDs = NULL
SET@pLocationIDs = NULL
SELECT DISTINCT ou.OUID
FROM
(SELECT
ouChild.OUID,
ouChild.CustomerID,
ouChild.Enabled,
ou.OUID as OUIDAncestor
FROM #Hierarchy ou
INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'
)ou
WHERE
ou.CustomerID = @pCustomerID
AND ou.Enabled = 1
AND
(
@pOUIDs IS NULL
OR
@pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'
);
-- CTE Approach
WITH LocationHierarchy As
(
SELECT
ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled
FROM
#Hierarchy ouf
Where
OUID in (select Value from dbo.SplitString(@pOUIDs,',')) or @pOUIDs is null
UNION ALL
SELECT
ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled
FROM
#Hierarchy ouc INNER JOIN
LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID
Where
ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID
)
Select
DISTINCT ou.OUID
from
LocationHierarchy ou

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286590 More actions · Answer 1

ariel_mlk (9/4/2012)
Hi people,
I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in
SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'
where hierarchy field contains the "path" to the parent nodes for example
1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth
i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?
sry if the post is not clear enough,
I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those

God that is awful. At least they are former developers so they won't continue creating stuff like that for you to work with. That is an adjacency list that is denormalized. I would start by getting rid of anything other than the immediate parent. The rest of that is nothing but pain. Once you start joining on like '%' you are in for a horrible performance. Say goodbye to indexing.

I tossed together some very simplified ddl and and sample data to see if your problem is what I think it is.

;with BadAdj ( KeyVal, SomeVal, ParentKeyVal, AwfulPath)

as

(

select 1, 'Top Dog', null, '1' union all

select 2, 'Next Dog', 1, '1.2' union all

select 3, 'Grandchild', 2, '1.2.3' union all

select 4, 'Uber Grandchild', 3, '1.2.3.4' union all

select 5, 'Step Grandchild', 4, '1.2.3.4.5'

)

select * from BadAdj

If I understand what you are facing...you have something like AwfulPath instead of the simple ParentKeyVal? If so, do you have the ability to change the ddl into something more usable? Even if you have to leave the whole path adding a new column to identify the parent would go a long way to making this easier to work with.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Ariel Nessi Old Hand Points: 324 More actions · Answer 2

Celko and Sean,

Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like

ID ParentID Hierarchy

1 NULL 1

2 1 1.2

3 2 1.2.3

4 1 1.4

...

what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :

SELECT

stChild.OUID,

stChild.OUParentID,

stChild.CustomerID,

stChild.Name,

stChild.NameFull,

stChild.Hierarchy,

stChild.HierarchyDepth,

stChild.Enabled,

st.OUID as OUIDAncestor

FROM SomeTable st

INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'

produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs

Edit : just setting sql code in the right container ( was in a quote )

Sean Lange SSC Guru Points: 286590 More actions · Answer 3

ariel_mlk (9/5/2012)
Celko and Sean,
Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like
ID ParentID Hierarchy
1 NULL 1
2 1 1.2
3 2 1.2.3
4 1 1.4
...
what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :
SELECT
stChild.OUID,
stChild.OUParentID,
stChild.CustomerID,
stChild.Name,
stChild.NameFull,
stChild.Hierarchy,
stChild.HierarchyDepth,
stChild.Enabled,
st.OUID as OUIDAncestor
FROM SomeTable st
INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'
produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs
Edit : just setting sql code in the right container ( was in a quote )

Without ddl and sample data there is no way to know what is going on. If you need to know how to post that see the first link in my signature.

If you have the parentid as a column you really should join like I demonstrated. The way you are joining using like is going to cause this to be horribly slow.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :

It may require a little more IO and CPU but overall it will be WAY faster because it doesn't have to do a full scan just to join the data. The way you are doing with ID to ParentID will allow for the index on those columns to be used in your query. I am no guru when it comes to how the query engine works but I can say that your version will be far superior to the "Like join" approach. If you want it to go even faster you might consider moving to the nested sets model that CELKO talked about.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Ariel Nessi Old Hand Points: 324 More actions · Answer 5

Sean,

Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?

Sean Lange SSC Guru Points: 286590 More actions · Answer 6

ariel_mlk (9/5/2012)
Sean,
Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?

Thanks for the ddl and sample data. There is a critical piece in your cte that is NOT in the original query. You have the SplitString function in the cte. Can you post the code for that? I have a feeling I may know why your performance is taking a hit.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Ariel Nessi Old Hand Points: 324 More actions · Answer 7

ALTER FUNCTION [dbo].[SplitString]

(

@String VARCHAR(MAX),

@Delimeter Char(1)

)

RETURNS @RtnValue TABLE

(

Value VARCHAR(100)

)

AS

BEGIN

DECLARE @Cnt INT

SET @Cnt = 1

WHILE (CHARINDEX(@Delimeter, @String) > 0)

BEGIN

INSERT INTO @RtnValue (Value)

SELECT

Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))

SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))

SET @Cnt = @Cnt + 1

END

INSERT INTO @RtnValue (Value)

SELECT Data = ltrim(rtrim(@String))

RETURN

END

I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?

Sean Lange SSC Guru Points: 286590 More actions · Answer 8

ariel_mlk (9/5/2012)
ALTER FUNCTION [dbo].[SplitString] ( @String VARCHAR(MAX), @Delimeter Char(1) ) RETURNS @RtnValue TABLE ( Value VARCHAR(100) ) AS BEGIN DECLARE @Cnt INT SET @Cnt = 1 WHILE (CHARINDEX(@Delimeter, @String) > 0) BEGIN INSERT INTO @RtnValue (Value) SELECT Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1))) SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String)) SET @Cnt = @Cnt + 1 END INSERT INTO @RtnValue (Value) SELECT Data = ltrim(rtrim(@String)) RETURN
END
I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?

No offense but if you are aware that this is subpar for performance why not change it out? This process and every other process that uses your splitter will gain an advantage.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Ariel Nessi Old Hand Points: 324 More actions · Answer 9

Sean,

No offense taken at all ! right now the procedure that i copied here is the only one that uses it, i just did some copy paste of existing code ( yeah I was lazy ) as this is still not in production, but it will be changed, i'm just yet looking for where the rCTE IO comes from right now, or if I'm misinterpreting results.

I don't have the actual numbers right now but the rCTE does something like a hundred table scans on the #hierarchy table ! the like approach doesn't gets even near that and still is outperformed by the rCTE. Right now I'm more worried on understanding that behavior than using a tally for string split, which I *think* will cause minor improvement on this, or RBARs can really be THAT bad on, such small strings (200 is small isn't it) ?

thanks for the attention so far =)

Sean Lange SSC Guru Points: 286590 More actions · Answer 10

ariel_mlk (9/5/2012)
Sean,
No offense taken at all ! right now the procedure that i copied here is the only one that uses it, i just did some copy paste of existing code ( yeah I was lazy ) as this is still not in production, but it will be changed, i'm just yet looking for where the rCTE IO comes from right now, or if I'm misinterpreting results.
I don't have the actual numbers right now but the rCTE does something like a hundred table scans on the #hierarchy table ! the like approach doesn't gets even near that and still is outperformed by the rCTE. Right now I'm more worried on understanding that behavior than using a tally for string split, which I *think* will cause minor improvement on this, or RBARs can really be THAT bad on, such small strings (200 is small isn't it) ?
thanks for the attention so far =)

Get rid of that RBAR function and then start your comparison. You are trying to read stats on a query that you know you need to drastically change. Change the query first and then look at what it is doing. I realize that with the code you posted it isn't doing anything but still...

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Ariel Nessi Old Hand Points: 324 More actions · Answer 11

Sean,

Thanks for the input ! i'll get that done by tomorrow and come back to buzz you guys again =)

hopefully i'll improve the procedure more than i expect it to

Sean Lange SSC Guru Points: 286590 More actions · Answer 12

ariel_mlk (9/5/2012)
Sean,
Thanks for the input ! i'll get that done by tomorrow and come back to buzz you guys again =)
hopefully i'll improve the procedure more than i expect it to

FWIW those two queries do not produce the same results.

Consider the following when your string to split is not null.

--Original Query Approach

DECLARE @pCustomerID int,

@pUserID int,

@pOUIDs varchar(max) = '367',

@pLocationIds varchar(max)

SET@pCustomerID = 14

SET@pUserID = 1

--SET@pOUIDs = NULL

SET@pLocationIDs = NULL

SELECT DISTINCT ou.OUID

FROM

(SELECT

ouChild.OUID,

ouChild.CustomerID,

ouChild.Enabled,

ou.OUID as OUIDAncestor

FROM #Hierarchy ou

INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'

)ou

WHERE

ou.CustomerID = @pCustomerID

AND ou.Enabled = 1

AND

(

@pOUIDs IS NULL

OR

@pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'

);

-- CTE Approach

;WITH LocationHierarchy As

(

SELECT

ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled

FROM

#Hierarchy ouf

Where

OUID in (select val from dbo.Split(@pOUIDs,',')) or @pOUIDs is null

UNION ALL

SELECT

ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled

FROM

#Hierarchy ouc INNER JOIN

LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID

Where

ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID

)

Select

DISTINCT ou.OUID

from

LocationHierarchy ou

The results are different in the two queries so your comparison is even further off track. 😉

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Ariel Nessi Old Hand Points: 324 More actions · Answer 13

Sean,

that would produce different results indeed, but the old code would expect ',367,' to work properly, i can't test it right now, but i assume that with those leading and ending comma's they produce the same output, or i have screwed up on the copy paste.

Jeff Moden SSC Guru Points: 1004447 More actions · Answer 14

How many rows in this hierarchy?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)