September 4, 2012 at 1:35 pm
Hi people,
I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in
SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'
where hierarchy field contains the "path" to the parent nodes for example
1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth
i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?
sry if the post is not clear enough,
I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those
Edit : As requested, here is the DDL of the tables (Please note, i've took away field that are irrelevant for the problem logic)
--Table DDL
Create Table #Hierarchy
(
[OUID] [int] Primary Key,
[OUParentID] [int] NULL,
[CustomerID] [int] NOT NULL,
[Enabled] [bit] NULL,
[Hierarchy] [varchar](256) NULL,
)
--Sample Insert of 26 rows
SELECT '364',NULL,'14','1','364' UNION ALL
SELECT '365','364','14','1','364.365' UNION ALL
SELECT '366','365','14','1','364.365.366' UNION ALL
SELECT '367','366','14','1','364.365.366.367' UNION ALL
SELECT '368','367','14','1','364.365.366.367.368' UNION ALL
SELECT '369','367','14','1','364.365.366.367.369' UNION ALL
SELECT '370','367','14','1','364.365.366.367.370' UNION ALL
SELECT '371','364','14','1','364.371' UNION ALL
SELECT '372','371','14','1','364.371.372' UNION ALL
SELECT '373','372','14','1','364.371.372.373' UNION ALL
SELECT '374','373','14','1','364.371.372.373.374' UNION ALL
SELECT '375','374','14','1','364.371.372.373.374.375' UNION ALL
SELECT '376','372','14','1','364.371.372.376' UNION ALL
SELECT '377','376','14','1','364.371.372.376.377' UNION ALL
SELECT '378','377','14','1','364.371.372.376.377.378' UNION ALL
SELECT '379','372','14','1','364.371.372.379' UNION ALL
SELECT '380','379','14','1','364.371.372.379.380' UNION ALL
SELECT '381','380','14','1','364.371.372.379.380.381' UNION ALL
SELECT '382','372','14','1','364.371.372.382' UNION ALL
SELECT '383','382','14','1','364.371.372.382.383' UNION ALL
SELECT '384','383','14','1','364.371.372.382.383.384' UNION ALL
SELECT '385','372','14','1','364.371.372.385' UNION ALL
SELECT '386','385','14','1','364.371.372.385.386' UNION ALL
SELECT '387','386','14','1','364.371.372.385.386.387' UNION ALL
SELECT '388','372','14','1','364.371.372.388' UNION ALL
SELECT '389','388','14','1','364.371.372.388.389'
--Original Query Approach
DECLARE @pCustomerID int,
@pUserID int,
@pOUIDs varchar(max),
@pLocationIds varchar(max)
SET@pCustomerID = 14
SET@pUserID = 1
SET@pOUIDs = NULL
SET@pLocationIDs = NULL
SELECT DISTINCT ou.OUID
FROM
(SELECT
ouChild.OUID,
ouChild.CustomerID,
ouChild.Enabled,
ou.OUID as OUIDAncestor
FROM #Hierarchy ou
INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'
)ou
WHERE
ou.CustomerID = @pCustomerID
AND ou.Enabled = 1
AND
(
@pOUIDs IS NULL
OR
@pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'
);
-- CTE Approach
WITH LocationHierarchy As
(
SELECT
ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled
FROM
#Hierarchy ouf
Where
OUID in (select Value from dbo.SplitString(@pOUIDs,',')) or @pOUIDs is null
UNION ALL
SELECT
ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled
FROM
#Hierarchy ouc INNER JOIN
LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID
Where
ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID
)
Select
DISTINCT ou.OUID
from
LocationHierarchy ou
September 4, 2012 at 2:38 pm
ariel_mlk (9/4/2012)
Hi people,I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in
SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'
where hierarchy field contains the "path" to the parent nodes for example
1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth
i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?
sry if the post is not clear enough,
I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those
God that is awful. At least they are former developers so they won't continue creating stuff like that for you to work with. That is an adjacency list that is denormalized. I would start by getting rid of anything other than the immediate parent. The rest of that is nothing but pain. Once you start joining on like '%' you are in for a horrible performance. Say goodbye to indexing.
I tossed together some very simplified ddl and and sample data to see if your problem is what I think it is.
;with BadAdj ( KeyVal, SomeVal, ParentKeyVal, AwfulPath)
as
(
select 1, 'Top Dog', null, '1' union all
select 2, 'Next Dog', 1, '1.2' union all
select 3, 'Grandchild', 2, '1.2.3' union all
select 4, 'Uber Grandchild', 3, '1.2.3.4' union all
select 5, 'Step Grandchild', 4, '1.2.3.4.5'
)
select * from BadAdj
If I understand what you are facing...you have something like AwfulPath instead of the simple ParentKeyVal? If so, do you have the ability to change the ddl into something more usable? Even if you have to leave the whole path adding a new column to identify the parent would go a long way to making this easier to work with.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 11:16 am
Celko and Sean,
Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like
ID ParentID Hierarchy
1 NULL 1
2 1 1.2
3 2 1.2.3
4 1 1.4
...
what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :
SELECT
stChild.OUID,
stChild.OUParentID,
stChild.CustomerID,
stChild.Name,
stChild.NameFull,
stChild.Hierarchy,
stChild.HierarchyDepth,
stChild.Enabled,
st.OUID as OUIDAncestor
FROM SomeTable st
INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'
produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs
Edit : just setting sql code in the right container ( was in a quote )
September 5, 2012 at 12:20 pm
ariel_mlk (9/5/2012)
Celko and Sean,Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like
ID ParentID Hierarchy
1 NULL 1
2 1 1.2
3 2 1.2.3
4 1 1.4
...
what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :
SELECT
stChild.OUID,
stChild.OUParentID,
stChild.CustomerID,
stChild.Name,
stChild.NameFull,
stChild.Hierarchy,
stChild.HierarchyDepth,
stChild.Enabled,
st.OUID as OUIDAncestor
FROM SomeTable st
INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'
produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs
Edit : just setting sql code in the right container ( was in a quote )
Without ddl and sample data there is no way to know what is going on. If you need to know how to post that see the first link in my signature.
If you have the parentid as a column you really should join like I demonstrated. The way you are joining using like is going to cause this to be horribly slow.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 12:25 pm
what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :
It may require a little more IO and CPU but overall it will be WAY faster because it doesn't have to do a full scan just to join the data. The way you are doing with ID to ParentID will allow for the index on those columns to be used in your query. I am no guru when it comes to how the query engine works but I can say that your version will be far superior to the "Like join" approach. If you want it to go even faster you might consider moving to the nested sets model that CELKO talked about.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 1:34 pm
Sean,
Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?
September 5, 2012 at 1:39 pm
ariel_mlk (9/5/2012)
Sean,Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?
Thanks for the ddl and sample data. There is a critical piece in your cte that is NOT in the original query. You have the SplitString function in the cte. Can you post the code for that? I have a feeling I may know why your performance is taking a hit.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 1:45 pm
ALTER FUNCTION [dbo].[SplitString]
(
@String VARCHAR(MAX),
@Delimeter Char(1)
)
RETURNS @RtnValue TABLE
(
Value VARCHAR(100)
)
AS
BEGIN
DECLARE @Cnt INT
SET @Cnt = 1
WHILE (CHARINDEX(@Delimeter, @String) > 0)
BEGIN
INSERT INTO @RtnValue (Value)
SELECT
Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))
SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))
SET @Cnt = @Cnt + 1
END
INSERT INTO @RtnValue (Value)
SELECT Data = ltrim(rtrim(@String))
RETURN
END
I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?
September 5, 2012 at 2:39 pm
ariel_mlk (9/5/2012)
ALTER FUNCTION [dbo].[SplitString]
(
@String VARCHAR(MAX),
@Delimeter Char(1)
)
RETURNS @RtnValue TABLE
(
Value VARCHAR(100)
)
AS
BEGIN
DECLARE @Cnt INT
SET @Cnt = 1
WHILE (CHARINDEX(@Delimeter, @String) > 0)
BEGIN
INSERT INTO @RtnValue (Value)
SELECT
Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))
SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))
SET @Cnt = @Cnt + 1
END
INSERT INTO @RtnValue (Value)
SELECT Data = ltrim(rtrim(@String))
RETURN
END
I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?
No offense but if you are aware that this is subpar for performance why not change it out? This process and every other process that uses your splitter will gain an advantage.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 3:04 pm
Sean,
No offense taken at all ! right now the procedure that i copied here is the only one that uses it, i just did some copy paste of existing code ( yeah I was lazy ) as this is still not in production, but it will be changed, i'm just yet looking for where the rCTE IO comes from right now, or if I'm misinterpreting results.
I don't have the actual numbers right now but the rCTE does something like a hundred table scans on the #hierarchy table ! the like approach doesn't gets even near that and still is outperformed by the rCTE. Right now I'm more worried on understanding that behavior than using a tally for string split, which I *think* will cause minor improvement on this, or RBARs can really be THAT bad on, such small strings (200 is small isn't it) ?
thanks for the attention so far =)
September 5, 2012 at 3:18 pm
ariel_mlk (9/5/2012)
Sean,No offense taken at all ! right now the procedure that i copied here is the only one that uses it, i just did some copy paste of existing code ( yeah I was lazy ) as this is still not in production, but it will be changed, i'm just yet looking for where the rCTE IO comes from right now, or if I'm misinterpreting results.
I don't have the actual numbers right now but the rCTE does something like a hundred table scans on the #hierarchy table ! the like approach doesn't gets even near that and still is outperformed by the rCTE. Right now I'm more worried on understanding that behavior than using a tally for string split, which I *think* will cause minor improvement on this, or RBARs can really be THAT bad on, such small strings (200 is small isn't it) ?
thanks for the attention so far =)
Get rid of that RBAR function and then start your comparison. You are trying to read stats on a query that you know you need to drastically change. Change the query first and then look at what it is doing. I realize that with the code you posted it isn't doing anything but still...
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 3:22 pm
Sean,
Thanks for the input ! i'll get that done by tomorrow and come back to buzz you guys again =)
hopefully i'll improve the procedure more than i expect it to
September 5, 2012 at 3:33 pm
ariel_mlk (9/5/2012)
Sean,Thanks for the input ! i'll get that done by tomorrow and come back to buzz you guys again =)
hopefully i'll improve the procedure more than i expect it to
FWIW those two queries do not produce the same results.
Consider the following when your string to split is not null.
--Original Query Approach
DECLARE @pCustomerID int,
@pUserID int,
@pOUIDs varchar(max) = '367',
@pLocationIds varchar(max)
SET@pCustomerID = 14
SET@pUserID = 1
--SET@pOUIDs = NULL
SET@pLocationIDs = NULL
SELECT DISTINCT ou.OUID
FROM
(SELECT
ouChild.OUID,
ouChild.CustomerID,
ouChild.Enabled,
ou.OUID as OUIDAncestor
FROM #Hierarchy ou
INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'
)ou
WHERE
ou.CustomerID = @pCustomerID
AND ou.Enabled = 1
AND
(
@pOUIDs IS NULL
OR
@pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'
);
-- CTE Approach
;WITH LocationHierarchy As
(
SELECT
ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled
FROM
#Hierarchy ouf
Where
OUID in (select val from dbo.Split(@pOUIDs,',')) or @pOUIDs is null
UNION ALL
SELECT
ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled
FROM
#Hierarchy ouc INNER JOIN
LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID
Where
ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID
)
Select
DISTINCT ou.OUID
from
LocationHierarchy ou
The results are different in the two queries so your comparison is even further off track. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 5, 2012 at 3:49 pm
Sean,
that would produce different results indeed, but the old code would expect ',367,' to work properly, i can't test it right now, but i assume that with those leading and ending comma's they produce the same output, or i have screwed up on the copy paste.
September 7, 2012 at 2:56 pm
How many rows in this hierarchy?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 32 total)
You must be logged in to reply to this topic. Login to reply