Help on Performance issue on Recursive CTE

mister.magoo

SSC-Forever

Points: 47174
More actions
October 11, 2012 at 12:43 pm

#1548319

Have you got a select that produces more rows? I hadn't spotted that one was only 25 rows, which is far too small a sample.
We need one that produces about 25,000 rows judging by previous runs...
And measure the performance for the select into variables (or bitbuckets) and for the insert separately.
MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Hall of Fame

Points: 3734

October 11, 2012 at 12:51 pm

MM

Here it is

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

(25491 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 188 ms, elapsed time = 427 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

mister.magoo (10/11/2012)
Have you got a select that produces more rows? I hadn't spotted that one was only 25 rows, which is far too small a sample.
We need one that produces about 25,000 rows judging by previous runs...
And measure the performance for the select into variables (or bitbuckets) and for the insert separately.

SSC-Forever

Points: 47174

October 11, 2012 at 1:41 pm

Really baffling why the inserts are so slow during the run...

Can you try switching the index on #results to a clustered one?

CREATE CLUSTERED INDEX #IXS ON #RESULTS(LEVEL);

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Forum Etiquette: How to post Reporting Services problems

[/url]Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]How to Post Performance Problems - by Gail Shaw
[/url]

Hall of Fame

Points: 3734

October 11, 2012 at 1:53 pm

MM,

I switched to clustered Index it runs about the same amount of time. Could you think of something else I can try to see if it makes any different? If you want to IM me quicker, I am on yahoo messager: haiao2000@yahoo.com

Thanks for helping out!

mister.magoo (10/11/2012)
Really baffling why the inserts are so slow during the run...
Can you try switching the index on #results to a clustered one?
CREATE CLUSTERED INDEX #IXS ON #RESULTS(LEVEL);

Hall of Fame

Points: 3734

October 11, 2012 at 3:43 pm

Maybe this will give another angle to look at. There are 2 ways I can benefit from this query if I can get it runs fast enough.

1) get all the tree nodes (which we have been trying, so far it is not fast enough)

2) just end as soon as at least one of the nodes has a modification status of 1 (IsDirty=1)? Which is the best way to handle this?

CREATE TABLE #Result(ID INT NOT NULL, ParentID UNIQUEIDENTIFIER NOT NULL, ChildID UNIQUEIDENTIFIER, IsDirty Bit, LEVEL INT IDENTITY(1,1) NOT NULL);

CREATE INDEX #IXS1 ON #Result(LEVEL) INCLUDE(ChildID);

CREATE INDEX #IXS2 ON #Result(IsDirty);

SET IDENTITY_INSERT #Result ON;

--pop the top level node to the temp table

INSERT #Result(ID,ParentID,ChildID,IsDirty,LEVEL)

SELECT ID,ParentID,ChildID,IsDirty,1 FROM MyTable WHERE ParentID = @ObjectID;

--recursive until no more children

WHILE @@ROWCOUNT > 0 BEGIN

INSERT #Result(ID,ParentID,ChildID,IsDirty, LEVEL)

SELECT S.ID, S.ParentID,S.ChildID,S.IsDirty, SCOPE_IDENTITY()+1

FROM #Result AS R

INNER JOIN MyTable S ON S.ParentID = R.ChildID

WHERE LEVEL = SCOPE_IDENTITY();

--for item #2 above, I am thinking to add this, but I am sure it is not the best idea. anythought?

--the problem with this is when non of the nodes with IsDirty=1 then it takes at minimum, the same amount of time to traverse all nodes

IF EXISTS(SELECT ChildID FROM #Result WHERE LEVEL=SCOPE_IDENTITY() AND IsDirty=1)

BREAK;

END

SET IDENTITY_INSERT #Result OFF;

SELECT * FROM #Result

SSC-Forever

Points: 47174

October 11, 2012 at 6:07 pm

haiao2000 (10/11/2012)
Maybe this will give another angle to look at. There are 2 ways I can benefit from this query if I can get it runs fast enough.
1) get all the tree nodes (which we have been trying, so far it is not fast enough)
2) just end as soon as at least one of the nodes has a modification status of 1 (IsDirty=1)? Which is the best way to handle this?

CREATE TABLE #Result(ID INT NOT NULL, ParentID UNIQUEIDENTIFIER NOT NULL, ChildID UNIQUEIDENTIFIER, IsDirty Bit, LEVEL INT IDENTITY(1,1) NOT NULL);
CREATE INDEX #IXS1 ON #Result(LEVEL) INCLUDE(ChildID);
CREATE INDEX #IXS2 ON #Result(IsDirty);
SET IDENTITY_INSERT #Result ON;
--pop the top level node to the temp table
INSERT #Result(ID,ParentID,ChildID,IsDirty,LEVEL)
SELECT ID,ParentID,ChildID,IsDirty,1 FROM MyTable WHERE ParentID = @ObjectID;
--recursive until no more children
WHILE @@ROWCOUNT > 0 BEGIN
INSERT #Result(ID,ParentID,ChildID,IsDirty, LEVEL)
SELECT S.ID, S.ParentID,S.ChildID,S.IsDirty, SCOPE_IDENTITY()+1
FROM #Result AS R
INNER JOIN MyTable S ON S.ParentID = R.ChildID
WHERE LEVEL = SCOPE_IDENTITY();
--for item #2 above, I am thinking to add this, but I am sure it is not the best idea. anythought?
--the problem with this is when non of the nodes with IsDirty=1 then it takes at minimum, the same amount of time to traverse all nodes
IF EXISTS(SELECT ChildID FROM #Result WHERE LEVEL=SCOPE_IDENTITY() AND IsDirty=1)
BREAK;
END
SET IDENTITY_INSERT #Result OFF;
SELECT * FROM #Result

Maybe...if you do try that, do it like this, in the WHILE...and add an index on ISDIRTY in #RESULTS

WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY=1)

INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)

SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY

FROM #RESULTS AS R

JOIN #SOURCE AS S

ON S.PARENTID = R.CHILDID

WHERE LEVEL = SCOPE_IDENTITY();

What is bugging me is why you are seeing such long delays in inserting into #results - that is really odd.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Forum Etiquette: How to post Reporting Services problems

[/url]Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]How to Post Performance Problems - by Gail Shaw
[/url]

Hall of Fame

Points: 3734

October 11, 2012 at 6:35 pm

mister.magoo (10/11/2012)
haiao2000 (10/11/2012)
Maybe this will give another angle to look at. There are 2 ways I can benefit from this query if I can get it runs fast enough.
1) get all the tree nodes (which we have been trying, so far it is not fast enough)
2) just end as soon as at least one of the nodes has a modification status of 1 (IsDirty=1)? Which is the best way to handle this?

CREATE TABLE #Result(ID INT NOT NULL, ParentID UNIQUEIDENTIFIER NOT NULL, ChildID UNIQUEIDENTIFIER, IsDirty Bit, LEVEL INT IDENTITY(1,1) NOT NULL);
CREATE INDEX #IXS1 ON #Result(LEVEL) INCLUDE(ChildID);
CREATE INDEX #IXS2 ON #Result(IsDirty);
SET IDENTITY_INSERT #Result ON;
--pop the top level node to the temp table
INSERT #Result(ID,ParentID,ChildID,IsDirty,LEVEL)
SELECT ID,ParentID,ChildID,IsDirty,1 FROM MyTable WHERE ParentID = @ObjectID;
--recursive until no more children
WHILE @@ROWCOUNT > 0 BEGIN
INSERT #Result(ID,ParentID,ChildID,IsDirty, LEVEL)
SELECT S.ID, S.ParentID,S.ChildID,S.IsDirty, SCOPE_IDENTITY()+1
FROM #Result AS R
INNER JOIN MyTable S ON S.ParentID = R.ChildID
WHERE LEVEL = SCOPE_IDENTITY();
--for item #2 above, I am thinking to add this, but I am sure it is not the best idea. anythought?
--the problem with this is when non of the nodes with IsDirty=1 then it takes at minimum, the same amount of time to traverse all nodes
IF EXISTS(SELECT ChildID FROM #Result WHERE LEVEL=SCOPE_IDENTITY() AND IsDirty=1)
BREAK;
END
SET IDENTITY_INSERT #Result OFF;
SELECT * FROM #Result
Maybe...if you do try that, do it like this, in the WHILE...and add an index on ISDIRTY in #RESULTS
WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY=1)
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY();
What is bugging me is why you are seeing such long delays in inserting into #results - that is really odd.

Great that works for (2). I am not sure why there is such delay either , but I thought it is reasonable amount of time since it inserted roughly 35k records also it needs to insert into index tables. the biggest delay is on the index seek for source table?

Any other mehod beside using temp table? I understand that we want to take advance of indexing on temp table

Hall of Fame

Points: 3734

October 11, 2012 at 6:48 pm

I upload new execution plan . if someone can help identify one more break thru like MM did would be a home run for me.

Thanks in advance.

SSC Guru

Points: 1004434

October 11, 2012 at 10:50 pm

haiao2000 (10/11/2012)
I upload new execution plan . if someone can help identify one more break thru like MM did would be a home run for me.
Thanks in advance.

Is this the same data as on your other post???

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Hall of Fame

Points: 3734

October 12, 2012 at 7:42 am

Jeff Moden (10/11/2012)
haiao2000 (10/11/2012)
I upload new execution plan . if someone can help identify one more break thru like MM did would be a home run for me.
Thanks in advance.
Is this the same data as on your other post???

Jeff,

Yes it is the same, it just sometime I ran agains small dataset vs. a larger one. the last execution plan was ran against the large dataset. This one has 4 levels. With 34,689 records return in around 49 seconds. The table has 10,831,321 records

SSC-Forever

Points: 47174

October 12, 2012 at 11:36 am

Hi, I am going to post a test script so we can be sure we are comparing apples with apples...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Forum Etiquette: How to post Reporting Services problems

[/url]Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]How to Post Performance Problems - by Gail Shaw
[/url]

SSC-Forever

Points: 47174

October 12, 2012 at 6:32 pm

Here is my sample data creation: 10,000,000 + rows in tempdb - you may want to put it somewhere else...it's about 1GB of data.

Oh, and it relies on a TALLY table called TALLY with a column called N.

See here for one if you need it : http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

-- Include this if you want to rebuild the source table later - takes about 2 to 3 minutes

-- DROP TABLE #SOURCE;

IF OBJECT_ID('TEMPDB..#SOURCE') IS NULL

BEGIN

CREATE TABLE #SOURCE(ID INT IDENTITY(1,1) NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, ISDIRTY CHAR(1) DEFAULT('N'));

CREATE UNIQUE CLUSTERED INDEX #IX_CLUST ON #SOURCE (PARENTID,CHILDID);

-- CREATE LEVEL 1 <- 100 ROWS

INSERT #SOURCE(PARENTID,CHILDID)

SELECT TOP 100 NEWID(),NEWID()

FROM TALLY

-- CREATE LEVEL 2 <- ~ 10,000 ROWS

INSERT #SOURCE(PARENTID,CHILDID)

SELECT CHILDID,NEWID()

FROM #SOURCE,TALLY

WHERE N<100

-- CREATE LEVEL 3 <- ~ 100,000 ROWS

INSERT #SOURCE(PARENTID,CHILDID)

SELECT CHILDID,NEWID()

FROM #SOURCE,TALLY

WHERE N<100

AND ID >100

-- CREATE LEVEL 4 <- ~ 10,000,000 ROWS

INSERT #SOURCE(PARENTID,CHILDID,ISDIRTY)

SELECT CHILDID,NEWID(),'Y'

FROM #SOURCE,TALLY

WHERE N<10

AND ID >10100

-- Add an index just to make the selection of a starting point quicker

CREATE INDEX #IX_ID ON #SOURCE (ID);

END

And the results table :

-- Prepare a temp table to hold the result set

IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL

DROP TABLE #RESULTS;

CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL,ISDIRTY CHAR(1)) ;

-- Add an index to aid with recursion

CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);

-- Add an index to help with the decision of when to stop - when ISDIRTY="Y"

CREATE INDEX #IXDIRTY ON #RESULTS(ISDIRTY);

-- We need this to be able to insert the LEVEL value

SET IDENTITY_INSERT #RESULTS ON;

Now add the first level to the results table:

-- Get the starting point, arbritrarily chosen as row with ID=1 - they all have the same number of children, so it doesn't matter.

DECLARE @START UNIQUEIDENTIFIER;

SELECT TOP 1 @START = PARENTID FROM #SOURCE ORDER BY ID OPTION(FAST 1);

-- INSERT the first row in our hierarchy - level 1

INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)

SELECT ID,PARENTID,CHILDID,1,ISDIRTY

FROM #SOURCE

WHERE PARENTID=@START;

And process the rest...

-- Do a dirty old loop (only iterates once per level in the hierarchy) while we haven't hit out stop condition and there were rows found

WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY='Y')

-- pick up the last inserted set and go find their children to insert to the table with an incremented LEVEL

INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)

SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY

FROM #RESULTS AS R

JOIN #SOURCE AS S

ON S.PARENTID = R.CHILDID

WHERE LEVEL = SCOPE_IDENTITY()

OPTION( RECOMPILE );

-- the WHILE loop only repeats the INSERT, nothing else

SET IDENTITY_INSERT #RESULTS OFF;

With this data, you will see the highest wait times at level 3 and 4, but if you stop the query at level 3 by adding

AND SCOPE_IDENTITY()<3

to the WHILE condition, then run this - the level 4 SELECT:

SELECT S.ID,S.PARENTID,S.CHILDID,S.ISDIRTY

FROM #RESULTS AS R

JOIN #SOURCE AS S

ON S.PARENTID = R.CHILDID

WHERE LEVEL = 3

You will see that it takes a long time (15-20 secs) just to select the data.

I get this execution plan: http://www.sqlservercentral.com/Forums/Attachment12601.aspx

If something can be done to speed up this select, then we've cracked it - but at the moment I cannot see how.

The fact that that last SELECT takes 15-20 secs with a clean buffer, but zero seconds once the data is cached kind of points towards there just being a physical limit on the speed of reading that many rows from this sort of data? maybe? Or is it just a matter of the join being slow? I can't see the wood for the trees now, so maybe one of the more intelligent beings can help...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Forum Etiquette: How to post Reporting Services problems

[/url]Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]How to Post Performance Problems - by Gail Shaw
[/url]

Hall of Fame

Points: 3734

October 12, 2012 at 10:35 pm

mister.magoo (10/12/2012)
Here is my sample data creation: 10,000,000 + rows in tempdb - you may want to put it somewhere else...it's about 1GB of data.
Oh, and it relies on a TALLY table called TALLY with a column called N.
See here for one if you need it : http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/
-- Include this if you want to rebuild the source table later - takes about 2 to 3 minutes
-- DROP TABLE #SOURCE;
IF OBJECT_ID('TEMPDB..#SOURCE') IS NULL
BEGIN
CREATE TABLE #SOURCE(ID INT IDENTITY(1,1) NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, ISDIRTY CHAR(1) DEFAULT('N'));
CREATE UNIQUE CLUSTERED INDEX #IX_CLUST ON #SOURCE (PARENTID,CHILDID);
-- CREATE LEVEL 1 <- 100 ROWS
INSERT #SOURCE(PARENTID,CHILDID)
SELECT TOP 100 NEWID(),NEWID()
FROM TALLY
-- CREATE LEVEL 2 <- ~ 10,000 ROWS
INSERT #SOURCE(PARENTID,CHILDID)
SELECT CHILDID,NEWID()
FROM #SOURCE,TALLY
WHERE N<100
-- CREATE LEVEL 3 <- ~ 100,000 ROWS
INSERT #SOURCE(PARENTID,CHILDID)
SELECT CHILDID,NEWID()
FROM #SOURCE,TALLY
WHERE N<100
AND ID >100
-- CREATE LEVEL 4 <- ~ 10,000,000 ROWS
INSERT #SOURCE(PARENTID,CHILDID,ISDIRTY)
SELECT CHILDID,NEWID(),'Y'
FROM #SOURCE,TALLY
WHERE N<10
AND ID >10100
-- Add an index just to make the selection of a starting point quicker
CREATE INDEX #IX_ID ON #SOURCE (ID);
END
And the results table :
-- Prepare a temp table to hold the result set
IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS;
CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL,ISDIRTY CHAR(1)) ;
-- Add an index to aid with recursion
CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);
-- Add an index to help with the decision of when to stop - when ISDIRTY="Y"
CREATE INDEX #IXDIRTY ON #RESULTS(ISDIRTY);
-- We need this to be able to insert the LEVEL value
SET IDENTITY_INSERT #RESULTS ON;
Now add the first level to the results table:
-- Get the starting point, arbritrarily chosen as row with ID=1 - they all have the same number of children, so it doesn't matter.
DECLARE @START UNIQUEIDENTIFIER;
SELECT TOP 1 @START = PARENTID FROM #SOURCE ORDER BY ID OPTION(FAST 1);
-- INSERT the first row in our hierarchy - level 1
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)
SELECT ID,PARENTID,CHILDID,1,ISDIRTY
FROM #SOURCE
WHERE PARENTID=@START;
And process the rest...
-- Do a dirty old loop (only iterates once per level in the hierarchy) while we haven't hit out stop condition and there were rows found
WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY='Y')
-- pick up the last inserted set and go find their children to insert to the table with an incremented LEVEL
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY()
OPTION( RECOMPILE );
-- the WHILE loop only repeats the INSERT, nothing else
SET IDENTITY_INSERT #RESULTS OFF;
With this data, you will see the highest wait times at level 3 and 4, but if you stop the query at level 3 by adding
AND SCOPE_IDENTITY()<3
to the WHILE condition, then run this - the level 4 SELECT:
SELECT S.ID,S.PARENTID,S.CHILDID,S.ISDIRTY
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = 3
You will see that it takes a long time (15-20 secs) just to select the data.
I get this execution plan: http://www.sqlservercentral.com/Forums/Attachment12601.aspx
If something can be done to speed up this select, then we've cracked it - but at the moment I cannot see how.
The fact that that last SELECT takes 15-20 secs with a clean buffer, but zero seconds once the data is cached kind of points towards there just being a physical limit on the speed of reading that many rows from this sort of data? maybe? Or is it just a matter of the join being slow? I can't see the wood for the trees now, so maybe one of the more intelligent beings can help...

Hi MM.Thank you for continue looking into this, I was experiencing another method,

-I created a separated table to hold materialized path with 2 fields (ID, Path) reason I used separated table because a child can have multiple parents

-added 2 indexes to both fields ID and Path

-then use your identity hack method to populate the materialized path table by passing the top most ParentId to it.It runs remarkably fast.

-now with materialized path i can query the source table using ID (indexed) which yield good performance definitely beat the other method.

SELECT @PATH = Path from MaterializedPath WHERE ID=@ID)

SELECT ParentID, ChildID

FROM #Source Where ID in

(Select ID from MaterializedPath where Path like @PATH+'%')

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply