October 4, 2012 at 10:21 am
Hello All,
I am seeking for an alternate method to Common table Expression. In the nutshell, I like to drill down a tree structure to get all IDs of data nodes who data have been modified in the last x days. I used CTE for this, it works however for the size of database we have it takes few minutes to completed, which is not acceptable for us, I want to see if this can be done in seconds instead. Here are some facts about our database, this is just simplified DDL our actual tables use UNIQUEIDENTIFIER as datatype instead of INT and each table have alot more fields than in this example. The MyUpdateTracking contains around 29million records, MyTree, MyData ~ 11millon records each.
Thank you in advance for anyone dare to help me on this.
Regards
CREATE TABLE MyOwner(
OwnerID int,
Name varchar(50)
)
GO
insert into MyOwner(OwnerID,Name)
select 1,'Owner1' union all
select 2,'Owner2' union all
select 3,'Owner3' union all
select 4,'Owner4'
CREATE TABLE MyData(
DataID int ,
DataName varchar(50),
OwnerID int
)
GO
insert into MyData(DataID,DataName,OwnerID)
select 1,'Data1',1 union all
select 2,'Data2',2 union all
select 3,'Data3',3 union all
select 4,'Data4',4 union all
select 5,'Data5',4 union all
select 6,'Data6',4 union all
select 7,'Data7',4
CREATE TABLE MyTree(
ID int identity(1,1) ,
ParentDataID int,
ChildDataID int
)
GO
insert into MyTree(ParentDataID,ChildDataID)
select 1,2 union all
select 2,3 union all
select 2,5 union all
select 2,6 union all
select 3,4 union all
select 3,7
CREATE TABLE MyUpdateTracking(
DataID int,
UpdateID int
)
GO
insert into MyUpdateTracking(DataID,UpdateID)
select 1,1 union all
select 2,2 union all
select 3,3 union all
select 4,3 union all
select 5,3 union all
select 6,3 union all
select 7,3
CREATE TABLE MyUpdateEvent(
UpdateID int,
TimeStart datetime
)
GO
insert into MyUpdateEvent(UpdateID,UpdatedDate)
select 1,GETDATE() union all
select 2,GETDATE() union all
select 3,GETDATE()
DECLARE @NodeID int
set @NodeID = 1;
WITH TempCTE AS (
SELECT ParentDataID, ChildDataID, 0 as Depth
FROM MyTree
WHERE ParentDataID=@NodeID
UNION ALL
SELECT R.ParentDataID, R.ChildDataID, Depth + 1
FROM MyTree R
INNER JOIN TempCTE CTE on CTE.ChildDataID = R.ParentDataID
)
select CTE.ChildDataID,CTE.Depth, D.DataName,e.UpdatedDate
from TempCTE CTE
inner join MyData D on D.DataID=CTE.ChildDataID
inner join MyOwner O on O.OwnerID=D.OwnerID
inner join MyUpdateTracking T on T.DataID=CTE.ChildDataID
inner join MyUpdateEvent E on e.UpdateID=T.UpdateID
where DATEDIFF(DAY, E.UpdatedDate , GETDATE()) <= 3;
October 4, 2012 at 10:24 am
Trees in sql require a little different modeling and query strategies.
a quick google search on "Trees in SQL" reveals tons of articles.
Joe Celko's book on the exact subject discusses this.
here's a good read also.
October 4, 2012 at 10:37 am
What are the indexes on thoee tables, specifically mytree? Try running the results of the rCTE into a #temp table and joining the other tables to it.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
October 4, 2012 at 11:02 am
ChrisM@home (10/4/2012)
What are the indexes on thoee tables, specifically mytree? Try running the results of the rCTE into a #temp table and joining the other tables to it.
Chris,
There are both clusteredd & non clustered indexes on both ParentDataID, ChildDataID, UpdatedDate.
I will try your suggestion
Updated: I modified to use temp table but the performace is not improved significantly
Thanks!
October 4, 2012 at 11:03 am
Ray M (10/4/2012)
Trees in sql require a little different modeling and query strategies.a quick google search on "Trees in SQL" reveals tons of articles.
Joe Celko's book on the exact subject discusses this.
here's a good read also.
Thanks, I will look at the article.
Updated: I read the article, however having triggers behind these tables is probably not a good idea for my case as that will add extra overhead to our app as it is already heavy. we have schedule job that runs periodically to import data to the database
October 4, 2012 at 12:33 pm
Without a copy of the actual execution plan, it's impossible to tell for sure but I do see one potential problem that guarantees and index or table scan because of the way it's written. Here's that code snippet...
where DATEDIFF(DAY, E.UpdatedDate , GETDATE()) <= 3;
Like I said, that code guarantees an index scan of at least the table the "E" alias is pointing at. The code should be rewritten as follows to allow for an index seek (if you have a proper index for it) which should beat the tar out of the scan.
WHERE E.UpdatedDate >= DATEADD(dd,-3,GETDATE())
--Jeff Moden
Change is inevitable... Change for the better is not.
October 4, 2012 at 2:00 pm
Jeff Moden (10/4/2012)
Without a copy of the actual execution plan, it's impossible to tell for sure but I do see one potential problem that guarantees and index or table scan because of the way it's written. Here's that code snippet...
where DATEDIFF(DAY, E.UpdatedDate , GETDATE()) <= 3;
Like I said, that code guarantees an index scan of at least the table the "E" alias is pointing at. The code should be rewritten as follows to allow for an index seek (if you have a proper index for it) which should beat the tar out of the scan.
WHERE E.UpdatedDate >= DATEADD(dd,-3,GETDATE())
Still no improvement at all.
October 4, 2012 at 2:11 pm
haiao2000 (10/4/2012)
Jeff Moden (10/4/2012)
Without a copy of the actual execution plan, it's impossible to tell for sure but I do see one potential problem that guarantees and index or table scan because of the way it's written. Here's that code snippet...
where DATEDIFF(DAY, E.UpdatedDate , GETDATE()) <= 3;
Like I said, that code guarantees an index scan of at least the table the "E" alias is pointing at. The code should be rewritten as follows to allow for an index seek (if you have a proper index for it) which should beat the tar out of the scan.
WHERE E.UpdatedDate >= DATEADD(dd,-3,GETDATE())
Maybe you could add the execution plan!
Still no improvement at all.
October 4, 2012 at 2:19 pm
Ignacio A. Salom Rangel (10/4/2012)
haiao2000 (10/4/2012)
Jeff Moden (10/4/2012)
Without a copy of the actual execution plan, it's impossible to tell for sure but I do see one potential problem that guarantees and index or table scan because of the way it's written. Here's that code snippet...
where DATEDIFF(DAY, E.UpdatedDate , GETDATE()) <= 3;
Like I said, that code guarantees an index scan of at least the table the "E" alias is pointing at. The code should be rewritten as follows to allow for an index seek (if you have a proper index for it) which should beat the tar out of the scan.
WHERE E.UpdatedDate >= DATEADD(dd,-3,GETDATE())
Maybe you could add the execution plan!
Still no improvement at all.
See attachment. notice it is ran against our actual db, so it will not look the same as posted DDL but the concept is the same.
Updated. no answer...attachment deleted. 🙂
Thanks!
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply