January 27, 2020 at 2:04 pm
Hi all
I've got a table that contains 1.8million (yes, million) records in a parent/child relationship.
Now, children can also be parents and have their own children (if that makes sense).
What I need to do is take a top-level parent (call it level 1) and then list all the children under it (with the appropriate level number).
Either that or start with the lowest child value and work up the "tree" to get the top level parent.
In either case, I need to list all the intermediate levels and number them accordingly.
I've done some googling and come up with the following code:-
;WITH cte AS
(
SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= 1
FROM
Lookups.dbo.tbl_SCT2_Relationships AS a
WHERE
a.TypeID= 116680003
--AND a.SourceID IN
--(83821000000107, 100000000, 100005005)
UNION ALL
SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= c.Level + 1
FROM
Lookups.dbo.tbl_SCT2_RelationshipsAS a
JOIN cteAS c
ON a.SourceID = c.Parent
)
SELECTDISTINCT
c1.Parent
,c1.Child
,c1.IsActive
,c1.Level
,c2.Level
FROM
cteAS c1
LEFT JOIN cte AS c2
ON c1.Child = c2.Parent
ORDER BY
c1.Parent
,c1.Child
OPTION (MAXRECURSION 0)
If I run it with just a few records (see the numbers commented out in the first part of the CTE) it runs in around 40 seconds. I know, that's fairly slow but gives me exactly what I want.
If I run it against the full table (around 1.8million records), it runs forever. It's currently been running for 4 hours with no end in sight.
I was looking at putting separate indexes on the parent and child columns in the original table but I'm not sure how much extra that will give (especially as the execution plan isn't recommending any).
Anyone any ideas on how I can speed this up?
Thanks in advance
Richard
January 27, 2020 at 2:45 pm
what indexes do you have on Lookups.dbo.tbl_SCT2_Relationships? what does the execution plan look like?
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 27, 2020 at 2:57 pm
Hi Mike
Here's the table:-
USE [Lookups]
GO
/****** Object: Table [dbo].[tbl_SCT2_Relationships] Script Date: 27/01/2020 14:53:52 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_SCT2_Relationships](
[pkID] [BIGINT] NOT NULL,
[EffectiveDate] [DATE] NOT NULL,
[IsActive] [BIT] NOT NULL,
[ModuleID] [BIGINT] NOT NULL,
[SourceID] [BIGINT] NOT NULL,
[DestinationID] [BIGINT] NOT NULL,
[RelationshipGroup] [INT] NOT NULL,
[TypeID] [BIGINT] NOT NULL,
[CharacteristicTypeID] [BIGINT] NOT NULL,
[ModifierID] [BIGINT] NOT NULL,
[SYSDateLoaded] [DATETIME] NOT NULL,
[SYSDateLastUpdated] [DATETIME] NOT NULL,
[SYSIsDeleted] [BIT] NOT NULL,
CONSTRAINT [PK__tbl_sct2__3213E83FA2182033] PRIMARY KEY CLUSTERED
(
[pkID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD DEFAULT (GETDATE()) FOR [SYSDateLoaded]
GO
ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD DEFAULT (GETDATE()) FOR [SYSDateLastUpdated]
GO
ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD DEFAULT ((0)) FOR [SYSIsDeleted]
GO
And here the only index:-
USE Lookups;
GO
/****** Object: Index [idx_tbl_SCT2_Relationships_TypeID_SYSIsDeleted_DestinationID] Script Date: 27/01/2020 14:54:10 ******/
CREATE NONCLUSTERED INDEX idx_tbl_SCT2_Relationships_TypeID_SYSIsDeleted_DestinationID
ON dbo.tbl_SCT2_Relationships(TypeID ASC, SYSIsDeleted ASC, DestinationID ASC, SourceID ASC)
INCLUDE(IsActive)
WITH(PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
,FILLFACTOR = 90
)
ON [PRIMARY];
GO
It won't let me add the execution plan (as an exeuction plan file) as it's a banned file-type.
I've changed the extension to ".txt" so you'll need to change it back to ".sqlplan" to open it.
Cheers
Richard
January 27, 2020 at 3:20 pm
I think performance might improve if you add an index on SourceId including other columns in the query.
Try this and see if there is an improvement:
CREATE INDEX IX_tbl_SCT2_Relationships_1
ON dbo.tbl_SCT2_Relationships(SourceID)
INCLUDE (IsActive, DestinationID, TypeID);
January 27, 2020 at 3:28 pm
What you've created as part of the CTE JOIN CTE is joining 1.8million records to 1.8million records where the engine has to build both 1.8 million record sets twice.
The expression is built at run time, so "FROM CTE" builds the 1st version of the result set "JOIN CTE" builds the next version of the same result set, then the "ON x = y" has to process the joins which is all done in memory.
If you where to build the expression and dump it to a #table and joined that instead, you're only expressing the data once (not twice) which will speed up half of the problem.
;WITH cte AS
(
SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= 1
FROM
Lookups.dbo.tbl_SCT2_Relationships AS a
WHERE
a.TypeID= 116680003
--AND a.SourceID IN
--(83821000000107, 100000000, 100005005)
UNION ALL
SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= c.Level + 1
FROM
Lookups.dbo.tbl_SCT2_RelationshipsAS a
JOIN cteAS c
ON a.SourceID = c.Parent
)
SELECT * INTO #TEMP FROM cte
SELECTDISTINCT
c1.Parent
,c1.Child
,c1.IsActive
,c1.Level
,c2.Level
FROM
#tempAS c1
LEFT JOIN #temp AS c2
ON c1.Child = c2.Parent
ORDER BY
c1.Parent
,c1.Child
OPTION (MAXRECURSION 0)
DROP TABLE #TEMP
January 27, 2020 at 3:30 pm
Thanks Jonathan
My smaller version of the query (using just 3 IDs) has now gone from 40 seconds to almost instant (less than 2 seconds).
I'm now doing a full against all the records and I'll let you know what happens.
Thanks Anthony
I'll give that a go as well.
January 27, 2020 at 3:47 pm
It's a good idea from Anthony, you might get even better performance if you also add an index to #TEMP:
IF OBJECT_ID('tempdb..#TEMP') IS NOT NULL
DROP TABLE #TEMP;
;WITH cte AS
(
...
...
)
SELECT *
INTO #TEMP
FROM cte;
/* One of the two indexes below */
CREATE INDEX IX_#TEMP_1
ON #TEMP(Parent)
INCLUDE (Child, IsActive, Level);
CREATE INDEX IX_#TEMP_2
ON #TEMP(Child)
INCLUDE (Parent, IsActive, Level);
SELECT DISTINCT
...
;
January 27, 2020 at 3:50 pm
Thanks all
I'll try out the ideas and let you know when I finally get this thing to run.
January 28, 2020 at 11:35 am
hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all
MVDBA
January 28, 2020 at 1:13 pm
hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all
I was thinking that but I've never used hierarchyid so can't give a solution.
Maybe you could provide some code that would do the job for this question?
January 28, 2020 at 1:55 pm
might take me a little while just to pull all of the examples and test data together - it's 2 pm and I've not had lunch yet... my laptop is on a go "slow" session as it's shrinking files and the SSDs can't cope - give me an hour
MVDBA
January 28, 2020 at 2:05 pm
I haven't heard of the HeirarchyID datatype, so can you give me some pointers please?
I've also tried using a temp table (with indexing) against the full dataset but my tempdb went bananas.
Just to add a possible curveball, I'll need all the parents for any given code all the way to the top of the "tree".
Not sure if that will cause any issues, but thought I'd better mention it.
January 28, 2020 at 2:14 pm
hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all
I've found that HierarchyID relatively sucks especially if "repairs" to the structure need to be made. 😀 Nested Sets have worked much better for me. Whatever you do, don't use the classic "push stack" to build them. It will take several days (literally) to execute even just for a million rows. Please see the following article which also has demonstrable code that creates a million node hierarchy to test against. The second article shows a possible alternative that takes the same amount of time to execute and pre-aggregates most answers that you might ask of a Parent-Child table.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 28, 2020 at 2:18 pm
ok here is the most basic stuff you need - the rest you can google and find lots of cool code
CREATE TABLE #htest (id INT IDENTITY, hierarchyvalue hierarchyid PRIMARY key, sometext VARCHAR(500) )
DECLARE @h HIERARCHYID
SET @h=hierarchyid::GetRoot()
INSERT INTO #htest (hierarchyvalue,sometext) SELECT @h,'this is the root node - the cheif exec of the business'
INSERT INTO #htest (hierarchyvalue,sometext) SELECT @h.GetDescendant(NULL,NULL),'line manager number 1'
SELECT *,hierarchyvalue.GetLevel() AS lvl,hierarchyvalue.ToString() AS location FROM #htest
DROP TABLE #htest
basically I create a table (temp table just for the demo) - I add a "ROOT" node (@h) using ::GetRoot() - be carefull this is case sensitive - this creates my parent
next I add a child (line manager number 1) to the data using "GetDescendant of the node value for the chief exec
have a look at the query I use in the select statement - I have .ToString (shows me a pretty picture of where the row sits in the structure)
/ = root
/1 = first child
/2= sister or brother of the first child
/1.1=first child of /1
also notice .GetLevel() - it shows you the level in the tree
To query the tree you can use functions like "isdescendantof" (is it anywhere under the parent you are looking at ) - combine this with getlevel and you can do really cool stuff
read the MS documentation around some of these functions - I don't want to re-type them all 🙂 but have fun learning
if you want help on a specific function of HierarchyID stuff then PM me.. ( getdescendant can be tricky)
MVDBA
January 28, 2020 at 2:29 pm
MVDBA (Mike Vessey) wrote:hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all
I've found that HierarchyID relatively sucks especially if "repairs" to the structure need to be made. 😀 Nested Sets have worked much better for me. Whatever you do, don't use the classic "push stack" to build them. It will take several days (literally) to execute even just for a million rows. Please see the following article which also has demonstrable code that creates a million node hierarchy to test against. The second article shows a possible alternative that takes the same amount of time to execute and pre-aggregates most answers that you might ask of a Parent-Child table.
Hi jeff
I found the HierarchyID stuff "waterproofed" me from developer idiots who allowed the data (id, managerid type scenario) to create an infinite loop where the janitor is the boss of the CTO and we loop forever - even maxrecursion on a CTO cant fix developers getting it wrong
I found the cleanup of using Int values for managers was even harder to repair. you can put a unique constraint on the hierarchy to make sure it's all nice and clean... and it's not possible to make my grandson my father. (although there are parts of the UK we think that is possible)
I wouldn't dismiss it out of hand - maybe not fast, but it does protect you.
and if you ever need to reposition columns in a table on screen they are amazing - rather than shuffle a "display order" field using a bubble sort - you simply just just move the hierarchy object to "somewhere" between the objects you want - no need to touch the other rows . Hierarchy stuff doesn't always have to be vertical (familly tree) - it can be horizontal (gant chart style)
MVDBA
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply