November 1, 2012 at 4:30 am
Hi,
I am writing a script to update bad data in a column of a table. The Table has 566787 rows and this script is taking forever to complete. It updated 1000 records in 5 mins which I think is too much.
There are two triggers and 4 indexes on this table but for now I have deleted the indexes and disabled the triggers to speed up the process
-- The script updates the FullPath column of the table according to the values
-- in Title column and hirNodeParent Column
DECLARE
@hirNode int,
@hirNodeParent int,
@Title varchar(500),
@Return varchar(500),
@LasthirNode int,
@Node int,
@FullPath varchar(1000)
Select hirNode into #hirNodes from hirNodes
While exists (Select * From #hirNodes)
Begin
Set @hirNode = (Select Top 1 hirnode from #hirNodes)
Set @Node = @hirNode
Set @FullPath = (select FullPath from hirNodes where hirNode = @hirNode)
SET @Return = ''
WHILE @hirNode > 0
BEGIN
SELECT @hirNodeParent = hirNodeParent, @Title = Title
FROM hirNodes
WHERE hirNode = @hirNode
SET @LasthirNode = @hirNode
SET @Return = @Title + '\' + @Return
SET @hirNode = @hirNodeParent
END
SET @Return = '\\' + @Return
if(@FullPath <> @Return)
begin
Update hirNodes set FullPath = @Return where hirNode = @Node
end
Delete from #hirNodes where hirNode = @Node
print(@Node)
End
Drop table #hirNodes
--Table Script
CREATE TABLE [dbo].[hirNodes](
[hirNode] [int] IDENTITY(1,1) NOT NULL,
[hirNodeParent] [int] NULL,
[hirLevel] [int] NOT NULL,
[Title] [varchar](500) NULL,
[Brief] [varchar](16) NULL,
[Description] [varchar](256) NULL,
[Active] [bit] NOT NULL,
[DisplayOrder] [varchar](50) NULL,
[FullPath] [varchar](500) NULL,
CONSTRAINT [PK_hirNodes_1] PRIMARY KEY CLUSTERED
(
[hirNode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
November 1, 2012 at 4:39 am
The problem is that this is RBAR as your only ever updating 1 row at a time instead of trying to update 566787 records in one set based operation, meaning that the triggers fire 566787 times instead of once etc etc.
If you would provide some sample data prior to the update and what the expected result after the update should be along with the trigger and index definitions there might be a quicker set based operation to performing the task.
November 1, 2012 at 4:40 am
This looks like you're trying to build a hierachy list using a while loop.
Can you provide some sample data for #HirNodes, as I think you're using a sledgehammer to crack nut, and a Recursive CTE may help but with out sample data its guesswork.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 1, 2012 at 4:44 am
What you are trying to do will obviously takes time since you have used loop statement which means it will consume more time.
can you give some sample records at-least 5 records for this table.
Thanks!
November 1, 2012 at 4:53 am
I just added the Sample data..
Do you want the code for Triggers?
The triggers only fire on Update of hirNodeParent column and Title column and I am only updating the full path column and anyway I have disabled the triggers for now.
I am attaching the code for indexes but for now I have deleted them too
/****** Object: Index [idx_hirNodes_fullPath] Script Date: 10/31/2012 19:39:16 ******/
CREATE NONCLUSTERED INDEX [idx_hirNodes_fullPath] ON [dbo].[hirNodes]
(
[FullPath] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [idx_hirNodes_hirLevel] Script Date: 10/31/2012 19:39:52 ******/
CREATE NONCLUSTERED INDEX [idx_hirNodes_hirLevel] ON [dbo].[hirNodes]
(
[hirLevel] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [idx_hirNodes_hirNodeParent] Script Date: 10/31/2012 19:41:47 ******/
CREATE NONCLUSTERED INDEX [idx_hirNodes_hirNodeParent] ON [dbo].[hirNodes]
(
[hirNodeParent] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [idx_hirNodes_Title] Script Date: 10/31/2012 19:47:55 ******/
CREATE NONCLUSTERED INDEX [idx_hirNodes_Title] ON [dbo].[hirNodes]
(
[Title] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
November 1, 2012 at 5:13 am
Definately looks like a job for a recursive CTE in order to build the Path then a simple update after, this is a simple test rig for a select, but the update isnt too bad.
drop table #hirNode
go
Create Table #hirNode(
hirNode Int
,hirNodeParent int
,Title varchar(100)
)
Insert into #hirNode
values (1,NULL,'Start')
,(2,1,'Second')
,(3,2,'Third')
,(4,NULL,'Start2')
,(5,4,'Second2');
WITH BuildPath(hirNode,hirNodeParent, NodeLevel, NodePath)
AS (SELECT hirNode,
hirNodeParent,
1,
CONVERT(varchar(255), '\\'+Title)
FROM #hirNode
WHERE hirNodeParent IS NULL --Start at the Top
UNION ALL
SELECT
h.hirNode,
h.hirNodeParent,
NodeLevel + 1,
CONVERT (varchar(255), RTRIM(NodePath) + '\' + Title)
FROM #hirNode AS h
JOIN BuildPath AS p ON h.hirNodeParent = p.hirNode
)
Select * from BuildPath
The output for NodePath is
\\Start
\\Start\Second
\\Start\Second\Third
\\Start2
\\Start2\Second2
It might need ordering but its just an example, is that what you are looking for?
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 6, 2012 at 5:22 am
Jason this looks right. But how to update FullPath column using Recursive CTE? I have to update all the rows of the table for FullPath column. Actually the right way would be to check if the FullPath column has correct value if not then update it.
November 6, 2012 at 6:38 am
this should work
drop table #hirNode
go
Create Table #hirNode(
hirNode Int
,hirNodeParent int
,Title varchar(100)
)
Insert into #hirNode
values (1,NULL,'Start')
,(2,1,'Second')
,(3,2,'Third')
,(4,NULL,'Start2')
,(5,4,'Second2');
WITH BuildPath(hirNode,hirNodeParent, NodeLevel, NodePath)
AS (SELECT hirNode,
hirNodeParent,
1,
CONVERT(varchar(255), '\\'+Title)
FROM #hirNode
WHERE hirNodeParent IS NULL --Start at the Top
UNION ALL
SELECT
h.hirNode,
h.hirNodeParent,
NodeLevel + 1,
CONVERT (varchar(255), RTRIM(NodePath) + '\' + Title)
FROM #hirNode AS h
JOIN BuildPath AS p ON h.hirNodeParent = p.hirNode
)
Update
hr
Set
FullPath=bp.NodePath
From
hirNodes hr
JOIN BuildPath bp on hr.hirNode=bp.hirNode
but you should check it
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 6, 2012 at 7:30 am
This works like a charm.. updated 70k records in 24 sec.
Thanks Jason 🙂
November 6, 2012 at 8:22 am
Glad to Help,
You could add a where clause to check the NodePath with the FullPath column eg
WHERE hr.FullPath!=NodePath
Which should reduce the dataset, and save you having unnecessary updates.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 7, 2012 at 8:00 am
I just found out that the live server is SQL 2000 and so I can't use Recursive CTE :crying:
I have created this new script that would work on SQL Server 2000. This is way slower than recursive CTE. Please suggest any way of optimizing it.
CREATE TABLE #tbl([hirNode]INT,
[hirNodeParent]INT,
[FullPath]varchar(1000))
INSERT INTO #tbl
SELECT hirNode, hirNodeParent, CONVERT(varchar(1000), '\\'+Title + '\')
FROM hirNodes
WHERE hirNodeParent IS NULL
WHILE EXISTS(SELECT * FROM hirNodes AS t2 join #tbl t1
on t2.hirNodeParent = t1.hirNode
where t2.hirNode NOT IN (SELECT hirNode FROM #tbl))
BEGIN
INSERT INTO #tbl
SELECT t2.hirNode,
t2.hirNodeParent,
CONVERT (varchar(255), RTRIM(t1.FullPath) + Title + '\')
FROM hirNodes AS t2,
#tbl t1
WHERE t2.hirNodeParent = t1.hirNode
AND t2.hirNode NOT IN (SELECT hirNode
FROM #tbl)
END
SELECT hirNodes.hirNode, hirNodes.FullPath, #tbl.FullPath FROM hirNodes
join #tbl on hirNodes.hirNode = #tbl.hirNode
where hirNodes.FullPath <> #tbl.FullPath
Drop Table #tbl
Thanks,
Kavita
November 7, 2012 at 12:25 pm
Kavita,
I know this is not a perfect solution, but it might work for you on SQL Server 2000. I am assuming your hierarchies are at most 7 levels deep. (If they are deeper, you can alter the script to accomodate for additional depth; though this would increase the processing time too)
The idea is to build the list of nodes involved in each hierarchy and concatenate their titles upfront and then use this info to do an update.
I ran this script on my laptop's sql server 2008 for around 540000 rows and it took around 40-45 seconds.
Let me know how this works out for you.
----Build Hierarchy
SELECT IDENTITY(INT, 1, 1) AS RowId, H1.hirNode As L1, H2.hirnode AS L2, H3.hirNode AS L3, H4.hirNode AS L4, H5.hirNode AS L5, H6.hirNode As L6, H7.hirNode AS L7,
H1.Title AS LT1,
H1.Title + '\\' + H2.Title AS LT2,
H1.Title + '\\' + H2.Title + '\\' + H3.Title AS LT3,
H1.Title + '\\' + H2.Title + '\\' + H3.Title + '\\' + H4.Title As LT4,
H1.Title + '\\' + H2.Title + '\\' + H3.Title + '\\' + H4.Title + '\\' + H5.Title AS LT5,
H1.Title + '\\' + H2.Title + '\\' + H3.Title + '\\' + H4.Title + '\\' + H5.Title + '\\' + H6.Title AS LT6,
H1.Title + '\\' + H2.Title + '\\' + H3.Title + '\\' + H4.Title + '\\' + H5.Title + '\\' + H6.Title + '\\' + H7.Title AS LT7
INTO #a
from dbo.HirNodes H1
LEFT JOIN dbo.HirNodes H2 ON (H1.hirNode = H2.hirNodeParent)
LEFT JOIN dbo.HirNodes H3 ON (H2.hirNode = H3.hirNodeParent)
LEFT JOIN dbo.HirNodes H4 ON (H3.hirNode = H4.hirNodeParent)
LEFT JOIN dbo.HirNodes H5 ON (H4.hirNode = H5.hirNodeParent)
LEFT JOIN dbo.HirNodes H6 ON (H5.hirNode = H6.hirNodeParent)
LEFT JOIN dbo.HirNodes H7 ON (H6.hirNode = H7.hirNodeParent)
--SELECT * FROM #a
----From the above, use only those rows where the complete hierarchy is represented
SELECT A1.*
INTO #b
FROM #a A1
LEFT JOIN #a A2 ON (A1.L1 = A2.L2)
WHERE A2.L1 IS NULL
--SELECT * FROM #b
SELECT DISTINCT A.Node, A.Title AS FullPath
INTO #c
FROM (
SELECT B1.L1 AS Node, B1.LT1 AS Title FROM #b B1
UNION
SELECT B2.L2 AS Node, B2.LT2 AS Title FROM #b B1 INNER JOIN #b B2 ON (B1.L1 = B2.L1)
UNION
SELECT B3.L3 AS Node, B3.LT3 AS Title FROM #b B2 INNER JOIN #b B3 ON (B2.L2 = B3.L2)
UNION
SELECT B4.L4 AS Node, B4.LT4 AS Title FROM #b B3 INNER JOIN #b B4 ON (B3.L3 = B4.L3)
UNION
SELECT B5.L5 As Node, B5.LT5 AS Title FROM #b B4 INNER JOIN #b B5 ON (B4.L4 = B5.L4)
UNION
SELECT B6.L6 AS Node, B6.LT6 AS Title FROM #b B5 INNER JOIN #b B6 ON (B5.L5 = B6.L5)
UNION
SELECT B7.L7 AS Node, B7.LT7 AS Title FROM #b B6 INNER JOIN #b B7 ON (B6.L6 = B7.L6)
) AS A
WHERE A.Node IS NOT NULL
/*
---- Use this to get a complete list of hierarchy for each node
SELECT A.RowId, A.R1, A.Node, A.Title AS FullPath, A.Lvl
INTO #c
FROM (
SELECT B1.RowId, B1.L1 AS R1, B1.L1 AS Node, B1.LT1 AS Title, 1 As Lvl FROM #b B1
UNION
SELECT B1.RowId, B1.L1 AS R1, B2.L2 AS Node, B2.LT2 AS Title, 2 As Lvl FROM #b B1 INNER JOIN #b B2 ON (B1.L1 = B2.L1)
UNION
SELECT B2.RowId, B2.L1 AS R1, B3.L3 AS Node, B3.LT3 AS Title, 3 As Lvl FROM #b B2 INNER JOIN #b B3 ON (B2.L2 = B3.L2)
UNION
SELECT B3.RowId, B3.L1 AS R1, B4.L4 AS Node, B4.LT4 AS Title, 4 As Lvl FROM #b B3 INNER JOIN #b B4 ON (B3.L3 = B4.L3)
UNION
SELECT B4.RowId, B4.L1 AS R1, B5.L5 As Node, B5.LT5 AS Title, 5 As Lvl FROM #b B4 INNER JOIN #b B5 ON (B4.L4 = B5.L4)
UNION
SELECT B5.RowId, B5.L1 AS R1, B6.L6 AS Node, B6.LT6 AS Title, 6 As Lvl FROM #b B5 INNER JOIN #b B6 ON (B5.L5 = B6.L5)
UNION
SELECT B6.RowId, B6.L1 AS R1, B7.L7 AS Node, B7.LT7 AS Title, 7 As Lvl FROM #b B6 INNER JOIN #b B7 ON (B6.L6 = B7.L6)
) AS A
WHERE A.Node IS NOT NULL
ORDER BY RowId, R1, Lvl
*/
--SELECT COUNT(*) FROM #c
--SELECT COUNT(*) FROM dbo.HirNodes
----Use Hierarchy
UPDATE H SET H.FullPath = C.FullPath
--SELECT H.hirNode, H.hirNodeParent, H.hirLevel, H.Title, H.FullPath, C.FullPath
FROM dbo.HirNodes H
INNER JOIN #c C ON (H.hirNode = C.Node)
DROP TABLE #a, #b
DROP TABLE #c
November 8, 2012 at 3:22 am
Thanks Sam.
It IS faster though its not returning the correct fullPath.. I am working on it.
I'll let you know how it goes
November 8, 2012 at 4:46 am
Hi Kavita,
Sorry the CTE didnt help on the client site due to having SQL 2000 installed, just file it away for future reference, and beat the client with a big stick until they upgrade to SQL2008 or later.:-D
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 8, 2012 at 5:11 am
I apologize Sam. Its working perfectly fine.
I ran it for 460K records and it took ~5mins which is really not bad.
Thank You!
Jason - I wish but it was a great learning experience. I am excited to use it in the future.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply