October 27, 2009 at 5:15 pm
I currently have a 1 million+ record DB that I am having to do a large number of string replaces on. I should note that the columns I am working on are static, I don't need to worry about outside updates as I work.
Take this example:
I have a table with 3 rows.
Billy
Timmy
Tommy
I want to remove the ending on each one (this is just an example, most of my replaces are more dramatic). So my final table should look like this:
Bill
Tim
Tom
Currently what I am doing is creating a TEMP table to store the records I want to work on (this part is mainly to ensure I am running everything properly). Then I run 3 UPDATE statements to do the replaces. The query looks like this:
CREATE TABLE #TEMP (FULLNAME nvarchar (50))
--We fill the table using a select statement, not relevant to this discussion.
UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'BILLY','BILL')
UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'TIMMY','TIM')
UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'TOMMY','TOM')
In the real DB, I am doing this on a million-record table and running 200+ replaces, which are taking a very long time. Are there some optimization techniques I can use to speed this up?
October 28, 2009 at 2:56 am
Instead of using the REPLACE function, why don't you use a WHERE clause?
like
Update #temp
set fullname = 'bill'
where fullname = 'billy'
I think if you do it this way you can avoid calling the REPLACE function on each row
October 28, 2009 at 2:59 am
When you say 200 replaces does it mean multiple updates/replaces in a column?
"Keep Trying"
October 28, 2009 at 5:38 am
A more efficient way to do this would be to create a lookup table with the old names and the new names in it;
CREATE TABLE [dbo].[NameLookup](
OldFullName VARCHAR (200),
NewFullName VARCHAR (200)
)
INSERT INTO dbo.NameLookup (
OldFullName,
NewFullName
)
SELECT 'Billy', 'Bill' UNION
SELECT 'Timmy', 'Tim' UNION
SELECT 'Tommy', 'Tom'
CREATE CLUSTERED INDEX ix_OldFullName ON dbo.NameLookup (OldFullName)
And then do the updates with an INNER JOIN between the two tables
UPDATE YourTable SET FullName = NameLookup.NewFullName
FROM YourTable INNER JOIN dbo.NameLookup
ON YourTable.FullName = NameLookup.OldFullName
October 28, 2009 at 7:23 am
jagadeesanpv (10/28/2009)
Instead of using the REPLACE function, why don't you use a WHERE clause?
like
Update #temp
set fullname = 'bill'
where fullname = 'billy'
I think if you do it this way you can avoid calling the REPLACE function on each row
The example I gave may be too simple; I apologize. In many cases I will be changing only part of a field, not the whole field, so I believe I need to use REPLACE.
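(For partial-field replaces like this, one option is to keep REPLACE but restrict each UPDATE to only the rows that actually contain the target substring, so unmatched rows are never written or logged. A minimal sketch, using the #TEMP/FULLNAME names from the example above:)

```sql
-- Only touch rows that actually contain the substring; REPLACE on the
-- rest would write the row back unchanged and still generate log activity.
UPDATE #TEMP
SET    FULLNAME = REPLACE(FULLNAME, 'BILLY', 'BILL')
WHERE  FULLNAME LIKE '%BILLY%';
```

Note that a leading-wildcard LIKE cannot seek an index, so this saves write/log cost rather than read cost.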
October 28, 2009 at 9:32 am
I tried something new today to see if it would help.
I split my data into two tables: one where the length of the FULLNAME field was less than 450 characters (it's Unicode, so that equals the 900-byte max for an index key), and one with the rest.
I then put a clustered index on the table with the shorter full names, and just ran the other table like before.
I have found my query is actually running slower than before. I know that Indexes slow down update statements, but I assumed my query would have so much more reading than writing that the index would give me a big performance boost.
Is my thinking off?
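(One way to avoid paying index maintenance on every UPDATE is to disable a nonclustered index for the duration of the mass update and rebuild it once at the end. A minimal sketch with assumed index/table names:)

```sql
-- ix_FullName and dbo.ShortNames are hypothetical names for illustration.
-- Disabling the index means the UPDATEs no longer have to maintain it.
ALTER INDEX ix_FullName ON dbo.ShortNames DISABLE;

-- ... run the 200+ UPDATE/REPLACE statements here ...

-- Rebuilding re-creates the index in a single pass over the final data.
ALTER INDEX ix_FullName ON dbo.ShortNames REBUILD;
```

This only applies to nonclustered indexes: disabling a clustered index makes the table itself inaccessible until it is rebuilt.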
October 28, 2009 at 8:42 pm
Yes... sometimes you can get yourself into some pretty deep Kimchi if you update what you are using as criteria. In these types of situations, I'll also bring the data into a temp table along with the PK column(s). I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.
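(A minimal sketch of the approach described above, with assumed table and column names: copy the PK and the working column out via SELECT/INTO, do all the replaces on the copy, then apply the result back in one PK-joined UPDATE.)

```sql
-- ID, FULLNAME, and dbo.YourTable are hypothetical names for illustration.
-- SELECT/INTO builds the work table in one fast, minimally logged pass.
SELECT  ID,                          -- primary key column
        FULLNAME AS NewFullName     -- working copy of the column to fix up
INTO    #Work
FROM    dbo.YourTable;

-- All 200+ replaces run against the narrow work table, not the base table.
UPDATE #Work SET NewFullName = REPLACE(NewFullName, 'BILLY', 'BILL');
-- ... remaining REPLACE statements ...

-- One simple update back to the base table, joined on the PK.
UPDATE  t
SET     t.FULLNAME = w.NewFullName
FROM    dbo.YourTable AS t
        INNER JOIN #Work AS w
                ON w.ID = t.ID;
```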
--Jeff Moden
Change is inevitable... Change for the better is not.
October 28, 2009 at 9:05 pm
Jeff Moden (10/28/2009)
I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.
Jeff, I'm not quite sure what you mean by this, but I thought of this idea and maybe it's along the same lines as what you are saying.
What I can do is create a new column like you mentioned, and insert the results of my replaces into it. Then remove my previous index and index the new column (the index creation so far has been extremely quick, not a bottleneck at all). Then update my first column with the replaces from the second, and basically just ping-pong back and forth through my 200+ replaces.
That will probably give me a speed increase and ensure I'm not missing any replaces. I'll give this a try tomorrow!
October 28, 2009 at 9:09 pm
huston.dunlap (10/28/2009)
Jeff Moden (10/28/2009)
I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.
Jeff, I'm not quite sure what you mean by this, but I just thought of something to try.
What I can do is create a new column like you mentioned, and insert the results of my replaces into it. Then remove my previous index and index the new column (the index creation so far has been extremely quick, not a bottleneck at all). Then update my first column with the replaces from the second, and basically just ping-pong back and forth through my 200+ replaces.
That will probably give me a speed increase and ensure I'm not missing any replaces. I'll give this a try tomorrow!
If you add a column to the original table, make sure you rebuild the clustered index to get rid of the massive page splits you're going to end up with.
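(A one-line sketch of that rebuild, with assumed names for the index and table:)

```sql
-- PK_YourTable and dbo.YourTable are hypothetical names for illustration.
-- Rebuilding the clustered index rewrites the pages compactly, repairing
-- the fragmentation caused by page splits from the widened rows.
ALTER INDEX PK_YourTable ON dbo.YourTable REBUILD;
```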
--Jeff Moden
Change is inevitable... Change for the better is not.