October 27, 2009 at 5:15 pm
I currently have a 1 million+ record DB that I am having to do a large number of string replaces on. I should note that the columns I am working on are static, I don't need to worry about outside updates as I work.
Take this example:
I have a table with 3 rows.
Billy
Timmy
Tommy
I want to remove the ending on each one (this is just an example, most of my replaces are more dramatic). So my final table should look like this:
Bill
Tim
Tom
Currently what I am doing is creating a TEMP table to store the records I want to work on (this part is mainly to ensure I am running everything properly). Then I run 3 UPDATE statements to do the replaces. The query looks like this:
CREATE TABLE #TEMP (FULLNAME nvarchar (50))
--We fill the table using a select statement, not relevant to this discussion.
UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'BILLY','BILL')
UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'TIMMY','TIM')
UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'TOMMY','TOM')
In the real DB, I am doing this on a million-record table and running 200+ replaces, which are taking a very long time. Are there some optimization techniques I can use to speed this up?
October 28, 2009 at 2:56 am
Instead of using the REPLACE function, why don't you use a WHERE clause?
like
Update #temp
set fullname = 'bill'
where fullname = 'billy'
I think if you do it this way you can avoid calling the REPLACE function on each row
October 28, 2009 at 2:59 am
When you say 200 replaces does it mean multiple updates/replaces in a column?
"Keep Trying"
October 28, 2009 at 5:38 am
A more efficient way to do this would be to create a lookup table with the old names and the new names in it;
CREATE TABLE [dbo].[NameLookup](
OldFullName VARCHAR (200),
NewFullName VARCHAR (200)
)
INSERT INTO dbo.NameLookup (
OldFullName,
NewFullName
)
SELECT 'Billy', 'Bill' UNION
SELECT 'Timmy', 'Tim' UNION
SELECT 'Tommy', 'Tom'
CREATE CLUSTERED INDEX ix_OldFullName ON dbo.NameLookup (OldFullName)
And then do the updates with an INNER JOIN between the two tables
UPDATE YourTable SET FullName = NameLookup.NewFullName
FROM YourTable INNER JOIN dbo.NameLookup
ON YourTable.FullName = NameLookup.OldFullName
October 28, 2009 at 7:23 am
jagadeesanpv (10/28/2009)
Instead of using the REPLACE function, why don't you use a WHERE clause?
like
Update #temp
set fullname = 'bill'
where fullname = 'billy'
I think if you do it this way you can avoid calling the REPLACE function on each row
The example I gave may be too simple; I apologize. In many cases I will be changing only part of a field, not the whole field, so I believe I need to use REPLACE.
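(For partial-field replaces like this, one option is to keep REPLACE but restrict each UPDATE to only the rows that actually contain the target substring, so unmatched rows are never written or logged. A minimal sketch, using the #TEMP/FULLNAME names from the example above:)

```sql
-- Only touch rows that actually contain the substring; REPLACE on the
-- rest would write the row back unchanged and still generate log activity.
UPDATE #TEMP
SET    FULLNAME = REPLACE(FULLNAME, 'BILLY', 'BILL')
WHERE  FULLNAME LIKE '%BILLY%';
```

Note that a leading-wildcard LIKE cannot seek an index, so this saves write/log cost rather than read cost.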
October 28, 2009 at 9:32 am
I tried something new today to see if it would help.
I split my data into two tables: one where the length of the FULLNAME field was less than 450 characters (it's Unicode, so that equals the 900-byte max for an index key), and one with the rest.
I then put a clustered index on the table with the shorter full names, and just ran the other table like before.
I have found my query is actually running slower than before. I know that Indexes slow down update statements, but I assumed my query would have so much more reading than writing that the index would give me a big performance boost.
Is my thinking off?
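(One way to avoid paying index maintenance on every UPDATE is to disable a nonclustered index for the duration of the mass update and rebuild it once at the end. A minimal sketch with assumed index/table names:)

```sql
-- ix_FullName and dbo.ShortNames are hypothetical names for illustration.
-- Disabling the index means the UPDATEs no longer have to maintain it.
ALTER INDEX ix_FullName ON dbo.ShortNames DISABLE;

-- ... run the 200+ UPDATE/REPLACE statements here ...

-- Rebuilding re-creates the index in a single pass over the final data.
ALTER INDEX ix_FullName ON dbo.ShortNames REBUILD;
```

This only applies to nonclustered indexes: disabling a clustered index makes the table itself inaccessible until it is rebuilt.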
October 28, 2009 at 8:42 pm
Yes... sometimes you can get yourself into some pretty deep Kimchi if you update what you are using as criteria. In these types of situations, I'll also bring the data into a temp table along with the PK column(s). I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.
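(A minimal sketch of the approach described above, with assumed table and column names: copy the PK and the working column out via SELECT/INTO, do all the replaces on the copy, then apply the result back in one PK-joined UPDATE.)

```sql
-- ID, FULLNAME, and dbo.YourTable are hypothetical names for illustration.
-- SELECT/INTO builds the work table in one fast, minimally logged pass.
SELECT  ID,                          -- primary key column
        FULLNAME AS NewFullName     -- working copy of the column to fix up
INTO    #Work
FROM    dbo.YourTable;

-- All 200+ replaces run against the narrow work table, not the base table.
UPDATE #Work SET NewFullName = REPLACE(NewFullName, 'BILLY', 'BILL');
-- ... remaining REPLACE statements ...

-- One simple update back to the base table, joined on the PK.
UPDATE  t
SET     t.FULLNAME = w.NewFullName
FROM    dbo.YourTable AS t
        INNER JOIN #Work AS w
                ON w.ID = t.ID;
```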
--Jeff Moden
Change is inevitable... Change for the better is not.
October 28, 2009 at 9:05 pm
Jeff Moden (10/28/2009)
I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.
Jeff, I'm not quite sure what you mean by this, but I thought of this idea and maybe it's along the same lines as what you are saying.
What I can do is create a new column like you mentioned, and insert the results of my replaces into it. Then remove my previous index and index the new column (the index creation so far has been extremely quick, not a bottleneck at all). Then update my first column with the replaces from the second, and basically just ping-pong back and forth through my 200+ replaces.
That will probably give me a speed increase and ensure I'm not missing any replaces. I'll give this a try tomorrow!
October 28, 2009 at 9:09 pm
huston.dunlap (10/28/2009)
Jeff Moden (10/28/2009)
I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.
Jeff, I'm not quite sure what you mean by this, but I just thought of something to try.
What I can do is create a new column like you mentioned, and insert the results of my replaces into it. Then remove my previous index and index the new column (the index creation so far has been extremely quick, not a bottleneck at all). Then update my first column with the replaces from the second, and basically just ping-pong back and forth through my 200+ replaces.
That will probably give me a speed increase and ensure I'm not missing any replaces. I'll give this a try tomorrow!
If you add a column to the original table, make sure you rebuild the clustered index to get rid of the massive page splits you're going to end up with.
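(A one-line sketch of that rebuild, with assumed names for the index and table:)

```sql
-- PK_YourTable and dbo.YourTable are hypothetical names for illustration.
-- Rebuilding the clustered index rewrites the pages compactly, repairing
-- the fragmentation caused by page splits from the widened rows.
ALTER INDEX PK_YourTable ON dbo.YourTable REBUILD;
```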
--Jeff Moden
Change is inevitable... Change for the better is not.