Advice of Optimization Techniques for Mass Replace Command on a 1 million+ record DB.

  • I currently have a 1 million+ record DB that I am having to do a large number of string replaces on. I should note that the columns I am working on are static, I don't need to worry about outside updates as I work.

    Take this example:

    I have a table with 3 rows.

    Billy

    Timmy

    Tommy

    I want to remove the ending on each one (this is just an example, most of my replaces are more dramatic). So my final table should look like this:

    Bill

    Tim

    Tom

    Currently what I am doing is creating a TEMP Table to store the records I want to work on (this part is mainly to ensure I am running everything properly. Then I run 3 update statements to do the replaces. The query looks like this:

    CREATE TABLE #TEMP (FULLNAME nvarchar (50))

    --We fill the table using a select statement, not relevant to this discussion.

    UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'BILLY','BILL')

    UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'TIMMY','TIM')

    UPDATE #TEMP SET FULLNAME = REPLACE(FULLNAME,'TOMMY','TIM')

    In the real DB, I am doing this on a million record DB, and doing 200+ replaces, which are taking a very long time. Are there some optimization techniques I can use to speed this up?

  • Instaead of using replace function why don't you use where clause

    like

    Update #temp

    set fullname = 'bill'

    where fullname = 'billy'

    I think if you use this way you can avoid calling replace function on each row

  • When you say 200 replaces does it mean multiple updates/replaces in a column?

    "Keep Trying"

  • A more efficent way to do this would be to create a lookup table with the oldnames and the new names in them;

    CREATE TABLE [dbo].[NameLookup](

    OldFullName VARCHAR (200),

    NewFullName VARCHAR (200)

    )

    INSERT INTO dbo.NameLookup (

    OldFullName,

    NewFullName

    )

    SELECT 'Billy', 'Bill' UNION

    SELECT 'Timmy', 'Tim' UNION

    SELECT 'Tommy', 'Tom'

    CREATE CLUSTERED INDEX ix_OldFullName ON dbo.NameLookup (OldFullName)

    And then do the updates with an Inner Join betweem the two tables

    UPDATE YourTable SET FullName = NameLookup.NewFullName

    FROM YourTalbe INNER JOIN dbo.NameLookup

    ON YourTable.FullName = NameLookup.OldFullName

  • jagadeesanpv (10/28/2009)


    Instaead of using replace function why don't you use where clause

    like

    Update #temp

    set fullname = 'bill'

    where fullname = 'billy'

    I think if you use this way you can avoid calling replace function on each row

    The example I give may be too simple I apologize. In many cases I will be changing only part of a field, not the whole field, so I need to use replaces I believe.

  • I tried something new today to see if it would help.

    I split my data into two tables, one where the length of the FULLNAME field was less than 450 characters (its unicode so it equals the 900 byte max for an index).

    I then put a CLUSTERED INDEX on the Table with the Shorter Fullnames, and just ran the other table like before.

    I have found my query is actually running slower than before. I know that Indexes slow down update statements, but I assumed my query would have so much more reading than writing that the index would give me a big performance boost.

    Is my thinking off?

  • Yes... sometimes you can get yourself into some pretty deep Kimchi if you update what you are using as a criteria. In these types of situations, I'll also bring the data into a temp table along with the PK column(s). I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/28/2009)


    I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.

    Jeff I'm not quite sure what you mean by this but I thought of this idea and maybe its on the same basis as what you are saying.

    What I can do is Create a new column like you mentioned, and insert the results of my replaces into it. Then remove my previous index and index the new column (the index creation so far has been extrememly quick, not a bottleneck at all). Then update my first column with the replaces from the 2nd. And basically just ping pong back and forth through my 200+ replaces.

    That will probably give me a speed increase and ensure I'm not missing any replaces. I'll give this a try tomorrow!

  • huston.dunlap (10/28/2009)


    Jeff Moden (10/28/2009)


    I'll do all the updates in an extra column that I form while doing a high speed SELECT/INTO. Then, it's a very simple update based on PK joins.

    Jeff I'm not quite sure what you mean by this but I just thought of something to try.

    What I can do is Create a new column like you mentioned, and insert the results of my replaces into it. Then remove my previous index and index the new column (the index creation so far has been extrememly quick, not a bottleneck at all). Then update my first column with the replaces from the 2nd. And basically just ping pong back and forth through my 200+ replaces.

    That will probably give me a speed increase and ensure I'm not missing any replaces. I'll give this a try tomorrow!

    If you add a column to the original table, make sure you rebuild the Clustered Index to get rid of the masive page splits you're going to end up with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply