UPDATE Query Question

  • I am attempting to update a table (21 million rows) with information from a much smaller table (250k rows).

    When I run my query I end up with the process taking an inordinate amount of time, which leads me to believe that what I am doing is EXCEPTIONALLY inefficient.

    Below is a sample of the update query that I wrote, and what I would like to know is if anyone can shed some light on how I might be better able to do it.

    Information on the tables:

    Q12004 is indexed on column COL035 (contains a phone number, char(10))

    ADNH has a PK on column Phone (char(10))

    Query :

    update q12004
    set q12004.batch = adnh.batch
    from adnh, q12004
    where adnh.phone = q12004.col035

    This "runs" but is ridiculous in the execution time, anyone know of a better method to do this?

  • Run your query through the Estimated Execution Plan checker in Query Analyzer and then run this:

    update q12004
    set q12004.batch = adnh.batch
    from adnh
    join q12004
    on adnh.phone = q12004.col035

    See if there is any difference in using a join/on instead of the join/where.

    -SQLBill

  • Almost no difference at all that I could see.

    I'm trying to run it using the code that SQLBill has written and it doesn't seem to be exceptionally different than what I had before.

    I'm making the assumption that if we do a join for adnh (250k records) and select ONLY those records which match in Q12004 (21 mill records), then as long as the columns are indexed and match, the update should be reasonably quick.

    Is there possibly a different way to write this particular query to force the joins in a more efficient manner?

  • Should be, but look at the execution plan and see what it has decided the query as written would use for best performance.

  • You never mentioned how long it was taking for the update...

    My experience for these two table type of single column update joins says you'll get about 4 million rows per minute on a "quiet" server if both tables have the correct indexes.  Longer if the server is "busy".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Initial update time was 21 hours.  On a screaming machine which is why I was under the belief that I really didn't have something right.  (p4 2.8E/2 GB RAM/1 TB storage in SATA drives)  And it does nothing else but service my requests.

    I'd be ecstatic to have this thing done in under 10 minutes.

  • Test how long it takes to do the join without the update:

    SELECT * from adnh
    join q12004
    on adnh.phone = q12004.col035

    If that's efficient, check to see if the q12004.batch column is contained in an index.  If it is and you can remove it (without affecting your production, apps, etc.), do that and recreate the index after the update.  If the join itself is inefficient, tuning it should fix your update performance as well.

    Definitely only the matched rows (250K or less from the inner join) will get updated. 
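    The drop-and-recreate idea above might look like the sketch below; the index name IX_q12004_batch is a placeholder, so run sp_helpindex 'q12004' to find the real one:

    ```sql
    -- Hypothetical index name; check sp_helpindex 'q12004' for the real one.
    DROP INDEX q12004.IX_q12004_batch

    UPDATE q12004
    SET q12004.batch = adnh.batch
    FROM adnh
    JOIN q12004 ON adnh.phone = q12004.col035

    -- Recreate the index once the update has finished.
    CREATE INDEX IX_q12004_batch ON q12004 (batch)
    ```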

    Good luck!

    Jeff

    Check the fragmentation of the table using DBCC SHOWCONTIG.

    In Scan Density [Best Count:Actual Count], Best Count and Actual count should be almost equal...
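    A minimal sketch of checking and then fixing fragmentation on the big table (SQL Server 2000 syntax; the index name IX_q12004_col035 is a placeholder):

    ```sql
    -- Report fragmentation; look at Scan Density in the output.
    DBCC SHOWCONTIG ('q12004')

    -- Online defragmentation of one index (0 = current database):
    DBCC INDEXDEFRAG (0, 'q12004', 'IX_q12004_col035')

    -- Or rebuild every index on the table (takes locks, but more thorough):
    DBCC DBREINDEX ('q12004')
    ```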

    HTH...

     


    Regards,
    Sachin Dedhia

    A simple man's solution that will run faster, which I use when indexes are not speeding up the process, is to do a left join as follows and create a new table:

    select q12004.*, adnh.batch as adnh_batch
    into new_table_name
    from q12004
    left join adnh on q12004.col035 = adnh.phone

    (The alias avoids a duplicate "batch" column name in new_table_name, since q12004 already has its own batch column.)

     

    Indexes are a must. Sometimes filtering the data will speed things up, as will the order of the tables. Col035 and phone should both be indexed. You can add a query hint with the index name for each table, and that would really speed it up.

    update q12004
    set q12004.batch = adnh.batch
    from q12004 inner join adnh
    on q12004.col035 = adnh.phone
    where q12004.col035 in (select phone from adnh where q12004.col035 = adnh.phone)
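    The index-hint idea could be sketched like this; IX_q12004_col035 and PK_adnh are hypothetical names standing in for whatever sp_helpindex reports on each table:

    ```sql
    UPDATE q12004
    SET q12004.batch = adnh.batch
    FROM q12004 WITH (INDEX(IX_q12004_col035))
    INNER JOIN adnh WITH (INDEX(PK_adnh))
        ON q12004.col035 = adnh.phone
    ```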

  • Is your table a heap?

    Could you post the estimated execution plan?

    Have you monitored Sql Server activity, including the Freespace scans/sec counter from the SQL Server:Access Methods?


    Salvor

  • What is the data type of the batch column?

  • Please confirm your indexes on both tables:

    Does each index contain just a phone number, or do the indexes contain multiple columns? If so, what are the columns?

    Are the indexes both primary keys?

    Are both indexes clustered?

    How much faster is the query without the update?

    Since these phone number columns contain just numbers, why not make them numeric? A 10-digit number is too large for int, so bigint would be needed; numeric comparisons are faster than char(10) comparisons.
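    A sketch of that conversion, assuming the columns hold only digits and leading zeros are not significant; phone_num is a hypothetical column name:

    ```sql
    -- Only safe if every value is purely numeric and leading zeros
    -- do not matter; otherwise keep the char(10) columns.
    ALTER TABLE adnh ADD phone_num bigint
    GO
    UPDATE adnh SET phone_num = CAST(phone AS bigint)
    ```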

    Eaton

     

     

  • Please run in query analyzer the following:

    set showplan_text on
    go
    update q12004
    set q12004.batch = adnh.batch
    from adnh
    join q12004
    on adnh.phone = q12004.col035
    go

     

    And post the output back here. This will show us your execution plan and will help us to understand what it is doing.

    Also, do the following

    select count(*)
    from adnh
    join q12004
    on adnh.phone = q12004.col035

    and let us know the result and how many indexes are on the table being updated. In some cases (usually rare), if you are updating a lot of rows, you may be suffering a performance hit because of a larger number of indexes and page splits in them.
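    To count the indexes on the table being updated on SQL Server 2000, something like this works (sp_helpindex 'q12004' shows the same information):

    ```sql
    -- indid 1 = clustered, 2-254 = nonclustered.
    -- Auto-created statistics (_WA_Sys...) also show up here; ignore those.
    SELECT name, indid
    FROM sysindexes
    WHERE id = OBJECT_ID('q12004')
      AND indid BETWEEN 1 AND 254
    ```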

    In some cases it may make more sense to drop some indexes before doing the update and then add them back.

    Sometimes you may need to batch the query until complete like this

    Declare @rows int
    SET @rows = 1
    set rowcount 10000
    WHILE @rows > 0
    BEGIN
     update q12004
     set q12004.batch = adnh.batch
     from adnh
     inner join q12004
     on adnh.phone = q12004.col035
     where q12004.batch != adnh.batch or q12004.batch is null
     set @rows = @@rowcount
    END
    set rowcount 0

    Note: If the same adnh.phone exists multiple times with different adnh.batch values then you cannot use that method. This method helps with log file growth due to logged changes.

    If you do have the same adnh.phone multiple times with different adnh.batch values, and there is a way to determine the last occurrence, then you may want to use a subquery first so you decrease the changes to the same column.

    EX.

    update q12004
    set q12004.batch = adnh.batch
    from adnh
    inner join q12004
    on adnh.phone = q12004.col035
    inner join
    (select X.phone, max(X.oper_date) maxdate from adnh X group by X.phone) Y
    on adnh.phone = Y.phone
    and adnh.oper_date = Y.maxdate

    Not sure without more details what may be best.
