UPDATE from two tables

  • I am currently trying to update one 30-character column in a table (11.5 million rows, 40 columns) with an inner join to another table (i.e., update the county based on a person's zipcode by looking it up in the zipcode table).  It has been running for 1.5 hours and I'm not sure if it should be taking this long.   There is no primary or foreign key involved.

    UPDATE Standard2
    SET Current_County = ZipMkt06.County2
    FROM Standard2
    INNER JOIN ZipMkt06
      ON Standard2.patzip = ZipMkt06.zip

    Should I expect 11.5 million rows to take this long to update based on what I'm trying to do, or is something wrong? Thanks for your help.
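    The lookup the poster describes can be sketched in miniature. This is an illustrative Python/SQLite mock-up, not the actual T-SQL: it uses a correlated subquery (a portable equivalent of T-SQL's UPDATE ... FROM), the table and column names come from the thread, and the rows are invented sample data.

    ```python
    import sqlite3

    # Tiny in-memory stand-in for the 11.5M-row scenario described above.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE Standard2 (patzip TEXT, Current_County TEXT);
    CREATE TABLE ZipMkt06 (zip TEXT, County2 TEXT);
    INSERT INTO Standard2 VALUES ('30301', NULL), ('10001', NULL);
    INSERT INTO ZipMkt06 VALUES ('30301', 'Fulton'), ('10001', 'New York');
    """)

    # Set each person's county from the zipcode lookup table.
    con.execute("""
    UPDATE Standard2
    SET Current_County = (SELECT County2
                          FROM ZipMkt06
                          WHERE ZipMkt06.zip = Standard2.patzip)
    WHERE EXISTS (SELECT 1 FROM ZipMkt06
                  WHERE ZipMkt06.zip = Standard2.patzip)
    """)

    print(con.execute(
        "SELECT patzip, Current_County FROM Standard2 ORDER BY patzip"
    ).fetchall())
    # → [('10001', 'New York'), ('30301', 'Fulton')]
    ```

    The EXISTS guard matters in the subquery form: without it, rows whose zipcode has no match would be overwritten with NULL.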

  • I suppose it could take that long depending on your hardware, how busy your db is, and whether there are any indexes involved...  Also, are both of your zipcode columns the same type?  char(5) or varchar(9)?  If the engine has to convert values, it may take some additional time as well...

    To help us help you, read this. For better help with performance problems, please read this.

  • Since you are updating Standard2, you should not have joined it: stop the query and do it this way;

    UPDATE Standard2
    SET Standard2.Current_County = ZipMkt06.County2
    FROM ZipMkt06
    WHERE Standard2.patzip = ZipMkt06.zip

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • I'm trying that now.  Any thoughts on how long it could potentially run?

  • to LauraS

    "I suppose it could take that long depending on your hardware, how busy your db is and if there are any indexes involved"

    If you can answer some of the above questions, you may get some progress on this.  A start would be to post ALL the DDL of the tables involved.

     

    to Lowell

    Currently your code is equivalent to LauraS's, just written without an explicit JOIN clause (the execution plan should be the same).  However, LauraS's would be a better style to practice, as it also supports the other types of joins without changing style... particularly the LEFT JOIN option.  I also believe that, in the long term, LauraS's is the form the ANSI standard is promoting and the one Microsoft is not proposing to make obsolete.

  • I appreciate everyone's assistance in this.  I am fairly new to SQL so I am learning how the various queries operate.

    I ended up creating a new table with all the columns from the Standard table and creating the Current_County column based on the zipmkt table.  For some reason, creating the table from scratch completed in less than 15 minutes, whereas the update query ran for 16 hours until I stopped it.  It really did not make sense to me how I could run a SELECT statement requesting all that information, which would run in a few seconds, but doing the same query as an update just kept spinning.

    I'm not sure how busy the db is, because the server is maintained by a different department.

  • Oh no, no, no... not true... check Books Online... every single example where a join is involved has the target of the update in the FROM clause, as well.  In fact, we've had several problems on our systems where they used the form of update you suggest... pegged a 4 CPU box for 2 hours on what should have been a 2 minute update.  Once the form of the UPDATE was corrected, the update only took 2 minutes.  And, it was repeatable in our case... when we changed the form back, it would again peg the CPU's.

    Both forms will work most of the time... the form where the target table is included in the FROM clause in the presence of a join will work all the time.

    The fact that there is no primary key on the table could certainly be the cause of the slowness.  Another factor may be that if the log file is too small, you could be waiting for growth.  Same with TempDB... most folks just make it way too small.

    And, there's a "tipping point" on every box... for example, if I update a million rows on a particular table on my box at home, it has no problem and does the update in short order.  If I do an update on 10 million rows of the same table, it works as expected, taking about 10 times as long as the million rows... but when I tried to update 30 million rows, instead of it taking only 30 times as long, it ran a whole weekend before I killed it.

    To find out if you've reached that "tipping point", try a smaller update, then a larger, etc.  If you really need to UPDATE the whole table and it's beyond what you figure as a tipping point, break the update into several smaller pieces of, say, a million rows per update (most systems have no problems with a million rows).  The CPU will love you...
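    Jeff's batching advice can be sketched in miniature. This is an illustrative Python/SQLite mock-up, not T-SQL: the table name comes from the thread, but ten rows and a batch size of three stand in for 11.5 million rows and million-row batches, and the county value is invented.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Standard2 (id INTEGER PRIMARY KEY, Current_County TEXT)")
    con.executemany(
        "INSERT INTO Standard2 (Current_County) VALUES (?)", [(None,)] * 10)

    N_ROWS = 10
    BATCH = 3  # stand-in for "a million rows per update"
    for lo in range(0, N_ROWS, BATCH):
        # Each pass touches at most BATCH rows keyed on the primary key,
        # then commits, so the log and lock footprint stay small.
        con.execute(
            "UPDATE Standard2 SET Current_County = 'Fulton' "
            "WHERE id > ? AND id <= ?", (lo, lo + BATCH))
        con.commit()

    print(con.execute(
        "SELECT COUNT(*) FROM Standard2 WHERE Current_County = 'Fulton'"
    ).fetchone()[0])
    # → 10
    ```

    Keying the batches on an indexed column (here the primary key) is what keeps each pass cheap; without one, every batch still scans the whole table.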

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It might also help to add a WHERE clause if the majority of the rows don't need to be updated because the Current_County is already correct.

    UPDATE Standard2
    SET Current_County = ZipMkt06.County2
    FROM Standard2
    INNER JOIN ZipMkt06
      ON Standard2.patzip = ZipMkt06.zip
    WHERE Standard2.Current_County IS NULL
       OR Standard2.Current_County <> ZipMkt06.County2
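    The effect of that WHERE clause can be shown in miniature. This is an illustrative Python/SQLite sketch (correlated-subquery form rather than T-SQL's UPDATE ... FROM); the names follow the thread and the rows are invented: one row is already correct, one is NULL, one is wrong.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE Standard2 (patzip TEXT, Current_County TEXT);
    CREATE TABLE ZipMkt06 (zip TEXT, County2 TEXT);
    INSERT INTO Standard2 VALUES ('30301', 'Fulton'),   -- already correct
                                 ('10001', NULL),       -- needs update
                                 ('60601', 'Wrong');    -- needs correction
    INSERT INTO ZipMkt06 VALUES ('30301', 'Fulton'), ('10001', 'New York'),
                                ('60601', 'Cook');
    """)

    # Only rows whose county is missing or stale are written.
    cur = con.execute("""
    UPDATE Standard2
    SET Current_County = (SELECT County2 FROM ZipMkt06
                          WHERE ZipMkt06.zip = Standard2.patzip)
    WHERE Current_County IS NULL
       OR Current_County <> (SELECT County2 FROM ZipMkt06
                             WHERE ZipMkt06.zip = Standard2.patzip)
    """)
    print(cur.rowcount)  # → 2 (the already-correct row is skipped)
    ```

    Skipping already-correct rows saves both write I/O and log volume, which is exactly why the filter helps on a large table.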
     
