Runaway UPDATE on large table

  • From Query Analyzer I attempted to run an update against a 13GB unindexed table (about 20 million rows) in a 30GB database containing only the table in question and a 400-row joining table. After it had been running for 36 hours I cancelled the query. Here's the query:

    UPDATE P
    SET P.Sys = W.Sys
    FROM PAC P INNER JOIN WORK W ON (P.CID = W.CID)
    WHERE P.Sys IS NULL

    Specifics:

    WIN2K Standard Server

    SQL 2K Standard

    4GB memory

    Promise ATA RAID 10

    Dual Pentium 1.8 processors

    PerfMon samples:

    Processor: between 1 and 3% utilized

    Avg. Disk sec/transfer: between 0.008 and 0.800

    Avg. Disk queue length: between 1 and 140

    Avg. Disk sec/write: between 0.005 and 0.800

    Avg. Disk sec/read: between 0.007 and 1.500

    By these statistics I infer that my disk subsystem is woefully inadequate, but I'm also wondering if memory could be a problem since memory usage is at about 1.9GB during the attempted UPDATE.

    I'm pretty green when it comes to tables of this size, but a very similar update query ran successfully in about 40 minutes when the table (despite the lack of indexing) was about 1GB smaller.

    Any ideas about how I can resolve this problem without spending a ton of money on a big box, SQL Enterprise and 64GB of memory? Would a SCSI subsystem be enough? If so, should I go with Fibre Channel or would Ultra160 be enough?

  • The problem with an update like this is the transaction log that has to be written to ensure consistency, and the locks taken to perform the update.

    Suggestions: add an exclusive table lock hint.
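
    One way to write that hint against the query from the original post (a sketch only); TABLOCKX takes a single exclusive table lock up front rather than accumulating row-level locks:

    UPDATE P
    SET P.Sys = W.Sys
    FROM PAC P WITH (TABLOCKX)
        INNER JOIN WORK W ON (P.CID = W.CID)
    WHERE P.Sys IS NULL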

    Alternatively, do a SELECT INTO a new table, then truncate and drop the old one. If you are updating all the rows this is by far the quickest mechanism.
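
    Roughly what that could look like here (a sketch only: the column list, the LEFT JOIN and the PAC_new name are assumptions, since the full PAC schema isn't shown in the thread):

    -- SELECT INTO is minimally logged under the simple recovery model
    SELECT P.CID,
           COALESCE(P.Sys, W.Sys) AS Sys
           -- ...plus the remaining PAC columns
    INTO PAC_new
    FROM PAC P LEFT JOIN WORK W ON (P.CID = W.CID)

    -- Once the copy is verified, swap the tables
    TRUNCATE TABLE PAC
    DROP TABLE PAC
    EXEC sp_rename 'PAC_new', 'PAC'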

    Or do the updates in batches using SET ROWCOUNT; however, this is only possible if you can identify the rows that have already been updated and exclude them from the next batch. Ensure you CHECKPOINT after each batch.
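
    On SQL 2000 a batched version might look like this (a sketch; the batch size is arbitrary, and it relies on the WHERE P.Sys IS NULL predicate to exclude rows already done, which assumes W.Sys is never NULL for matching rows):

    SET ROWCOUNT 50000                  -- cap each batch at 50,000 rows

    WHILE 1 = 1
    BEGIN
        UPDATE P
        SET P.Sys = W.Sys
        FROM PAC P INNER JOIN WORK W ON (P.CID = W.CID)
        WHERE P.Sys IS NULL

        IF @@ROWCOUNT = 0 BREAK         -- nothing left to update

        CHECKPOINT                      -- lets simple recovery truncate the log between batches
    END

    SET ROWCOUNT 0                      -- restore the default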

    I can't emphasise enough how much quicker the insert-and-drop option is.

    Simon Sabin
    SQL Server MVP
    Co-author of SQL Server 2000 XML Distilled
    http://www.amazon.co.uk/exec/obidos/ASIN/1904347088
    http://sqlblogcasts.com/blogs/simons

  • Oh and set the recovery model to simple.
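
    For reference, that's a one-liner (the database name below is a placeholder, since the thread doesn't give one):

    -- MyBigDB is a placeholder for the actual database name
    ALTER DATABASE MyBigDB SET RECOVERY SIMPLE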

    Simon Sabin
    SQL Server MVP
    http://sqlblogcasts.com/blogs/simons

  • Thanks very much Simon. I do have the recovery model set to Simple, and there are no other users on the system, so I don't think I'd gain anything from an exclusive table lock. However, your recommendation about INSERTing into a new table is VERY helpful. I will try that.

    Carl


  • I have seen that locks cost memory and time.

    That is the reason a single table lock is better than millions of individual row locks.

  • You might also gain a lot by indexing the join columns, even if only long enough for the operation.
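
    For example (a sketch; the index names are made up, and building an index on a 13GB heap will itself take time and log/tempdb space):

    -- Temporary indexes on the join columns
    CREATE INDEX IX_PAC_CID ON PAC (CID)
    CREATE INDEX IX_WORK_CID ON WORK (CID)

    -- ...run the UPDATE, then drop them if they aren't otherwise useful
    DROP INDEX PAC.IX_PAC_CID
    DROP INDEX WORK.IX_WORK_CID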

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • A disk queue length of 140 is scary.

    Questions:

    What percentage of the table will be updated?

    Is CID the primary key on WORK?
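
    Both are easy to check (a sketch; table and column names as used earlier in the thread):

    -- Share of PAC rows still to be updated
    SELECT COUNT(*) AS total_rows,
           SUM(CASE WHEN Sys IS NULL THEN 1 ELSE 0 END) AS rows_still_null
    FROM PAC

    -- Is CID unique on WORK (i.e. a candidate primary key)?
    SELECT COUNT(*) AS total_rows, COUNT(DISTINCT CID) AS distinct_cids
    FROM WORK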

    Simon Sabin
    SQL Server MVP
    http://sqlblogcasts.com/blogs/simons
