updating a 25 million records in batches

  • i have 2 tables with this schema

    CREATE TABLE tableValues(

    [LASTENCRYPTIONDT] [datetime] NULL,

    [ENCRYPTIONID] [int] NULL,

    [NAME] [varchar](50) NULL

    )

    CREATE TABLE TableToupdate(

    [LASTENCRYPTIONDT] [datetime] NULL,

    [ENCRYPTIONID] [int] NULL,

    [NAME] [varchar](50) NULL

    )

    tablevalues has 10 million records

    tableToUpdate has 25 million records

    i want to update tableToUpdate in batches of 5000 per batch and set the lastenecryptionDT to null

    based on the the join to the tableValues using the column ENCRYPTIONID, and also output updated rows into another table.incase i would need to do a rollback

  • klineandking (11/10/2014)


    i have 2 tables with this schema

    CREATE TABLE tableValues(

    [LASTENCRYPTIONDT] [datetime] NULL,

    [ENCRYPTIONID] [int] NULL,

    [NAME] [varchar](50) NULL

    )

    CREATE TABLE TableToupdate(

    [LASTENCRYPTIONDT] [datetime] NULL,

    [ENCRYPTIONID] [int] NULL,

    [NAME] [varchar](50) NULL

    )

    tablevalues has 10 million records

    tableToUpdate has 25 million records

    i want to update tableToUpdate in batches of 5000 per batch and set the lastenecryptionDT to null

    based on the the join to the tableValues using the column ENCRYPTIONID, and also output updated rows into another table.incase i would need to do a rollback

    Are these tables both heaps? If so, I can't imagine that there is a good way of doing this.

    Also, you have told us what you want to do, but you have not asked a question, leaving us to guess.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • i have asked the question,assume both tables are a heap,how would you update the table in batches of 5000

  • klineandking (11/11/2014)


    i have asked the question,assume both tables are a heap,how would you update the table in batches of 5000

    Unless there's an index on the EncryptionID of both tables, the answer would be "[font="Arial Black"]very [/font]slowly". 😉

    I also seriously question setting the last update column to NULL. You are, after all, doing an update... perhaps an "upsert" where you do an update if the EncryptionID is present in both tables and an insert if it's in the source table and not in the target table.

    I also wouldn't do it just 5,000 rows at a time. That's 2000 updates and it's just going to be slower than you expect. Under the right conditions (which includes the correct indexing, of course), SQL Server can update a million rows in just several of seconds.

    With that in mind, we need to know what the indexes are on these two tables. The CREATE INDEX statements would be the best thing to post so we can tell if they're unique, what the fill factor is, etc, etc. We also need to know if we can build a control table (adding a control column and index to the 10 million row table would be even better) and whether or not this is just an update or if it's actually supposed to be an "upsert".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply