Updating 20M rows takes 5 hours

  • Dear Friends

I need to update 20 million rows as part of a migration. The script is:

update productionaudit set
    assetname = a.assetname,
    assetdesc = a.description,
    assetruncostperhour = a.runcostperhour,
    assettype = a.assettype
from productionaudit p, asset a
where p.assetid = a.assetid

This takes about 5 hours to run. Is there a way I can update 20M rows more quickly?

Please also find attached the table definitions.

    Many thanks in advance.

  • You could try this:

* Disable all indexes on productionaudit before performing the update, then rebuild them after the update.

* Don't update all 20M rows in one single transaction; use several smaller transactions instead. You could split the work into ranges of assetid, for example.

* If you are updating a large part of the table, you could create a new table with the correct content using SELECT INTO, then drop the old table, rename the new table, and recreate all indexes and constraints.
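The batching idea can be sketched roughly like this. This is only a sketch using the table and column names from the original post; the batch size and the assumption that auditid is a dense integer key are mine to verify against your schema:

    -- Hypothetical batched update; each loop iteration is its own transaction.
    DECLARE @BatchSize int, @From int, @MaxId int
    SET @BatchSize = 100000
    SELECT @From = MIN(auditid), @MaxId = MAX(auditid) FROM productionaudit

    WHILE @From <= @MaxId
    BEGIN
        UPDATE p SET
            assetname           = a.assetname,
            assetdesc           = a.description,
            assetruncostperhour = a.runcostperhour,
            assettype           = a.assettype
        FROM productionaudit p
        JOIN asset a ON p.assetid = a.assetid
        WHERE p.auditid >= @From AND p.auditid < @From + @BatchSize

        SET @From = @From + @BatchSize
    END

Keeping each batch small bounds the log growth per transaction, which is usually where the "tipping point" on large single-transaction updates comes from.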

    /SG

Since you're doing updates, you have two searches to worry about: the search within the table for the correct records to update, and the join between that table and the other. I'd examine the execution plan; I suspect one or more of these is coming up with a table scan. You might need some different indexing.

    Have you checked for contention?
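One quick way to check for contention while the update is running is to look at active requests, for example (a sketch; `sys.dm_exec_requests` is available in SQL Server 2005 and later):

    -- Sessions currently blocked by another session
    SELECT session_id, blocking_session_id, wait_type, wait_time, command
    FROM sys.dm_exec_requests
    WHERE blocking_session_id <> 0

If the update's session shows up here with a nonzero blocking_session_id, something else holds locks it needs.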

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

I've used Stefan's suggestions successfully before. Batch this into smaller transactions, say 10K rows at a time, and it will probably go faster.

Every system has a "tipping point", and it's rather unpredictable. Try updating just a million rows... it'll probably take much less than a minute. Then try two million rows; that will probably take only twice as long. Then try 3 million rows... my machine at home has a tipping point somewhere between 2 and 3 million rows for most large tables, and instead of taking only 3 times longer than a million rows (which should be much less than 3 minutes), the update now takes more than 20 minutes.

As already pointed out, the "batch" method Stefan suggested is the way to beat this problem, but I don't believe you need to knock it down to only 10K rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you guys for your responses.

I tried about 2.5 million rows in my test bed with the following method, as suggested by Stefan in the above post:

1. Used select * into productionaudit_temp from productionaudit where auditid >= 1 and auditid <= 2500000

2. Updated the temp table

This took roughly 2 minutes.

When I run the update on the productionaudit table, where all the indexes are, it takes about 20 minutes.

So why does the update on a temp table with no indexes take 2 minutes, while the one with the indexes takes 20 minutes? Does this mean the indexes on the table are wrong? Is my assumption correct, or are different indexes needed for updates?

I suppose I can drop the indexes, run the update 2 million rows at a time, and then recreate the indexes. But is this the correct way to do it?

I would guess one of the columns in your update statement is being used as part of an index. If so, the index has to be maintained during the update, and that is why one part of the solution Stefan suggested was to drop the indexes, do the update, and rebuild them.

Rebuilding the indexes does not lose any information (I guess there might be something I as a developer do not use but a DBA may), and recreating them afterwards will not take as much time as they are adding to your update.

  • ss-457805 (5/17/2010)


Thank you guys for your responses.

I tried about 2.5 million rows in my test bed with the following method, as suggested by Stefan in the above post:

1. Used select * into productionaudit_temp from productionaudit where auditid >= 1 and auditid <= 2500000

2. Updated the temp table

This took roughly 2 minutes.

When I run the update on the productionaudit table, where all the indexes are, it takes about 20 minutes.

So why does the update on a temp table with no indexes take 2 minutes, while the one with the indexes takes 20 minutes? Does this mean the indexes on the table are wrong? Is my assumption correct, or are different indexes needed for updates?

I suppose I can drop the indexes, run the update 2 million rows at a time, and then recreate the indexes. But is this the correct way to do it?

The table in TEMPDB is in the SIMPLE recovery mode and has no indexes. The table in the production database has indexes that are probably not in the same order as your updates and is suffering huge extent splits. That causes a huge amount of unnecessary reads, and even if most of them are logical reads, guess what that's going to do to performance? You also have all of that being captured and kept in the log.

You'll need to disable the indexes to get any speed out of this. Either that, or just let it "crawl" through, because you've probably already spent more time trying to figure out what's going on than if you had just let it run. Hopefully this is a one-off, right?

    --Jeff Moden



Thanks Jeff for the explanation.

We do this every 3-4 months for our customers, so I would like to save time and do it properly. The developer who did the work has left the company.

Please see the attached execution plan.

Please let me know if adding proper indexes can resolve this.

  • ss-457805 (5/17/2010)


    I suppose i can drop the indexes, run the update 2Million rows at a time, and then recreate the indexes. But is this the correct way to do it.

Yes, this is the way to do it if you want maximum performance.

Note that you don't really have to drop and recreate the indexes. It is enough to disable the indexes and then rebuild them after the update.

Disable and rebuild can be easier to script, because you do not have to repeat the index definition when you rebuild the indexes.
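A minimal sketch of the disable/rebuild approach, using one of the index names from the definitions posted in this thread (repeat the pattern for each nonclustered index):

    -- Disable a nonclustered index before the update; the clustered index
    -- must stay enabled, or the whole table becomes inaccessible.
    ALTER INDEX [ind_productionAudit] ON dbo.productionAudit DISABLE

    -- ... run the (batched) UPDATE here ...

    -- REBUILD re-enables the index using its original definition;
    -- no CREATE INDEX script is needed.
    ALTER INDEX [ind_productionAudit] ON dbo.productionAudit REBUILD

Note that ALTER INDEX ALL ... DISABLE would also disable the clustered index, so disable the nonclustered indexes individually.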

    /SG

  • ss-457805 (5/18/2010)


    PLease let me know if adding proper indexes can resolve this.

    Adding more indexes will not solve the problem.

Hi Stefan

I just disabled all the nonclustered indexes and ran the update. It took 8 minutes. That is still more than the test using SELECT * INTO, which did not have any indexes at all.

I can't disable the clustered index, because then the table won't be available.

Am I missing something?

  • ss-457805 (5/18/2010)


Hi Stefan

I just disabled all the nonclustered indexes and ran the update. It took 8 minutes. That is still more than the test using SELECT * INTO, which did not have any indexes at all.

I can't disable the clustered index, because then the table won't be available.

Am I missing something?

No, this sounds reasonable.

SELECT INTO is the fastest way to create a table in SQL Server.

And it is correct that you should only disable the nonclustered indexes.

How many rows did you update in 8 minutes?

How many total rows do you have in the productionaudit table?

Updated 2.6 million rows.

The total number of rows in productionaudit is 2.6 million. This is the test environment; in live production the number of rows is more than 20 million.

  • ss-457805 (5/18/2010)


Updated 2.6 million rows.

The total number of rows in productionaudit is 2.6 million. This is the test environment; in live production the number of rows is more than 20 million.

So, if I understand you correctly, you are updating all rows in this table.

In that case you can use the third method from my original post. It is probably the most efficient in your situation:

1) Use SELECT INTO to create a new table with the correct content by joining the two involved tables.

2) Drop the old table.

3) Rename the new table to the old name.

4) Recreate all indexes and constraints.

Something like the script below.

Please make sure to review the script thoroughly before running it, since it will delete your original table.

Also make sure that the table-creation part succeeds before continuing with the rest of the script.

-- Create the new table
SELECT  p.auditId,
        p.vFrom,
        p.vTo,
        p.assetId,
        p.availability,
        p.status,
        p.opMode,
        p.qtyIn,
        p.qtyOut,
        p.qtyProcessed,
        p.qtyRejected,
        p.countUnitId,
        p.rate,
        p.shiftId,
        p.runId,
        p.productId,
        p.crewId,
        p.crewSize,
        p.stopEventRefId,
        p.rejectEventRefId,
        p.xnCode,
        p.version,
        p.shiftAuditId,
        p.cellAssetId,
        --p.assetname,
        --p.assetdesc,
        --p.assetRunCostPerHour,
        --p.assettype,
        a.assetname,
        a.description AS assetdesc,
        a.runCostPerHour AS assetRunCostPerHour,
        a.assettype,
        p.countUnitDesc,
        p.shiftName,
        p.shiftDesc,
        p.runname,
        p.productName,
        p.productDesc,
        p.productCountUnitId,
        p.productCountUnitDesc,
        p.materialCost,
        p.crewName,
        p.crewCostPerHourPerHead,
        p.cellAssetName,
        p.cellAssetDesc,
        p.lastStatusChangeAuditId
INTO    dbo.New
FROM    dbo.productionAudit p
JOIN    dbo.asset a ON p.assetId = a.assetId

-- Drop the old table
DROP TABLE dbo.productionAudit

-- Rename the new table to the old name
-- (the new name must not be schema-qualified, or the table ends up
-- literally named "dbo.productionAudit")
EXEC sp_rename 'dbo.New', 'productionAudit'

-- Recreate all indexes
-- Please verify that this script is correct! It is based on the table definition posted on 2010-05-17
SET ANSI_PADDING OFF
GO
ALTER TABLE dbo.productionAudit ADD CONSTRAINT [pk_productionAudit] PRIMARY KEY CLUSTERED ([auditId] ASC)
GO
SET ANSI_PADDING OFF
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit] ON [dbo].[productionAudit]
    ([assetId] ASC, [vFrom] ASC, [vTo] ASC, [xnCode] ASC)
    INCLUDE ([auditId], [availability], [status], [opMode], [qtyIn], [qtyOut], [qtyProcessed], [qtyRejected], [countUnitId], [rate], [shiftId], [runId], [productId], [crewId], [crewSize], [stopEventRefId], [rejectEventRefId], [version], [shiftAuditId])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_assetCounts] ON [dbo].[productionAudit]
    ([assetId] ASC, [availability] ASC, [status] ASC, [vFrom] ASC, [vTo] ASC)
    INCLUDE ([qtyIn], [qtyOut], [qtyRejected], [qtyProcessed])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_assetTime] ON [dbo].[productionAudit]
    ([assetId] ASC, [assetname] ASC, [availability] ASC, [status] ASC, [vFrom] ASC, [vTo] ASC)
    INCLUDE ([rate])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_findCountRows] ON [dbo].[productionAudit]
    ([vFrom] ASC, [vTo] ASC, [cellAssetId] ASC, [xnCode] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_productionRun] ON [dbo].[productionAudit]
    ([runId] ASC, [assetId] ASC, [vFrom] ASC, [vTo] ASC, [availability] ASC, [status] ASC)
    INCLUDE ([rate], [qtyIn], [qtyOut], [qtyRejected], [qtyProcessed])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_rejects] ON [dbo].[productionAudit]
    ([shiftAuditId] ASC, [qtyRejected] ASC, [auditId] ASC, [countUnitId] ASC, [rejectEventRefId] ASC, [vFrom] ASC, [assetId] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_shift] ON [dbo].[productionAudit]
    ([assetId] ASC, [productId] ASC, [auditId] ASC, [shiftAuditId] ASC, [availability] ASC, [status] ASC)
    INCLUDE ([rate], [qtyIn], [qtyOut], [qtyRejected])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_shiftAndRun] ON [dbo].[productionAudit]
    ([shiftAuditId] ASC, [runId] ASC, [assetId] ASC)
    INCLUDE ([rate])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_stoppages] ON [dbo].[productionAudit]
    ([runId] ASC, [stopEventRefId] ASC, [availability] ASC, [status] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ind_productionAudit_vTo] ON [dbo].[productionAudit]
    ([vTo] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
