Large Update Statement - millions of rows

  • Hi all

A dev has given me a script which updates millions of rows and contains multiple batches. I will be splitting out the batches into their own transactions as I am worried about filling up the log, but has anyone read any decent articles on the best way to break large updates like these down into, say, 50,000 rows at a time rather than all 9 million?

    Thanks

Just checking: do I understand you correctly that you intend to switch to the simple recovery model during this process? Splitting the work into several smaller transactions does not have the intended effect in the bulk-logged or full recovery model.

Further, it is difficult to guide you without any code to work with, but you could change an UPDATE statement to include TOP and run it in a loop. Something like:

declare @affected int;
set @affected = 1;

while @affected > 0
begin
    update top (50000) (...)   -- table and SET clause elided
    where <condition matched by non-updated rows only>;

    set @affected = @@rowcount;
end
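For illustration, a fleshed-out version of that pattern might look like this (the table dbo.BigTable and its IsProcessed flag are hypothetical stand-ins for the real schema):

declare @affected int;
set @affected = 1;

while @affected > 0
begin
    -- hypothetical table and columns; substitute the real schema
    update top (50000) dbo.BigTable
    set IsProcessed = 1
    where IsProcessed = 0;   -- condition matched by non-updated rows only

    set @affected = @@rowcount;
end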



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL

Sorry, why don't smaller transactions have a benefit in full recovery? Surely smaller transactions, each followed by a log backup, will allow me to keep the log in check during the update process?

    The code isn't just one big update, it is several wrapped in a single transaction.

If you perform log backups, that will of course allow SQL Server to reuse inactive portions (VLFs) of the log file. What I meant is that smaller transactions alone only have the intended effect in the simple recovery model.

As for the rest of your issue, it is difficult to tell exactly how to attack the problem without more information.
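If you do stay in the full recovery model, you could interleave log backups with the batches to keep the log from growing, along these lines (the backup path and file name are placeholders):

-- run between batches; path and file name are hypothetical
backup log Fastquote
to disk = 'D:\Backups\Fastquote_log.trn';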



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL

  • Kwisatz78 (2/29/2012)


    The code isn't just one big update, it is several wrapped in a single transaction.

    If it's all one transaction, batching it will have no effect. The transaction will succeed as a unit or fail as a unit, and the space in the transaction log cannot be freed until that has happened.
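You can see this for yourself: while the big transaction is open, SQL Server reports that the log cannot be truncated because of it (assuming the database is called Fastquote, as in the script later in this thread):

-- run from another session while the update is in progress
select name, log_reuse_wait_desc
from sys.databases
where name = 'Fastquote';
-- shows ACTIVE_TRANSACTION until the transaction commits or rolls back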

    Are you able to post your code?

    John

  • John Mitchell-245523 (2/29/2012)


    Kwisatz78 (2/29/2012)


    The code isn't just one big update, it is several wrapped in a single transaction.

    If it's all one transaction, batching it will have no effect. The transaction will succeed as a unit or fail as a unit, and the space in the transaction log cannot be freed until that has happened.

    Are you able to post your code?

    John

    The "no effect" is focusing on how much is logged. You're going to log all changes in full and bulk-logged, so it will not changed how much is logged in any meaningful way. That said - it probably couldhave a sertious effect on performance.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (2/29/2012)


    The "no effect" is focusing on how much is logged. You're going to log all changes in full and bulk-logged, so it will not changed how much is logged in any meaningful way. That said - it probably couldhave a sertious effect on performance.

    You're going to log all changes in simple mode, as well. And if it's all done in one transaction, that can mean some serious log growth. That was my point.

    John

John is correct here: the logging is the same.

Actually, for a fully logged operation like an UPDATE, the log records written are the same regardless of recovery model. What differs is how long the log records are retained, or when the space they occupy can be reused.

If you are in simple mode, committed transactions in the log can be cleared (the space reused) after a checkpoint.

If you are in bulk-logged or full mode, those records remain until a log backup.

Note that only once a transaction has committed can its space in the log be reused or cleared, so one giant transaction can be an issue.
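A quick way to watch this behaviour is to compare log usage before and after a checkpoint; in simple recovery the percentage used should drop once committed transactions are cleared (a sketch only, run in the database in question):

dbcc sqlperf(logspace);   -- reports log size and % used for each database

checkpoint;               -- in simple recovery, clears committed portions of the log

dbcc sqlperf(logspace);   -- % used should drop if nothing else is holding the log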

  • okbangas (2/29/2012)


declare @affected int;
set @affected = 1;

while @affected > 0
begin
    update top (50000) (...)
    where <condition matched by non-updated rows only>;

    set @affected = @@rowcount;
end

Placing the transaction inside the WHILE loop would ensure that each batch of updates is either committed or rolled back (insert error handling there), and adding a slight delay would give some needed time for the logging and the transaction to complete. So something like:

declare @affected int, @err int;
set @affected = 1;

while @affected > 0
begin
    begin transaction;

    update top (50000) (...)
    where <condition matched by non-updated rows only>;

    -- capture both values immediately: any subsequent statement
    -- (including COMMIT) resets @@rowcount and @@error
    select @affected = @@rowcount, @err = @@error;

    if @err <> 0
        rollback transaction;   -- (error handling goes here)
    else
        commit transaction;

    waitfor delay '00:00:00.500';
end

    Wouldn't this commit the smaller batches in a case like this?

______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience.

You could do that, yes, but you're only doing explicitly what would be done implicitly anyway. The original poster says the whole thing is done in one transaction, so I'm waiting for code so that we can analyse it and offer suggestions.
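In other words, under the default autocommit mode each statement is its own transaction, so (using a hypothetical table) these two forms are equivalent:

-- autocommit: commits on its own when the statement completes
update dbo.BigTable set Flag = 1 where Flag = 0;

-- explicit: same effect, just spelled out
begin transaction;
update dbo.BigTable set Flag = 1 where Flag = 0;
commit transaction;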

    John

Hello all - here is the original code, followed by my altered version with the statements wrapped in separate transactions.

    SET XACT_ABORT on

    BEGIN TRANSACTION

    update fastquote.dbo.FQ4_Project_Lines set RepresentativeQuoteId = 0, RepresentativeQuantity = 0, RepresentativeUnitPrice = 0

    update fastquote.dbo.FQ4_Project_Catalog set CatalogTotalValue = 0, RepresentativeQuoteId = 0

    select pl.projectID,pl.productSku,pl.catalogId,

    pl.RepresentativeQuoteId,pl.RepresentativeQuantity,pl.RepresentativeUnitPrice,

    ql.quoteid, ql.quantity as RepQty, (case when ql.unitPriceStd = 9999.99 then ql.unitpriceGross else ql.unitPriceStd end) as RepPrice,

    (

    case

    when (qh.ProjectPartyType & 1 = 1) then 1

    when (qh.CustomerGrade like '%C2' or qh.CustomerGrade like '%C3' or qh.CustomerGrade like '%C1') then 2

    when (qh.CustomerGrade = 'MG1' or qh.CustomerGrade = 'MG4') then 3

    else 5

    end

    ) as Ranking

    into #tempProjectLines

    from fastquote.dbo.FQ4_Project_Lines pl

    inner join fastquote.dbo.FQ4_QuoteInfo_vw qh on qh.projectID = pl.projectID

    inner join fastquote.dbo.FQ4_Quote_Lines ql on ql.quoteid = qh.quoteID and ql.productSku = pl.productSku and ql.catalogId = pl.catalogId and ql.priceOnly = 0 and (ql.unitPriceGross <> 9999.99 or ql.unitPriceStd <> 9999.99)

    update pl

    set pl.RepresentativeQuoteId = ql.quoteid,pl.RepresentativeQuantity = ql.Quantity, pl.RepresentativeUnitPrice = (case when ql.unitPriceStd = 9999.99 then ql.unitpriceGross else ql.unitPriceStd end)

    from fastquote.dbo.FQ4_Project_Lines pl

    inner join fastquote.dbo.FQ4_QuoteInfo_vw qh on qh.projectID = pl.projectID

    inner join fastquote.dbo.FQ4_Quote_Lines ql on ql.quoteid = qh.quoteID and ql.productSku = pl.productSku and ql.catalogId = pl.catalogId and ql.priceOnly = 0 and (ql.unitPriceGross <> 9999.99 or ql.unitPriceStd <> 9999.99)

    and ql.quoteid = (select top 1 quoteId

    from #tempProjectLines

    where projectID = pl.projectID and catalogId = pl.catalogid

    group by projectId, catalogid, quoteid, ranking

    having SUM(RepPrice * RepQty) > 0

    order by projectid,catalogid,Ranking,SUM(RepPrice * RepQty) desc)

    drop table #tempProjectLines

    update ph

    set RepresentativeTotalValue =

    isNull((select SUM(pl.representativeUnitPrice * pl.RepresentativeQuantity) from fastquote.dbo.FQ4_Project_Lines pl where pl.projectID = ph.projectid),0)

    from fastquote.dbo.FQ4_Project_Header ph

    update pc

    set pc.CatalogTotalValue = p.CatalogTotalValue, pc.RepresentativeQuoteId = p.RepresentativeQuoteId

    from fastquote.dbo.fq4_Project_Catalog pc

    inner join (

    select pl.projectid, pl.catalogid, SUM(pl.RepresentativeUnitPrice * pl.RepresentativeQuantity) as CatalogTotalValue, MAX(RepresentativeQuoteId) as RepresentativeQuoteId

    from fastquote.dbo.fq4_project_Lines pl

    group by pl.projectid, pl.catalogid)

    as p on pc.projectid = p.projectid and pc.catalogid = p.catalogid

    COMMIT TRANSACTION

    Here is what I have altered it to:

    USE [Fastquote]

    GO

    SET XACT_ABORT ON

    GO

    BEGIN TRANSACTION

    update fastquote.dbo.FQ4_Project_Lines set RepresentativeQuoteId = 0, RepresentativeQuantity = 0, RepresentativeUnitPrice = 0

    COMMIT TRANSACTION

    GO

    SET XACT_ABORT ON

    GO

    BEGIN TRANSACTION

    update fastquote.dbo.FQ4_Project_Catalog set CatalogTotalValue = 0, RepresentativeQuoteId = 0

    COMMIT TRANSACTION

    GO

    SET XACT_ABORT ON

    GO

    BEGIN TRANSACTION

    select pl.projectID,pl.productSku,pl.catalogId,

    pl.RepresentativeQuoteId,pl.RepresentativeQuantity,pl.RepresentativeUnitPrice,

    ql.quoteid, ql.quantity as RepQty, (case when ql.unitPriceStd = 9999.99 then ql.unitpriceGross else ql.unitPriceStd end) as RepPrice,

    (

    case

    when (qh.ProjectPartyType & 1 = 1) then 1

    when (qh.CustomerGrade like '%C2' or qh.CustomerGrade like '%C3' or qh.CustomerGrade like '%C1') then 2

    when (qh.CustomerGrade = 'MG1' or qh.CustomerGrade = 'MG4') then 3

    --when qh.CustomerGrade = 'ZC2' then 4

    else 5

    end

    ) as Ranking

    into #tempProjectLines

    from fastquote.dbo.FQ4_Project_Lines pl

    inner join fastquote.dbo.FQ4_QuoteInfo_vw qh on qh.projectID = pl.projectID

    inner join fastquote.dbo.FQ4_Quote_Lines ql on ql.quoteid = qh.quoteID and ql.productSku = pl.productSku and ql.catalogId = pl.catalogId and ql.priceOnly = 0 and (ql.unitPriceGross <> 9999.99 or ql.unitPriceStd <> 9999.99)

    update pl

    set pl.RepresentativeQuoteId = ql.quoteid,pl.RepresentativeQuantity = ql.Quantity, pl.RepresentativeUnitPrice = (case when ql.unitPriceStd = 9999.99 then ql.unitpriceGross else ql.unitPriceStd end)

    from fastquote.dbo.FQ4_Project_Lines pl

    inner join fastquote.dbo.FQ4_QuoteInfo_vw qh on qh.projectID = pl.projectID

    inner join fastquote.dbo.FQ4_Quote_Lines ql on ql.quoteid = qh.quoteID and ql.productSku = pl.productSku and ql.catalogId = pl.catalogId and ql.priceOnly = 0 and (ql.unitPriceGross <> 9999.99 or ql.unitPriceStd <> 9999.99)

    and ql.quoteid = (select top 1 quoteId

    from #tempProjectLines

    where projectID = pl.projectID and catalogId = pl.catalogid

    group by projectId, catalogid, quoteid, ranking

    having SUM(RepPrice * RepQty) > 0

    order by projectid,catalogid,Ranking,SUM(RepPrice * RepQty) desc)

    drop table #tempProjectLines

    COMMIT TRANSACTION

    GO

    SET XACT_ABORT ON

    GO

    BEGIN TRANSACTION

    update ph

    set RepresentativeTotalValue =

    isNull((select SUM(pl.representativeUnitPrice * pl.RepresentativeQuantity) from fastquote.dbo.FQ4_Project_Lines pl where pl.projectID = ph.projectid),0)

    from fastquote.dbo.FQ4_Project_Header ph

    COMMIT TRANSACTION

    GO

    SET XACT_ABORT ON

    GO

    BEGIN TRANSACTION

    update pc

    set pc.CatalogTotalValue = p.CatalogTotalValue, pc.RepresentativeQuoteId = p.RepresentativeQuoteId

    from fastquote.dbo.fq4_Project_Catalog pc

    inner join (

    select pl.projectid, pl.catalogid, SUM(pl.RepresentativeUnitPrice * pl.RepresentativeQuantity) as CatalogTotalValue, MAX(RepresentativeQuoteId) as RepresentativeQuoteId

    from fastquote.dbo.fq4_project_Lines pl

    group by pl.projectid, pl.catalogid)

    as p on pc.projectid = p.projectid and pc.catalogid = p.catalogid

    COMMIT TRANSACTION

    GO

  • You still have one transaction. The nested transactions I see in the code don't buy you anything.

You may want to read this article: Deleting Large Number of Records.

How is that? Have I not just explicitly stated what it was doing implicitly? None of them are nested.

  • John Mitchell-245523 (2/29/2012)


    Matt Miller (#4) (2/29/2012)


    The "no effect" is focusing on how much is logged. You're going to log all changes in full and bulk-logged, so it will not changed how much is logged in any meaningful way. That said - it probably couldhave a sertious effect on performance.

    You're going to log all changes in simple mode, as well. And if it's all done in one transaction, that can mean some serious log growth. That was my point.

    John

Correct, you do log everything, but in simple mode the inactive log is cleared at the next checkpoint after commit, so small batches mean roughly no log growth in that scenario.

So the tradeoff ultimately comes down to:

- leave the DB in the full recovery model (recommended), which means your log file might grow by quite a bit (even with multiple batches), or

- keep your log file small by switching recovery models and committing in small batches, which then breaks your log backup chain.

To me #1 is manageable (you could run frequent log backups during the updates to keep log growth under control), and #2 is an unacceptable risk, but log space does seem to come up over and over again as a big concern.
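For completeness, option #2 would look something like this, and the recovery-model switch is exactly what breaks the chain (a sketch using the Fastquote database from the posted script; the backup path is hypothetical):

alter database Fastquote set recovery simple;

-- ... run the batched updates; the log is cleared at each checkpoint ...

alter database Fastquote set recovery full;

-- the log backup chain is now broken: take a full (or differential)
-- backup before log backups can resume
backup database Fastquote to disk = 'D:\Backups\Fastquote_full.bak';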

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
