Updating a BIG table - with a twist

  • Hello.

    I have a table with 14+ mil rows.

    I've appended a column, BACKUP_DATE (smalldatetime), to the table.

    All BACKUP_DATE values are NULL.

    I wish to update the first 14 mil to, say, 1/1/2004.

    I wish to update all remaining records to, say, 2/2/2004.

    The latter I can do with an UPDATE where BACKUP_DATE is null.

    How best can I go about the first update?

    No way to use an index.

    You pick the poison (VB.NET, C#, or T-SQL from QUERY ANALYZER).

     

    Thanks!

  • OMG!  This is probably NOT the best way of doing it, but I was desperate.

    I added an IDENTITY column to the table. At last check, Enterprise Manager is still populating the column.

     

    MUHHAHAHAHAHA!

     

    Sure hope it doesn't crash the server!!!

    Anyway, still looking for the optimum solution.

  • Last effort at typing not saved for some reason, so here goes for a 2nd time...

     

    You need to manage the size of the transaction log... a WHILE loop should suffice.

    recordsupd = 1

    while recordsupd > 0

    begin trans

    set rowcount 10000

    update tablex

    set cola = y

    where colb = condx

    recordsupd = @@recordcount

    commit trans

    loop

     

     

    Something like the above should control the size of your transaction log... search here for other examples... adjust the 10000 to a number of your liking.

     

    Also... it's best to avoid using EM for database design/data changes... the GUI is uncontrollable... using DDL/DML statements in QA gives more control.
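    A runnable T-SQL rendering of the sketch above, assuming the placeholder names tablex, cola, and colb from the post (in real T-SQL the counter must be a declared variable, the loop closes with END rather than LOOP, and the count of updated rows comes from @@ROWCOUNT):

    ```sql
    -- Batched update: repeat until a pass updates zero rows,
    -- touching at most 10,000 rows per committed transaction.
    DECLARE @recordsupd int
    SET @recordsupd = 1
    SET ROWCOUNT 10000        -- caps every following UPDATE on this connection

    WHILE @recordsupd > 0
    BEGIN
        BEGIN TRAN
            UPDATE tablex
            SET cola = 'y'
            WHERE colb = 'condx'    -- the WHERE must exclude already-updated
                                    -- rows (e.g. WHERE BACKUP_DATE IS NULL),
                                    -- or the loop never terminates
            SET @recordsupd = @@ROWCOUNT   -- rows touched by the UPDATE above
        COMMIT TRAN
    END

    SET ROWCOUNT 0            -- clear the limit when finished
    ```

    Committing each batch separately is what keeps the transaction log from growing without bound.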

  • Shouldn't the statement

    recordsupd = @@recordcount

    be

    recordsupd = @rowcount

    Pete

  • oops mistype, I meant

    recordsupd = @@rowcount

     

    DOH!

     

    Pete

  • You have a table WITHOUT an Identity column? Is there a Primary key at all?

    Query Analyzer with a WHERE clause would be the way to go. If your data is sorted (because of your primary key), then pick the row you want to update up to, and use it in your WHERE, as in:

    UPDATE MyTable SET BACKUP_DATE = '1/1/2004' WHERE PrimaryKeyColumns < The14MillionRowPrimaryKeyValues

     

     


    Julian Kuiters
    juliankuiters.id.au
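    A sketch of finding that cutoff value first, assuming a hypothetical single-column integer key named pk_col (MyTable and the key name are placeholders; SQL Server 2000 accepts TOP with a literal inside a derived table):

    ```sql
    -- Hypothetical: pk_col stands in for the table's primary key column.
    DECLARE @cutoff int

    -- Largest key among the first 14 million rows in key order:
    SELECT @cutoff = MAX(pk_col)
    FROM (SELECT TOP 14000000 pk_col
          FROM MyTable
          ORDER BY pk_col) AS FirstRows

    UPDATE MyTable
    SET BACKUP_DATE = '1/1/2004'
    WHERE pk_col <= @cutoff
    ```

    The bounded UPDATE is still a single 14-million-row transaction, so it would be worth combining this WHERE clause with the batching loop discussed earlier in the thread.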

  • Peter... "recordsupd = @@recordcount"... written in haste!!!

     

    Wanted to introduce the principle... not concerned (this time) with solving 100% of the problem.

  • OMG!  I just lost my first reply as well!

    I had a clustered index on a field that is hit often and produces a sizable recordset.  In that sense, the index seemed appropriate.  However, I believe it interlaced the physical inserts so that the sequential insert order was destroyed.

    Not sure why you mention the trans log as I'd just prefer to turn it off to speed performance.

    Thanks for the QA tip... I depend on EM because I'm a programmer 'trying' to be a SQL guru, but alas, my bytes show through.

    I do appreciate the responses from all parties concerned!

    Does it look like I'm good to go?

    How would I go about processing all 14mil?

    Do I do an outer loop in addition to the one below?

     

    recordsupd = 1

    while recordsupd > 0

    begin trans

    set rowcount 10000

    update tablex

    set cola = y

    where colb = condx

    recordsupd = @@rowcount

    commit trans

    loop

     

     

  • The code as-is should be enough...

    Basically, the update keeps getting called until there are no more records to be updated... with at most 10000 records being updated in a batch.

    Try it on a smallish file!!!... just to see the technique working... with a smaller number instead of the 10000... (a number I picked off the top of my head)

  • Yes, the plan is to process a small table first then to move on to bigger issues.

    However, assuming I wanted to update a 20,000-row table and chose to break the update into 2 passes, would I require an outer loop to wrap the inner?

    Forgive my VB coding below as I am not versed enough in T-SQL...

    for recordsupd = 1 to 20000 step 10000

      rc = recordsupd + 10000

      while recordsupd > 0

        begin trans

          set rowcount rc

            update tablex

              set cola = y

              where colb = condx

          recordsupd = @@rowcount

        commit trans

      loop

    next

     

  • no...just the inner loop is required.

    try it...with a small number/small dataset!!!

     

    don't worry about the vb style....that's my environment too.

  • I think you are misunderstanding the nature of "set rowcount".

     

    It doesn't refer to specific records... it limits the number of records processed by all subsequent select/update/delete statements on the connection until its value is adjusted or cleared.

    See BOL (Books Online) for details/examples.
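    A small illustration of that connection-wide behavior (tablex, cola, and colb are the placeholder names used earlier in the thread):

    ```sql
    SET ROWCOUNT 10000   -- caps ALL subsequent statements on this connection

    -- This UPDATE changes at most 10,000 rows, whichever ones SQL Server
    -- happens to reach first; it is not a specific "range" of 10,000 records.
    UPDATE tablex SET cola = 'y' WHERE colb = 'condx'
    SELECT @@ROWCOUNT    -- how many rows the UPDATE actually touched

    -- Even an unrelated SELECT is now capped at 10,000 rows:
    SELECT * FROM tablex

    SET ROWCOUNT 0       -- 0 clears the limit
    ```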

  •  

    As long as your where clause restricts your result set to the first 14 million records, that will work.  Otherwise it will update all records in the table in whatever increment you have chosen.

    Here's another solution if you're not worried about ordering; it updates the table in 100,000-record batches.  If ordering is an issue and you can use a PK column, replace the where clause below with one like Julian's above, with a slight modification: 'where PKColumn <= 14Millionth_Record_Value and BACKUP_DATE is null'.


    set rowcount 100000

    declare @i as int

    set @i = 0

    while @i < 140

    begin

      update TABLE_NAME

      set BACKUP_DATE = '1/1/2004'

      where BACKUP_DATE is null

      set @i = @i + 1

    end

    set rowcount 0


    If you aren't worried about the T-Log, you could use Julian's solution.

    No matter which solution you choose, you should probably schedule it to run during off-peak hours to avoid any performance problems.

    Hope this helps.

    Jarret

  • Jarret:

    Your solution is wonderful but from the start, my requirement WAS to consider the update sequentially.

    I must update the FIRST or TOP 14 mil rows and then update what's left over using a general UPDATE statement.

    Andrew:

    As I recall, when using a transaction log, records are pushed there BEFORE they hit the table to facilitate COMMIT/ROLLBACK.

    If the log is turned off, can I still use your solution?

    Julian:

    Your solution was what I had in mind as I attempted to add an IDENTITY column to the table.  However, SQL SERVER 'choked' (understandably) and, after locking users out for perhaps a few hours, finally returned control WITHOUT having added my IDENTITY column.

    This has been a wonderful learning experience for me and I'm thankful for all of your input.

    I believe that I must pursue an alternative course of action if I'm to beat my deadline.

    By this, I mean I'm reloading the table WITH a BACKUP_DATE.

    Of course, I left the old table in place in case a workable solution "popped up".

    If I've missed it, my apologies.

    It seems that all solutions somehow depend on conditions being 'just right', and I'm not sure they are.

    Please, however, correct me if I'm wrong.

  • The solution I gave will update 14 million records since it is in the while loop (140 iterations of 100,000 records).  However, the "first" 14 million records cannot be guaranteed since there is no primary key.  It would all depend on how SQL would return the data to you.

    If I may ask, how are you to determine the "top" 14 million records?

    Good luck!!!

    Jarret

Viewing 15 posts - 1 through 15 (of 21 total)
