Updating a BIG table - with a twist

  • For minimal logging, you could BCP the data out through a view that puts the date on most of the records, and then BCP the data back in.
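
A rough sketch of what that round trip could look like (the object, file, and server names below are made up for illustration, and any column besides SYMBOL and BACKUP_DATE is assumed):

```sql
-- Hypothetical view that stamps the first load's date onto the rows on the way out.
CREATE VIEW dbo.vPriceHistory_FirstLoad
AS
SELECT SYMBOL,
       TRADE_DATE,                                   -- assumed column
       PRICE,                                        -- assumed column
       CAST('20090401' AS DATETIME) AS BACKUP_DATE   -- assumed date of the first load
FROM dbo.PriceHistory;
GO

-- Then, from the command line (bcp is a command-line tool, shown here as comments):
--   bcp MyDb.dbo.vPriceHistory_FirstLoad out C:\load\first_load.dat -n -T -S MyServer
--   (recreate or truncate the table with a BACKUP_DATE column, then:)
--   bcp MyDb.dbo.PriceHistory in C:\load\first_load.dat -n -T -S MyServer -b 50000 -h "TABLOCK"
```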

  • I loaded 14 million rows and then loaded an additional 1.5 million rows.

    TOP would better be classified as FIRST

    A late-breaking requirement was to differentiate the two loads.

  • BCP?

  • How can you tell the difference between the first 14 million records and the last 1.5 million records?  Is there some field on the record you can check that could possibly identify which upload the record came from?

  • That was kinda my point in posting originally.

    Assuming all records went into the table sequentially as written, the first 14 mil would have been the first batch and anything later would have been the second batch.

    Using a cursor-based solution (I thought), I was going to update the BACKUP_DATE field row by row until a counter hit 14 mil (a rough sketch of that idea follows below).

    If I was using an ADO recordset (or the like), I'd do something of that nature.

    However, I thought doing it in external code would have been slow.

    Also, I was afraid that if I opened a recordset, my code would attempt to pull all the rows down to my PC by brute force!

    So I turned to the gurus for help.
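
Roughly the kind of cursor loop I had in mind, for the record (the table name and the first-load date are assumptions; it also quietly assumes the rows come back in insert order, which is exactly the weak spot):

```sql
-- Illustrative only: walk the rows in whatever order the server hands them back
-- and stamp BACKUP_DATE on the first 14 million of them.
DECLARE @counter INT;
DECLARE @dummy   DATETIME;
SET @counter = 0;

DECLARE price_cur CURSOR LOCAL FOR
    SELECT BACKUP_DATE
    FROM dbo.PriceHistory          -- no ORDER BY: order is not guaranteed on a heap
    FOR UPDATE OF BACKUP_DATE;

OPEN price_cur;
FETCH NEXT FROM price_cur INTO @dummy;

WHILE @@FETCH_STATUS = 0 AND @counter < 14000000
BEGIN
    UPDATE dbo.PriceHistory
    SET BACKUP_DATE = '20090401'   -- assumed date of the first load
    WHERE CURRENT OF price_cur;

    SET @counter = @counter + 1;
    FETCH NEXT FROM price_cur INTO @dummy;
END;

CLOSE price_cur;
DEALLOCATE price_cur;
```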


  • Just because they were entered first doesn't mean they will be returned from SQL Server in that order.  Since there is no clustered index, you cannot guarantee that the data is stored in any order, chronological or otherwise.  It is stored however the engine decided to write the data to the pages.

    Yes, you would have to assume that the data was stored sequentially as entered in order to use that kind of solution, but that may not be the case (see the illustration below).

    Jarret
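
To put that in query terms (the table and LoadID column here are hypothetical): without an ORDER BY there is no such thing as "the first 14 million rows".

```sql
-- Some 14 million rows, not "the first" 14 million: a heap has no defined order,
-- so different runs or different plans can return a different set.
SELECT TOP 14000000 *
FROM dbo.PriceHistory;

-- "First" only means something with an explicit, ever-increasing ordering key:
SELECT TOP 14000000 *
FROM dbo.PriceHistory
ORDER BY LoadID;               -- hypothetical column that records insert order
```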

    I believe I read something like that just recently while researching the problem: that the records may not have been inserted in the order I wrote them.

    However, I DID use a clustered index when I wrote the records.

    But that is what would have caused the first and second batches of data to interlace with each other: the clustered index would have physically inserted the second recordset into the first!

    The only way that would have worked is if I'd used an IDENTITY column, but if I had, I wouldn't have needed a CLUSTERED index, at least for my original solution.  Of course, I would have wanted a solid index on the IDENTITY column to make the update go smoothly (there's a rough sketch of that below).

    So, I've decided to reload the table with a BACKUP_DATE.

    That raises a few interesting issues.

    Eventually the table will have 15mil rows.

    The only field that will generally be searched is the first one, called SYMBOL.

    There will be a lot of redundancy in the SYMBOL column.

    But the recordsets returned are expected to contain a lot of rows.

    For the initial load of the table, I turned off logging and removed the index (assuming both would help the table load faster).

    That means of course that the records won't physically sit together.

    If I create a clustered index on the table afterwards, will there be a benefit?  (A sketch of that also follows below.)

    I understand that if the table is ever optimized, the process will force the records to physically sort themselves.

    So did I really gain anything by loading without an index and will I gain anything by indexing later?

    I'm assuming that if the table is ever optimized, I need to leave town for a while so the server crash will have smoothed over by the time I come back.
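
For completeness, the IDENTITY idea mentioned above might look like the sketch below (table, column, and date values are assumptions). The catch, and the reason it doesn't rescue the current situation, is that an IDENTITY added after the fact assigns values to existing rows in no guaranteed order; it only reflects insert order for rows loaded after the column exists.

```sql
-- Assumed approach: have an ever-increasing row id in place before the loads run,
-- then split the two loads by its value afterwards.
ALTER TABLE dbo.PriceHistory
    ADD LoadID INT IDENTITY(1, 1) NOT NULL;

-- A plain index on the IDENTITY keeps the range updates from scanning the whole table.
CREATE INDEX IX_PriceHistory_LoadID
    ON dbo.PriceHistory (LoadID);

-- Stamp each load with its own date (both dates are assumed).
UPDATE dbo.PriceHistory
SET BACKUP_DATE = '20090401'
WHERE LoadID <= 14000000;

UPDATE dbo.PriceHistory
SET BACKUP_DATE = '20090415'
WHERE LoadID > 14000000;
```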
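
As for loading into an unindexed heap and indexing afterwards (index and table names assumed): skipping the index during the bulk load generally does speed the load up, and building the clustered index afterwards physically sorts the rows by SYMBOL once, so the many rows per SYMBOL end up on contiguous pages for the queries that follow. The one-time sort over 15 million rows is the price, so it wants a maintenance window and enough space for the sort.

```sql
-- Cluster the heap on the one column that gets searched.
-- SYMBOL is not unique, so SQL Server adds a hidden uniquifier behind the scenes;
-- rows for the same SYMBOL still end up physically together.
CREATE CLUSTERED INDEX CIX_PriceHistory_Symbol
    ON dbo.PriceHistory (SYMBOL);
```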
