Copying a billion rows

  • Does anybody have experience copying a billion rows of text from one table to another, especially in a production environment?

    Thanks

     

    Shas3

  • Not that big. I've done a few million, but that's it.

  • I've done about 8 million rows of text/integer data using BCP, which was fairly fast and painless. I used the Import Data Wizard for a 3 million row table, which was a bad idea: it took a long time and the transaction log nearly took the log disk out. (And shrinking it took a long time too.)

    I've never tried putting a large table into a separate filegroup, backing that up, then restoring it to a different database/filegroup, but I'm keen to try... I wonder if it would be a good approach?

    Experiment, then tell us .....

     


    The system's fine with no users logged in. Can we keep it that way?

  • I think bcp is your best bet.

    bcp out using native format. Remove all indexes before you bcp in (unless the data is sorted in your clustered index order, in which case you can leave the clustered index in place). Rebuild the indexes after the load has completed.

    Still going to take a long time and a lot of disk space.
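
    Something along these lines, as a rough sketch -- the server name, database names, file path, table name, and the id column are placeholders, and the batch size is just a starting point:

        rem export in native format over a trusted connection
        bcp SourceDB.dbo.BigTable out E:\staging\BigTable.dat -n -S MYSERVER -T

        rem load in batches under a table lock; the ORDER hint tells bcp the file
        rem is already sorted on the clustered key, so that index can stay in place
        bcp TargetDB.dbo.BigTable in E:\staging\BigTable.dat -n -S MYSERVER -T -b 50000 -h "TABLOCK, ORDER(id ASC)"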

    Andy

  • I've not hit the magic billion, but I have done hundreds of millions.

    I used to take the production server down for a morning when I did it.

    My approach was as follows

    • Calculate the size of the table to be copied. See BOL: in the index, type "calculating table size"; the top three entries are the relevant ones. (A rough sketch of this and the sizing/log steps appears after this list.)
    • Expand the target database by that size.  You don't want the database autoresizing when this baby runs!
    • If you use truncate log on checkpoint, switch it off.
    • Expand the transaction log, again you don't want it auto-resizing.
    • Drop all indices/constraints/primary keys/triggers etc. on the target table.
    • Use DTS Wizard to build and run the insert.
    • Rebuild the indices/constraints/primary keys/triggers etc.
    • You may want to truncate the transaction log and DBCC SHRINKFILE it.
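
    A rough T-SQL sketch of the sizing and log steps (database, logical file, and table names are placeholders, and the sizes are made up -- plug in the figures from your own size calculation):

        -- step 1: get a rough size for the source table
        EXEC sp_spaceused 'dbo.BigTable';

        -- steps 2 and 4: pre-grow the target data and log files so they
        -- don't autoresize during the load
        ALTER DATABASE TargetDB MODIFY FILE (NAME = TargetDB_Data, SIZE = 200GB);
        ALTER DATABASE TargetDB MODIFY FILE (NAME = TargetDB_Log,  SIZE = 50GB);

        -- step 3: switch off truncate-log-on-checkpoint (this is the SQL 7/2000
        -- option name; on later versions it corresponds to the SIMPLE recovery model)
        EXEC sp_dboption 'TargetDB', 'trunc. log on chkpt.', 'false';

        -- step 8: afterwards, truncate the log and shrink it back down (target size in MB)
        BACKUP LOG TargetDB WITH TRUNCATE_ONLY;
        DBCC SHRINKFILE (TargetDB_Log, 1024);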

    I'm a bit rusty on the process because it is something I did in my last job, over 4 years ago, but I seem to remember I had to expand TEMPDB for the reindexing part of the process. Again, you don't want autoresizing to take place while the job runs.
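
    Something along these lines for the TEMPDB part (tempdev is the default logical name of tempdb's data file; the size is a placeholder):

        -- pre-grow tempdb's data file before the index rebuilds so it
        -- doesn't autoresize mid-job
        ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 50GB);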

    I also had to kick everyone off the database server when I did the process.

    The choice was stay off the server and have everything back by lunchtime, or stay on the server and this urgent data might be ready next week.

  • I agree with David's steps, but there are two additional things you might consider if you're exporting the data out to a flat file first...

    1. If the table you're inserting into has a clustered index, you may want to presort the data on that key. In that case you also wouldn't want to drop the clustered index on the 'to' table before you insert the data.
    2. You may want to try the BULK INSERT command instead of BCP - it is slightly more efficient. Be sure to include the TABLOCK and ORDER clauses for minimal logging, and make sure the database is in either simple or bulk-logged recovery mode (see the sketch below).
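
    Roughly what I have in mind -- the table, file path, and id column are placeholders, and the ORDER hint should match the 'to' table's clustered key:

        -- minimally logged load: table lock, data file already sorted on the
        -- clustered key, database in BULK_LOGGED or SIMPLE recovery
        BULK INSERT dbo.BigTable
        FROM 'E:\staging\BigTable.dat'
        WITH (
            DATAFILETYPE = 'native',
            TABLOCK,
            ORDER (id ASC),
            BATCHSIZE = 50000
        );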

    David is correct too. TEMPDB will expand when you rebuild indexes on tables that large, so be generous when you pre-size it: it's easier to shrink it later than to let it auto-grow during the CREATE INDEX statements.
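
    For the rebuilds themselves, something like the following sketch (index, table, and column names are placeholders); SORT_IN_TEMPDB is what pushes the sort work into TEMPDB:

        -- recreate a nonclustered index after the load, doing the sort work in tempdb
        CREATE INDEX IX_BigTable_SomeCol
            ON dbo.BigTable (SomeCol)
            WITH SORT_IN_TEMPDB;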

    Good Luck
