Transaction Log Growth during BULK INSERT with Recovery Model set to SIMPLE

  • Hi there,

    I hope this is the right forum for this, if not I apologize, just let me know where posting would be more appropriate.

    I'm in the process of designing a split/migration process for a database I have. I've chosen to use BCP with queryout to export the contents of each table to CSV files, and then BULK INSERT commands to load them into the destination database. The destination database's recovery model was set to SIMPLE in order to ease transaction log activity. Between the BULK INSERT statements I've added commands to print out the size of the transaction log, and after each BULK INSERT there is a GO to start a fresh batch and hopefully let the log space from the previous batch be reused (my command also specifies 50,000 rows per batch).
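
    For reference, each per-table load looks roughly like this (the table name, file path and terminators below are just placeholders for the real ones):

        BULK INSERT dbo.MyBigTable
        FROM 'D:\Export\MyBigTable.csv'
        WITH (
            FIELDTERMINATOR = ',',   -- assumed delimiters; the real files may differ
            ROWTERMINATOR = '\n',
            BATCHSIZE = 50000,       -- the 50,000 rows-per-batch setting mentioned above
            TABLOCK
        );
        GO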

    My question is the following: one of my tables is fairly large (its exported CSV file is about 4 GB), and after its import my log file goes from 1,000 MB (its original size) to 48,300 MB. I can handle that in terms of disk space, and I could probably cap the maximum log file size in the database settings, but I can't really understand why that one table causes such enormous growth in the log file, despite the 50,000 rows per batch setting... AND despite the fact that I have another table which takes up 3 GB in CSV and does not cause any growth in my log file (unless my query against sys.sysfiles isn't a good indicator and the growth is delayed...?). Does SQL Server load the entire file and somehow parse it in the transaction log before doing the whole 50,000-rows-per-batch thing?
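
    For what it's worth, the log size check between the loads is just a query against sys.sysfiles, something along these lines (size is reported in 8 KB pages, hence the conversion to MB):

        SELECT name,
               size / 128 AS size_mb   -- size is in 8 KB pages; 128 pages = 1 MB
        FROM sys.sysfiles
        WHERE groupid = 0;             -- the log file has groupid = 0 in sys.sysfiles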

    But even then, why so big? :unsure:

    Anyway, any help as to why this could be happening would be greatly appreciated.

    Many thanks,

    Greg

  • Hello,

    There are other prerequisites to ensure Bulk Insert operations are minimally logged, apart from setting the Recovery Model. I would assume from the growth of your Transaction Log that one or more of these prerequisites are not being met.

    Please see the BOL Topic “Prerequisites for Minimal Logging in Bulk Import”. The notes on using batches might be particularly relevant in your scenario.
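
    As a first sanity check, you can also confirm which Recovery Model the destination database is actually running under (the database name below is a placeholder):

        SELECT name, recovery_model_desc
        FROM sys.databases
        WHERE name = 'YourDestinationDB';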

    Regards,

    John Marsh

    www.sql.lu
    SQL Server Luxembourg User Group

  • Hi,

    I've read the linked topic you are referring to. Here's what I got:

    - Replication: Not used in that case

    - 1 batch per table: yes (there's an explicit GO between each BULK INSERT command)

    - Use of TABLOCK: yes

    - Indexing: I had kept the clustered index but disabled the nonclustered indexes (they get rebuilt after the import of all tables is over; the disable/rebuild pattern is sketched below). I'm going to check whether things change if I also disable the clustered index... From what I've read, that could be the culprit (I'm going a bit by elimination here...).
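
    For reference, the disable/rebuild pattern I'm using looks roughly like this (index and table names are just examples):

        -- before the import
        ALTER INDEX IX_MyBigTable_SomeColumn ON dbo.MyBigTable DISABLE;

        -- ... BULK INSERT runs here ...

        -- after the import of all tables is over
        ALTER INDEX ALL ON dbo.MyBigTable REBUILD;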

    I'll post back my results once I've tried that!

    Thanks,

    Greg


  • Mmh... Come to think of it, it's a bit more complicated than just disabling the clustered index the way I already do for the nonclustered ones... If I disable it, I can no longer access the table at all, even via BULK INSERT commands. I'll need to drop it and re-create it post-import, which is actually kind of frustrating. I'll try it out on Monday, just for the table that seemed to make the log file grow so much, and will keep you posted on my findings.
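
    Roughly what I'm planning to try (names are placeholders, and if the clustered index backs the primary key it would have to be dropped as a constraint instead):

        -- drop the clustered index before the load
        DROP INDEX CIX_MyBigTable ON dbo.MyBigTable;
        -- or, if it backs the primary key:
        -- ALTER TABLE dbo.MyBigTable DROP CONSTRAINT PK_MyBigTable;

        BULK INSERT dbo.MyBigTable
        FROM 'D:\Export\MyBigTable.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

        -- re-create the clustered index once the data is in
        CREATE CLUSTERED INDEX CIX_MyBigTable ON dbo.MyBigTable (SomeKeyColumn);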

    Greg

  • Hello Greg,

    You are right about the Clustered Index:

    1) As the Table data is contained within the Clustered Index, you are no longer able to access the Table if it is disabled.

    2) Assuming that there is already data in the Table when you start the Bulk Insert, having a Clustered Index will cause the Bulk Insert to be fully logged. Even if the Table is empty to start with, if you use the BatchSize Keyword then only the first Batch will be minimally logged; the rest will be fully logged.
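
    If you want to see the difference for yourself, one way (using the undocumented fn_dblog function, so only on a test system) is to count the log records a load generates:

        CHECKPOINT;   -- in SIMPLE recovery this lets the inactive log be truncated
        GO
        -- run the BULK INSERT here, then:
        SELECT COUNT(*) AS log_record_count
        FROM fn_dblog(NULL, NULL);

    A minimally logged load will show far fewer log records than a fully logged one.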

    Regards,

    John Marsh

    www.sql.lu
    SQL Server Luxembourg User Group

  • Hi there,

    I ended up not waiting, and ran a test on that large table by dropping the clustered index first and re-creating it afterwards. My log "only" grew to about 40 GB instead of 48 GB. I'm assuming that if I did the same for all my tables I would end up with something noticeably smaller than that. I'm going to have to see whether we can live with a larger log during the import, or whether we drop the batch size hint, which did give me some speed improvements.

    Thanks a lot for the help everybody. At least I understand better what's going on now 🙂

    Greg
