Maximum Insert Commit Size

  • I need to load a flat file of 12 million records to a table. I want it to be all or none as a success or failure.

    So if I set Maximum insert commit size = 1,000,000 (one million), it will need to commit 12 batches to finish.

    My question is: if the package fails on the 3rd batch, does that mean my table would have 2 million rows committed even though I have set the Data Flow's TransactionOption = 'Required'? Or would it roll back everything, leaving my table with 0 rows?

    My goal is that I want the package to be ALL or NONE, but I also don't want to lock up the table for too long. Any help is appreciated.

  • My goal is that I want the package to be ALL or NONE, but I also don't want to lock up the table for too long. Any help is appreciated.

    Then don't try to insert directly into the final table until you know whether all the data will fly or not. A rollback on a direct insert will take comparative ages to occur. Instead, load up a working table and prequalify all the rows. If they don't prequalify, then don't even start an insert. If they do prequalify, then you can insert in any size batch you want. (A rough T-SQL sketch of this pattern follows this post.)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
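  A minimal T-SQL sketch of the staging-and-prequalify pattern described above. The table and column names are hypothetical, the final table dbo.Orders is assumed, and TRY_CONVERT requires SQL Server 2012 or later:

    -- Hypothetical staging table; the flat file is loaded here first, so any
    -- failure or rejected row never touches the final table.
    CREATE TABLE dbo.Staging_Orders
    (
        OrderID   INT         NULL,
        OrderDate VARCHAR(30) NULL,  -- loaded as raw text, validated below
        Amount    VARCHAR(30) NULL
    );

    -- Step 1: load the flat file into dbo.Staging_Orders (SSIS data flow or BULK INSERT).

    -- Step 2: prequalify - count rows that would violate the final table's rules.
    DECLARE @BadRows INT;

    SELECT @BadRows = COUNT(*)
    FROM dbo.Staging_Orders
    WHERE OrderID IS NULL
       OR TRY_CONVERT(DATE, OrderDate) IS NULL
       OR TRY_CONVERT(DECIMAL(18, 2), Amount) IS NULL;

    -- Step 3: only touch the final table when everything qualifies; the insert can
    -- then run in any batch size without risking a rollback. Otherwise, produce an
    -- "errata" report and stop before a single row reaches dbo.Orders.
    IF @BadRows = 0
    BEGIN
        INSERT INTO dbo.Orders (OrderID, OrderDate, Amount)
        SELECT OrderID, CONVERT(DATE, OrderDate), CONVERT(DECIMAL(18, 2), Amount)
        FROM dbo.Staging_Orders;
    END
    ELSE
    BEGIN
        SELECT *
        FROM dbo.Staging_Orders
        WHERE OrderID IS NULL
           OR TRY_CONVERT(DATE, OrderDate) IS NULL
           OR TRY_CONVERT(DECIMAL(18, 2), Amount) IS NULL;
    END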

  • Thanks Jeff.

    Do you have an answer for my question? 2 million rows or 0?

    My gut tells me that I have to set the Maximum Insert Commit Size = 12 million or higher and take the long lock on the table if I want it all or none.

    Your suggestion should work perfectly, but a failure could be anything beyond just a data failure (some idiot unplugs the power while it is still running, perhaps) LOL. Would the committed batches be permanently inserted, or would they all be rolled back because the Data Flow Task was set with TransactionOption = Required?

  • Heh... if "some idiot [can unplug] the power while it is still running", then you may have larger problems at hand. 😀 But, I get what you mean.

    I don't actually know how to do the following in SSIS unless you have a process that calls a stored procedure, but the method is very, very effective.

    1. Load the 12 million rows into a new, uniquely named table (perhaps with a date appended to the table name).

    2. Beat the tar out of the data to pre-qualify it in your "all or nothing" manner. Since this is being done in a separate table, there will be very little lock contention with the table that you'll ultimately try to "insert" the data into. If any data doesn't pre-qualify, produce an "errata" report from the table and then simply drop the table and you're done... no need to worry about rollbacks or anything. If all of the data pre-qualifies, go to the next step.

    3. If all of the data pre-qualifies ("ALL" has happened), then depending on which edition of SQL Server you have, either make it a part of the "table partitioning" for the main table (Enterprise Edition) or quickly rebuild a "partitioned view" to include the table with the other existing table(s). Either way, the total "lock time" on the existing data should be about 65 milliseconds.

    To answer your question about what the batch size should be, the answer is "There shouldn't be INSERTs at all". You should simply "attach" a new table of information to the "table partition" or the "partitioned view". 🙂 (A sketch of that "attach" step follows this post.)

    --Jeff Moden


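  The "attach" step from Jeff's step 3 looks roughly like the following in T-SQL. This is only a sketch under assumptions: the table, partition function, and view names are hypothetical, the staging table must match the target's structure and carry a CHECK constraint on the partitioning column, and CREATE OR ALTER requires SQL Server 2016 SP1 or later:

    -- Option A (Enterprise Edition, or any edition from SQL Server 2016 SP1 on):
    -- switch the pre-qualified table into a partition of the main table.
    -- The switch is a metadata-only operation, which is why the lock on the
    -- existing data lasts milliseconds rather than the minutes a 12-million-row
    -- INSERT would need.
    DECLARE @PartitionNumber INT = $PARTITION.pf_OrdersByLoadDate('20240115');

    ALTER TABLE dbo.Orders_20240115
        SWITCH TO dbo.Orders PARTITION @PartitionNumber;
    GO

    -- Option B (a partitioned view): re-create the view so it includes the new
    -- table alongside the existing ones. Each member table needs a CHECK
    -- constraint on the partitioning column for the optimizer to prune properly.
    CREATE OR ALTER VIEW dbo.vOrders
    AS
    SELECT * FROM dbo.Orders_20240113
    UNION ALL
    SELECT * FROM dbo.Orders_20240114
    UNION ALL
    SELECT * FROM dbo.Orders_20240115;  -- the newly attached table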

  • I would go with Jeff's solution. If you want the all-or-nothing approach in SSIS, you need to incorporate transactions, which can take quite some time to roll back, especially with such a large dataset. The locking cost will also be too high. Partition switching is imho the most efficient solution.

    Regarding your initial question: if you want to be 100% sure about the all or nothing, I would leave the maximum insert commit size at the default value (which is the highest value an integer can hold, somewhere around 2 billion: 2,147,483,647). If you have transactions enabled, it will all roll back (but with a long locking time).

    You could also enforce transactions by placing an Execute SQL Task with BEGIN TRAN before your data flow and an Execute SQL Task with COMMIT after the data flow. Set the connection manager's RetainSameConnection property to TRUE, and you are enforcing transactions at the database level. This way, the data will be rolled back even if someone pulls the plug 🙂 (when SQL Server restarts, it will roll the unfinished transaction back out of the transaction log). A sketch of this pattern follows this post.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
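  A rough sketch of the Execute SQL Task pattern Koen describes. The statements below are what each task would run; the package layout lives in the comments, and the explicit ROLLBACK task on the failure path is my assumption, not something stated in the thread:

    -- Execute SQL Task #1 (runs before the Data Flow Task):
    BEGIN TRANSACTION;

    -- The Data Flow Task then loads all 12 million rows on the SAME connection.
    -- For that to happen, the OLE DB connection manager shared by these tasks must
    -- have RetainSameConnection = TRUE; otherwise each task opens its own connection
    -- and the BEGIN TRAN / COMMIT pair would not bracket the load.

    -- Execute SQL Task #2 (runs after the Data Flow Task, on the success path):
    COMMIT TRANSACTION;

    -- Optional Execute SQL Task on the failure path (an assumption on my part):
    -- roll back explicitly instead of waiting for the connection to close or for
    -- crash recovery to undo the open transaction.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;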

  • Thank you Koen and Jeff for the great advice. I'll look into the partition approach. I think that is the best option for me.

  • Thanks for the feedback, SonTac.

    --Jeff Moden


