How to commit transaction log during a large ETL process

  • Hello SSC!

    First, I would like to thank everyone who has responded to my posts. This website has helped me tremendously over the years!

    My question...

    I have a large ETL process that I inherited. I have been tasked with optimizing this process which uses SSIS and plenty of stored procedures. The source tables are very large, over 500 million rows.

    Is there a way to commit the transaction log after a proc executes, so the log doesn't fill up and kill the job? I know the keyword "COMMIT" supposedly helps, but I have had instances where it doesn't do what's expected or I did not use it correctly. I am a developer who has dabbled in the Admin arts, but I am certainly no expert. So, I ask the SSC experts. Looking forward to your responses.

    Any assistance would be greatly appreciated!

    Dave

    There are no problems, only solutions. --John Lennon

  • Is there a way to commit the transaction log after a proc executes, so the log doesn't fill up and kill the job? A few options:

    Smaller transactions: as long as a transaction is open, its log records can't be truncated, because they are still needed for a possible rollback.

    Regular log backups: these mark the inactive portion of the log file as reusable.

    Different recovery models: full (point-in-time recovery), bulk-logged (suited to ETL loads), simple (no point-in-time recovery).

    https://www.mssqltips.com/sqlservertutorial/5/sql-server-bulk-logged-recovery-model
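
    To see which of these applies, something along these lines will show it (the database name and backup path are just placeholders):

        -- Check the current recovery model and what is holding up log truncation
        SELECT name, recovery_model_desc, log_reuse_wait_desc
        FROM   sys.databases
        WHERE  name = N'YourEtlDatabase';

        -- Under FULL or BULK_LOGGED, regular log backups mark the inactive
        -- portion of the log file as reusable
        BACKUP LOG YourEtlDatabase
        TO DISK = N'X:\Backups\YourEtlDatabase_log.trn';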

  • Thank you Jo,

    This is very helpful. The recovery model is set to Simple, so I don't think that is the issue. I will take executing in batches into consideration; the only issue is that I have a lot of procs to consider.

    I am really looking for a way to commit everything in the transaction log after a proc completes successfully. Not sure if that is possible or recommended.

    There are no problems, only solutions. --John Lennon

  • In the OLE DB Destination in SSIS, you control the batch and commit sizes. If you leave those at the defaults, all of the rows processed through the pipeline will be committed in a single transaction.

    You can set a reasonable commit size, and after that many rows have been inserted a commit will be issued. Not only will this manage the space in the transaction log, it can dramatically improve the performance of the data load.

    Also check the batch size - adjust as needed.
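
    If I recall correctly, the fast load access mode exposes these as "Rows per batch" and "Maximum insert commit size" in the OLE DB Destination editor. The same idea in plain T-SQL, for anyone loading from a flat file outside SSIS, is the BATCHSIZE option of BULK INSERT - a rough sketch with made-up file and table names:

        -- Each batch of 100,000 rows commits as its own transaction,
        -- so the log only has to hold one batch at a time
        BULK INSERT dbo.TargetTable
        FROM 'X:\Extracts\source_extract.dat'
        WITH (BATCHSIZE = 100000, TABLOCK);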

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Lord Slaagh wrote:

    I am really looking for a way to commit everything in the transaction log after a proc completes successfully. Not sure if that is possible or recommended.

    Under the simple recovery model, each statement auto-commits as soon as it completes and the log is truncated at checkpoints, so there is nothing extra to do after a procedure finishes unless you have opened an explicit transaction before executing the procedure.

    Do you have a problem with log space? If so, I would identify which step is blowing up the log and think about how to do the same thing in smaller chunks. You mentioned stored procedures, so I assume the ETL is using Execute SQL Tasks rather than Data Flows. Is this correct? Data Flows automatically commit data in batches, but if you have SQL code performing large inserts you may need to make the inserts smaller. It's possible you could take advantage of bulk-logged inserts, but that's unlikely to be the whole solution.
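
    For the SQL-code case, a chunked loop along these lines keeps each batch in its own transaction - a rough sketch assuming a reasonably dense numeric key (table, column, and variable names are made up):

        DECLARE @BatchSize bigint = 500000,
                @LastId    bigint = 0,
                @MaxId     bigint;

        SELECT @MaxId = MAX(Id) FROM dbo.SourceTable;

        WHILE @LastId < @MaxId
        BEGIN
            -- each INSERT auto-commits, so the log never holds more than one batch
            INSERT INTO dbo.TargetTable (Id, Col1, Col2)
            SELECT Id, Col1, Col2
            FROM   dbo.SourceTable
            WHERE  Id >  @LastId
              AND  Id <= @LastId + @BatchSize;

            SET @LastId = @LastId + @BatchSize;

            CHECKPOINT;  -- under SIMPLE recovery this lets the inactive log be reused
        END;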

  • Hi Jeffrey,

    Responding here for other users with the same issue....

    Changing the commit size and rows-per-batch settings worked like a charm! Plus, I dropped the indexes on the table before loading and recreated them after the table was loaded. I forgot that loading a table with its indexes in place slows the inserts: the indexes help SELECTs, but every INSERT has to maintain them.

    Thanks again Jeffrey and the folks @ SSC!

    Dave

    There are no problems, only solutions. --John Lennon

  • Thank you for the update - glad this worked for you.

    FYI - my approach is to disable the non-clustered indexes and rebuild all indexes after the data loads.  I only do that if I can see a clear improvement in the overall process - but in general, rebuilding all indexes after a load works well.  If the loads are more than once a day I probably would not disable/rebuild every time - it really depends on the workflow.
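
    For anyone following along, the pattern is roughly this (index and table names are made up) - leave the clustered index alone, since disabling it makes the table inaccessible:

        -- Before the load: disable the nonclustered indexes
        ALTER INDEX IX_TargetTable_Col1 ON dbo.TargetTable DISABLE;
        ALTER INDEX IX_TargetTable_Col2 ON dbo.TargetTable DISABLE;

        -- ... run the load ...

        -- After the load: rebuild everything, which also re-enables the disabled indexes
        ALTER INDEX ALL ON dbo.TargetTable REBUILD;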

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs
