SQL Insert in Batches

  • Hi Experts,

    I use batch-wise deletes in my procedures because they help me keep the log size down.

    As we know, deletes cause ghost records, so committing in batches makes sense.

    However, is inserting in batches also a good practice if I am not using any explicit transaction in my procedures?

    As soon as a row is inserted into the destination table it gets implicitly committed, so the log should not need to be maintained.

    Any ideas on this?

  • It really depends on what's important for you. If you need to be able to roll back the changes if something goes wrong during any of the inserts, a single bulk insert is preferable.

    If you're ok with single rows being available as soon as inserted, processing them one by one might be the right thing to do.

    -- Gianluca Sartori

  • Let me try to simplify my question.

    Let's assume I am not using any transactions, and that there are two cases: Delete and Insert.

    Case 1. Delete.

    If I am deleting 1 million records, it's preferable to perform this in batches so that the ghost cleanup process can remove the ghost records and free up the space on the data pages.

    As we know, a delete doesn't physically remove the records, but marks them as ghost records.

    Case 2. Insert

    If I am inserting 1 million records into some destination table, what difference does it make whether I insert them in one statement, like

    INSERT INTO Dest SELECT * FROM Source

    or start a WHILE loop and insert in batches (say, 10,000 records)?
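
    Something like this sketch, with placeholder names for illustration (Dest, Source, an Id key, a Payload column):

    ```sql
    -- Sketch: insert in batches of 10,000 rows; with no explicit transaction,
    -- each INSERT commits on its own. All object names here are made up.
    DECLARE @BatchSize int = 10000,
            @Rows      int = 1;

    WHILE @Rows > 0
    BEGIN
        INSERT INTO Dest (Id, Payload)
        SELECT TOP (@BatchSize) s.Id, s.Payload
        FROM   Source AS s
        WHERE  NOT EXISTS (SELECT 1 FROM Dest AS d WHERE d.Id = s.Id)
        ORDER BY s.Id;

        SET @Rows = @@ROWCOUNT;   -- stop when no rows are left to copy
    END;
    ```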

  • It depends on many factors.

    The main issue with deleting rows in bulk is not the ghost record status and its cleanup, but 1) the fact that the transaction log might grow very large during the operation and 2) other sessions might get blocked during the delete operation.
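
    As a sketch, a batched delete keeps each transaction small, so the log space can be reused between batches and blocking stays short-lived (the table name and purge predicate below are made up):

    ```sql
    -- Sketch: delete old rows in batches so each transaction stays small.
    -- dbo.BigTable and the date predicate are assumed names for illustration.
    DECLARE @BatchSize int = 10000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM   dbo.BigTable
        WHERE  CreatedDate < '20140101';

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
    END;
    ```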

    In the same way, INSERTs generate log records, so you might want to control the size of each transaction (each statement runs in its own transaction). If you don't have enough room in the transaction log file, the transaction will fail and all the changes will have to be rolled back. With smaller batches and frequent transaction log backups, you can keep the size of the transaction log file under control.

    From a performance standpoint, inserting rows from another source consists of the INSERT operation itself plus locating the rows to insert. If the SELECT part of your statement is expensive (not just a plain SELECT something FROM sometable), breaking the process into smaller batches could end up being much more expensive than simply inserting all the rows in a single pass. You would also need some logic to identify the rows that are already in the destination table and skip them in subsequent iterations of the loop.
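
    For example, one way to skip rows already copied without rescanning the destination is to track a watermark key. A sketch, assuming a made-up Source table whose Id column is unique and ascending:

    ```sql
    -- Sketch: batched copy driven by a watermark key instead of a row-by-row
    -- existence check. Dest, Source, Id and Payload are assumed names.
    DECLARE @BatchSize int = 10000,
            @LastId    int = 0,
            @MaxId     int = (SELECT MAX(Id) FROM Source);

    WHILE @LastId < @MaxId
    BEGIN
        INSERT INTO Dest (Id, Payload)
        SELECT TOP (@BatchSize) s.Id, s.Payload
        FROM   Source AS s
        WHERE  s.Id > @LastId
        ORDER BY s.Id;

        SELECT @LastId = MAX(Id) FROM Dest;   -- advance the watermark
    END;
    ```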

    -- Gianluca Sartori

  • Thank you, I got your point.

  • Also be aware of the "tipping point" that every server has. Depending on a set of multiple tangible and seemingly intangible factors, 1 million rows might insert in just 3 seconds. As expected, 2 million rows would take 6 seconds and 3 million rows only 9 seconds (all depending on where the tipping point is). But 4 million rows might drive the system over the tipping point and, instead of taking the expected 12 seconds, could take (for example) 20 minutes or even 2 hours depending on your machine, other traffic, and a host of other factors.

    And yeah... even if you're using the SIMPLE recovery model, a straight insert of 4 million rows is going to be fully logged, so the log file will grow to accommodate at least all of that data if it's not already big enough. Chances are it'll grow even more if other people are doing inserts, updates, or deletes at the same time.
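
    If you want to watch the log while such an insert runs, either of these shows current usage (sys.dm_db_log_space_usage needs SQL Server 2012 or later):

    ```sql
    -- Log size and percent used for every database on the instance.
    DBCC SQLPERF(LOGSPACE);

    -- Same information for the current database only (SQL Server 2012+).
    SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb,
           used_log_space_in_bytes / 1048576.0 AS used_mb,
           used_log_space_in_percent
    FROM   sys.dm_db_log_space_usage;
    ```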

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • er.mayankshukla (3/13/2015)



    Quick question, what SQL Server Edition is this? If you are on Enterprise Edition then the most efficient way is to partition the table, push the "deleted" records into a separate partition and purge that partition.

    😎
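
    A rough sketch of that purge pattern, assuming a made-up partitioned table dbo.BigTable and an empty staging table dbo.BigTable_Stage with identical structure on the same filegroup:

    ```sql
    -- Sketch (Enterprise Edition): age out a whole partition by switching it
    -- into an empty staging table, then truncating the staging table.
    ALTER TABLE dbo.BigTable
        SWITCH PARTITION 1 TO dbo.BigTable_Stage;  -- metadata-only, near-instant

    TRUNCATE TABLE dbo.BigTable_Stage;             -- minimally logged purge
    ```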

  • Eirikur Eiriksson (3/22/2015)



    Quick question, what SQL Server Edition is this? If you are on Enterprise Edition then the most efficient way is to partition the table, push the "deleted" records into a separate partition and purge that partition.

    😎

    You can do something similar, and nearly as effective (especially with deletes), in the Standard Edition using Partitioned Views. A lot of people look down their nose at partitioned views, but they are just as effective as partitioned tables unless you have an IDENTITY column, and even that has a workaround. You just can't have as many partitions in a partitioned view as you can with partitioned tables.
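
    A minimal sketch of such a partitioned view, with made-up yearly tables; the CHECK constraints on the partitioning column are what let the optimizer prune tables, and a whole "partition" can be purged with a cheap TRUNCATE or DROP:

    ```sql
    -- Sketch (Standard Edition): a partitioned view over per-year tables.
    CREATE TABLE dbo.Orders2014 (
        OrderDate date NOT NULL
            CHECK (OrderDate >= '20140101' AND OrderDate < '20150101'),
        Amount    money NOT NULL
    );

    CREATE TABLE dbo.Orders2015 (
        OrderDate date NOT NULL
            CHECK (OrderDate >= '20150101' AND OrderDate < '20160101'),
        Amount    money NOT NULL
    );
    GO

    CREATE VIEW dbo.Orders
    AS
    SELECT OrderDate, Amount FROM dbo.Orders2014
    UNION ALL
    SELECT OrderDate, Amount FROM dbo.Orders2015;
    ```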

    --Jeff Moden


