Performance improvement with BULK INSERT

  • Hey guys,

    I am currently stuck with the task of improving the performance of my data load stored procedures. Most of our data comes from outside vendors in flat files, from where we pull it into our OLTP system using BULK INSERT.

    At times, the data that needs to be imported exceeds 20 million rows. All non-clustered indexes are removed from the destination table before performing the data load. Presently, it takes somewhere around 2 hours to get the data into the system.

    Can anyone please guide me on how to improve the performance? Would switching to SSIS or making some architectural changes help?

    PS: Please ask if you need further information.

  • There are a few tips in the article below. It refers to SQL 2000, but the principles are the same.

    http://msdn.microsoft.com/en-us/library/aa178096.aspx

    In my experience, setting ROWS_PER_BATCH and using the BULK_LOGGED recovery model have been particularly effective.
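
    As a rough sketch of what that can look like (the database name, table, file path, and delimiters below are all made up for illustration):

        -- Switch to BULK_LOGGED (or SIMPLE) for the duration of the load so
        -- the bulk insert can qualify for minimal logging; switch back to
        -- FULL and take a log backup when you're done.
        ALTER DATABASE VendorDb SET RECOVERY BULK_LOGGED;

        BULK INSERT dbo.VendorFeed
        FROM 'D:\Feeds\vendor_file.txt'
        WITH (
            FIELDTERMINATOR = '|',       -- adjust to your file format
            ROWTERMINATOR   = '\n',
            TABLOCK,                     -- a requirement for minimal logging
            ROWS_PER_BATCH  = 20000000   -- hint: approximate rows in the file
        );

        ALTER DATABASE VendorDb SET RECOVERY FULL;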

  • If I understand this correctly, your table is not available for 2 hours because you are loading data into it?

    How about partitioning the table?

    1. Partition your main table.

    2. Create a staging table with the same structure as the main table, but with a single partition.

    3. Load the 20M rows into the staging table. This will take the same amount of time, I suppose.

    Note: Your main table is still available and accessible.

    4. Switch the staging partition into the main table (this takes only a few seconds; see the sketch after this list).

    You are done, with zero downtime on your main table. Of course, your I/O was utilized for 2 hours, but no one was blocked all that while :).
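
    Here is a rough sketch of steps 3 and 4, using hypothetical names throughout (the real partition function, boundary values, and filegroups depend on your schema):

        -- The staging table must match the main table's structure and
        -- indexes, sit on the same filegroup as the target partition, and
        -- carry a CHECK constraint matching that partition's boundaries.
        ALTER TABLE dbo.VendorFeed_Staging
            ADD CONSTRAINT CK_Staging_LoadDate
            CHECK (LoadDate >= '20240101' AND LoadDate < '20240201');

        -- Step 3: load the 20M rows; the main table stays online.
        BULK INSERT dbo.VendorFeed_Staging
        FROM 'D:\Feeds\vendor_file.txt'
        WITH (TABLOCK);

        -- Step 4: metadata-only switch into the matching (empty) partition
        -- of the main table; near-instant regardless of row count.
        ALTER TABLE dbo.VendorFeed_Staging
            SWITCH TO dbo.VendorFeed PARTITION 5;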

  • Yep... I absolutely agree with Richard that batch size is important... trying to import 20 million rows will drive the log file absolutely nuts if you don't do the Bulk Insert the right way even if the DB is set for Bulk Logged.

    But there are a lot of conditions to actually pull off a minimally logged Bulk Insert... folks really need to look them up in Books Online and see what all of those conditions are instead of just assuming.

    Done correctly, I've used Bulk Insert to import 5.1 million rows of 20 columns in 60 seconds flat, but ya gotta check Books Online to meet all of the conditions for minimally logged imports.
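
    As a quick, non-exhaustive summary of those conditions (the database name below is a placeholder, and Books Online remains the authoritative list):

        -- Minimal logging requires the SIMPLE or BULK_LOGGED recovery
        -- model; you can check which one a database is using like this:
        SELECT name, recovery_model_desc
        FROM sys.databases
        WHERE name = 'VendorDb';

        -- Among the other conditions Books Online lists:
        --   * the target table is not being replicated;
        --   * TABLOCK is specified on the BULK INSERT;
        --   * the target is a heap with no indexes, or its indexes are
        --     empty when the load starts (which is why dropping the
        --     non-clustered indexes first, as the original poster does,
        --     helps).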

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
