September 30, 2008 at 5:28 am
Hey guys,
I am currently stuck on the task of improving the performance of my data load stored procedures. Currently, we get most of our data from outside vendors in flat files, which we pull into our OLTP system using BULK INSERT.
At times, the data to be imported exceeds 20 million rows. All non-clustered indexes are removed from the destination table before performing the data load. Presently, it takes somewhere around 2 hours to get the data into the system.
Can anyone please advise me on how to improve the performance? Will switching to SSIS or making some architectural changes help?
PS: Please ask for further information.
September 30, 2008 at 4:03 pm
There are a few tips here. It refers to SQL 2000, but the principles are the same.
http://msdn.microsoft.com/en-us/library/aa178096.aspx
In my experience, setting ROWS_PER_BATCH and using the BULK_LOGGED recovery model have been particularly effective.
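For example, a minimal sketch (the database name, table name, file path, and delimiters below are placeholders for your own):

ALTER DATABASE MyOltpDb SET RECOVERY BULK_LOGGED;

BULK INSERT dbo.MainTable
FROM 'C:\loads\vendor_feed.dat'
WITH (
    FIELDTERMINATOR = '|',       -- adjust to your vendor's file format
    ROWTERMINATOR   = '\n',
    ROWS_PER_BATCH  = 20000000,  -- approximate row count of the file; helps the optimizer
    TABLOCK                      -- table lock is one of the conditions for minimal logging
);

ALTER DATABASE MyOltpDb SET RECOVERY FULL;  -- take a log backup afterwards to keep the log chain intact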
October 3, 2008 at 5:33 pm
If I understand this correctly, your table is unavailable for 2 hours because you are loading data into it?
How about partitioning the table?
1. Partition your main table
2. Create a staging table with the same structure as the main table, but on a single partition.
3. Load the 20M rows into the staging table. This will take about the same amount of time, I suppose.
Note: Your main table is still available and accessible.
4. Switch the staging partition into the main table (this takes only a few seconds; see the sketch after this list).
You are done, with zero downtime on your main table. Of course, your I/O was utilized for 2 hours, but no one was being blocked all that while :).
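Step 4 might look like this (a sketch only; the table names and partition number are illustrative, and the staging table must have matching indexes, live on the same filegroup, and carry a check constraint matching the target partition's boundary values):

ALTER TABLE dbo.StagingTable
SWITCH TO dbo.MainTable PARTITION 5;  -- metadata-only operation, completes in seconds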
October 3, 2008 at 5:59 pm
Yep... I absolutely agree with Richard that batch size is important... trying to import 20 million rows will drive the log file absolutely nuts if you don't do the Bulk Insert the right way, even if the DB is set to Bulk Logged.
But there are a lot of conditions to actually pull off a minimally logged Bulk Insert... folks really need to look them up in Books Online and see what all of those conditions are instead of just assuming.
Done correctly, I've used Bulk Insert to import 5.1 million rows of 20 columns in 60 seconds flat, but ya gotta check Books Online to meet all of the conditions for minimally logged imports.
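For instance, a quick sanity check might start with the recovery model (a sketch; MyOltpDb is a placeholder name):

SELECT name, recovery_model_desc  -- must be SIMPLE or BULK_LOGGED for minimal logging
FROM sys.databases
WHERE name = 'MyOltpDb';
-- Books Online also requires, among other things, that TABLOCK be specified
-- and that the table not be replicated, with further restrictions depending
-- on the table's indexes and whether it is empty.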
--Jeff Moden
Change is inevitable... Change for the better is not.