Performance improvement with BULK INSERT

  • Hey guys,

    I am currently stuck with the task of improving the performance of my data load stored procedures. Most of our data comes from outside vendors in flat files, from where we pull it into our OLTP system using BULK INSERT.

    At times, the data that needs to be imported exceeds 20 million rows. All non-clustered indexes are removed from the destination table before performing the data load. Presently, it takes somewhere around 2 hours to get the data into the system.

    Can anyone please guide me on how to improve the performance? Would switching to SSIS or making some architectural changes help?

    PS: Please ask if you need further information.

  • There are a few tips in the article below. It refers to SQL 2000, but the principles are the same.

    http://msdn.microsoft.com/en-us/library/aa178096.aspx

    In my experience, setting ROWS_PER_BATCH and using the BULK_LOGGED recovery model have been particularly effective.
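
    As a rough sketch of what that can look like (the database name, table, file path, and delimiters below are all made up for illustration):

        -- Switch to BULK_LOGGED (or SIMPLE) for the duration of the load so
        -- the bulk insert can qualify for minimal logging; switch back to
        -- FULL and take a log backup when you're done.
        ALTER DATABASE VendorDb SET RECOVERY BULK_LOGGED;

        BULK INSERT dbo.VendorFeed
        FROM 'D:\Feeds\vendor_file.txt'
        WITH (
            FIELDTERMINATOR = '|',       -- adjust to your file format
            ROWTERMINATOR   = '\n',
            TABLOCK,                     -- a requirement for minimal logging
            ROWS_PER_BATCH  = 20000000   -- hint: approximate rows in the file
        );

        ALTER DATABASE VendorDb SET RECOVERY FULL;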

  • If I understand this correctly, your table is not available for 2 hours because you are loading data into it?

    How about partitioning the table?

    1. Partition your main table.

    2. Create a staging table with the same structure as the main table, but with a single partition.

    3. Load the 20M rows into the staging table. This will take the same amount of time, I suppose.

    Note: Your main table is still available and accessible.

    4. Switch the staging partition into the main table (this takes only a few seconds; see the sketch after this list).

    You are done, with zero downtime on your main table. Of course, your I/O was utilized for 2 hours, but no one was blocked all that while :).
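
    Here is a rough sketch of steps 3 and 4, using hypothetical names throughout (the real partition function, boundary values, and filegroups depend on your schema):

        -- The staging table must match the main table's structure and
        -- indexes, sit on the same filegroup as the target partition, and
        -- carry a CHECK constraint matching that partition's boundaries.
        ALTER TABLE dbo.VendorFeed_Staging
            ADD CONSTRAINT CK_Staging_LoadDate
            CHECK (LoadDate >= '20240101' AND LoadDate < '20240201');

        -- Step 3: load the 20M rows; the main table stays online.
        BULK INSERT dbo.VendorFeed_Staging
        FROM 'D:\Feeds\vendor_file.txt'
        WITH (TABLOCK);

        -- Step 4: metadata-only switch into the matching (empty) partition
        -- of the main table; near-instant regardless of row count.
        ALTER TABLE dbo.VendorFeed_Staging
            SWITCH TO dbo.VendorFeed PARTITION 5;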

  • Yep... I absolutely agree with Richard that batch size is important... trying to import 20 million rows will drive the log file absolutely nuts if you don't do the Bulk Insert the right way even if the DB is set for Bulk Logged.

    But there are a lot of conditions to actually pull off a minimally logged Bulk Insert... folks really need to look them up in Books Online and see what all of those conditions are instead of just assuming.

    Done correctly, I've used Bulk Insert to import 5.1 million rows of 20 columns in 60 seconds flat, but ya gotta check Books Online to meet all of the conditions for minimally logged imports.
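
    As a quick, non-exhaustive summary of those conditions (the database name below is a placeholder, and Books Online remains the authoritative list):

        -- Minimal logging requires the SIMPLE or BULK_LOGGED recovery
        -- model; you can check which one a database is using like this:
        SELECT name, recovery_model_desc
        FROM sys.databases
        WHERE name = 'VendorDb';

        -- Among the other conditions Books Online lists:
        --   * the target table is not being replicated;
        --   * TABLOCK is specified on the BULK INSERT;
        --   * the target is a heap with no indexes, or its indexes are
        --     empty when the load starts (which is why dropping the
        --     non-clustered indexes first, as the original poster does,
        --     helps).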

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
