Speeding up Bulk Insert

  • Hi all

    I have a bulk insert query that runs every 8 or so seconds, inserting 1000 records at a time. If I run it on a 2GB machine the hard disk light is constantly on. What can I do to speed this up?

    Should I modify the table's indexes somehow? Have locking hints on the bulk insert statement? Should I wait until I have bigger batch sizes? Would increasing the amount of RAM improve things radically? Should I reduce the number of indexes?

    A job with 10 000 records took 20 seconds when the table had less than 2 000 000 records. Now, on 6 000 000+, it's down to about 100 seconds per 10 000 records. Is there some maintenance I should run periodically, besides update statistics?

    Here is the table:

    CREATE TABLE BodyPos (

    FrameID int NOT NULL,

    BodyIndex int NOT NULL,

    X float NULL,

    FrameletID int NULL,

    Y float NULL,

    Vx float NULL,

    Vy float NULL

    )

    go

    CREATE UNIQUE INDEX XPKBodyPos ON BodyPos

    (

    FrameID,

    BodyIndex

    )

    go

    CREATE INDEX XIF49BodyPos ON BodyPos

    (

    FrameID

    )

    go

    CREATE INDEX XIF53BodyPos ON BodyPos

    (

    BodyIndex

    )

    go

    CREATE INDEX XIF55BodyPos ON BodyPos

    (

    FrameletID

    )

    go

    My Bulk Insert SP (extract) - I suspect this could be optimised. I do double insert, but I'm not sure how to insert traight from the CSV file into my table:

    IF OBJECT_ID('tempdb..#FrameletIn') IS NOT NULL DROP TABLE #FrameletIn

    create table #FrameletIn (

    BodyIndex int null,

    Mass float null,

    X float null,

    Y float null,

    Vx float null,

    Vy float null)

    set @SQL = 'BULK INSERT #FrameletIn FROM '''+@FileName+''' WITH (FIELDTERMINATOR = '','',ROWTERMINATOR = '''')'

    exec (@SQL)

    delete from BodyPos where FrameletID=@FrameletID

    insert into BodyPos(FrameID,BodyIndex,FrameletID,X,Y,Vx,Vy)

    select @FrameID,BodyIndex,@FrameletID,X,Y,Vx,Vy

    from #FrameletIn

    drop table #FrameletIn

    Many thanks

  • Try dropping or disabling your indexes before the insert.

    http://stackoverflow.com/questions/751039/slow-bulk-insert-for-table-with-many-indexes

  • Smarties (7/1/2009)


    Try dropping or disabling your indexes before the insert.

    http://stackoverflow.com/questions/751039/slow-bulk-insert-for-table-with-many-indexes

    Thanks, I can hold off any queries on the table until the insert is complete, but the rebuild of indexes afterward will take ages. I can't do this within the 10 second timespan of every insert. I need the indexes once the inserts are done.

    This assumes I need to do a rebuild after re-enabling the indexes??

  • hein (6/30/2009)


    A job with 10 000 records took 20 seconds when the table had less than 2 000 000 records. Now, on 6 000 000+, it's down to about 100 seconds per 10 000 records.

    Hi,

    Your job having the both bulk insert and the delete/insert into statement???,

    ensure the Bulk insert itself time taken,

    or the delete/ insert into statement takes the time

    from the post statement,don't delete the data from BodyPos

    (Just update-old data/insert-new data)

    try this

    insert into BodyPos(FrameID,BodyIndex,FrameletID,X,Y,Vx,Vy)

    select @FrameID,BodyIndex,@FrameletID,X,Y,Vx,Vy

    from #FrameletIn

    where FrameletID@FrameletID

    update BodyPos

    set BodyIndex = BodyIndex,

    FrameletID = FrameletID,

    X=X,...

    from ..

    where FrameletID=@FrameletID

    ARUN SAS

  • Just an extension to arun's fine approach. Use an OUTPUT statement to get the the handled IDs to INSERT the new data without joining the large destination table.

    You should also try a SSIS package. Your current solution copies your data twice. First into the temp table and second into your destination table. You could try to use a SSIS job which works in memory without copying data twice to disk.

  • Use Batch Update if you are updating/inserting records from ASP.Net.

  • naresh.patel3386 (7/1/2009)


    Use Batch Update if you are updating/inserting records from ASP.Net.

    Way way slower than bulk insert. SQL Server 2008 offers great new features to handle this, but SQL Server 2005 has good option to handle bulk operations from client side. Have a look to this article:

    http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/

  • Add BatchSize option for your BULK INSERT operation.

    Set to approx 1000 records.


    N 56°04'39.16"
    E 12°55'05.25"

  • Thank you Arun

    I forgot to remove the delete from BodyPos line. I've only had it there while trying to figure out why I get duplicate keys sometimes.

    Is there a way to insert straight into my BodyPos destination table from the CSV file using Bulk Insert?

    That way I can avoid using the temp table and additional insert.

    Sorry I'm not familiar with the bulk insert method.

    BTW - an index rebuild took 5 hours :o. I'll see if I can get rid of some.

    Many thanks again.

  • Peso (7/1/2009)


    Add BatchSize option for your BULK INSERT operation.

    Set to approx 1000 records.

    I believe BatchSize's default is the entire bulk insert size by default? I insert 1000 records at a time.

  • Florian Reischl (7/1/2009)


    Just an extension to arun's fine approach. Use an OUTPUT statement to get the the handled IDs to INSERT the new data without joining the large destination table.

    You should also try a SSIS package. Your current solution copies your data twice. First into the temp table and second into your destination table. You could try to use a SSIS job which works in memory without copying data twice to disk.

    Thanks Florian

    I dont need the delete from BodyPos - it was just a cheat around a problem I fixed in the mean time. I expect to have correct index values on every insert.

    Are there any pointers to learning SSIS you could refer me to?

    Many thanks

  • hein,

    didi you try SSIS as I mentioned? Maybe this will handle your problem.

    Another tool you can try is Microsoft LogFile Parser

  • Thanks Florian

    I'm going to look at SSIS definately.

    In the mean time, I've dropped 2 indexes I don't use at all (my ER tool created them autmatically).

    Performance is now up from 25 000 milliseconds per insert to 100!! Crazy. I had to check if there records are actually there...

    Many thanks again all

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply