SSIS data flow vs insert into..select

  • HI,

    I am doing some tests to see what is the best way to move 40,000,000 rows (6 GB) from one database to another. The table has a primary key.

    When I do "insert db2.table_dest.... select .... from db1.table_ori", the entire move takes 2 minutes. When I use a data flow, it takes more than 8 minutes. I have tried modifying DefaultBufferSize and DefaultBufferMaxRows without success. In both cases, the primary key exists on the destination table before the transfer starts.

    How can I improve the performance of the data flow?

    Thanks
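
    For reference, a minimal sketch of the cross-database INSERT ... SELECT described above (the table and column names here are placeholders, not the actual schema from the post):

    ```sql
    -- Placeholder schema: dbo.table_ori in db1, dbo.table_dest in db2,
    -- with matching column lists on both sides.
    INSERT INTO db2.dbo.table_dest (id, col1, col2)
    SELECT id, col1, col2
    FROM db1.dbo.table_ori;
    ```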

  • You can use fast load on the destination.

    Make sure you use a SELECT statement in the source rather than connecting directly to the table/view.

    Make sure you are doing no unnecessary sorting or grouping in the data flow.
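
    Concretely, the source advice means typing an explicit query into the OLE DB Source (data access mode "SQL command") rather than picking the table from the dropdown; a hypothetical example with placeholder names:

    ```sql
    -- Select only the columns the destination needs,
    -- with no ORDER BY or GROUP BY.
    SELECT id, col1, col2
    FROM dbo.table_ori;
    ```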

  • Thanks SteveB,

    I am already using fast load on the destination and a SELECT statement at the source (without grouping or sorting).

    I find it strange that the data flow is so much slower than INSERT/SELECT.

    Should I use INSERT/SELECT instead of the data flow?

    Should I remove the primary key at the destination and create a clustered index after the load instead?

    Thanks

  • I find it strange that the difference is that great; I can usually get data flows to match the performance of INSERT/SELECT, give or take a few ms.

    Perhaps change the buffer sizes back to the defaults and try it again.

    But if you can't get the time down to an acceptable limit, then I would go ahead and use INSERT/SELECT.

  • When I remove the primary key on my destination table, the execution time is decent, but I need the key for performance reasons: this is my fact table, and a few views will join to it for reporting.

    Should I recreate the PK after the data flow load, or create a unique clustered index instead?

    But I still don't understand why INSERT/SELECT is fast even with the PK in place.

  • Rem70Rem (9/7/2011)


    When I remove the primary key on my destination table, the execution time is decent, but I need the key for performance reasons: this is my fact table, and a few views will join to it for reporting.

    Should I recreate the PK after the data flow load, or create a unique clustered index instead?

    But I still don't understand why INSERT/SELECT is fast even with the PK in place.

    The extra time in the data flow comes from the fact that the clustered index has to be updated for every batch of rows inserted.

    When you do this through a single INSERT ... SELECT statement, there is only one update to the clustered index (one "batch" is inserted).

    If you want the performance to be on par, you will have to run the transfer without the clustered index on your destination, and recreate it afterwards.

    Another possibility is to adjust the Maximum insert commit size on the OLE DB Destination; a value of zero commits all rows in a single batch.
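
    The drop-and-recreate approach might be sketched as follows (the constraint, table, and column names are placeholders, not the actual schema):

    ```sql
    -- Before the data flow: drop the primary key so the load goes into a heap.
    ALTER TABLE dbo.table_dest DROP CONSTRAINT PK_table_dest;

    -- ...run the SSIS data flow (fast load, Maximum insert commit size = 0)...

    -- After the load: recreate the primary key as a clustered index.
    ALTER TABLE dbo.table_dest
        ADD CONSTRAINT PK_table_dest PRIMARY KEY CLUSTERED (id);
    ```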
